[Python-ideas] csv.DictReader could handle headers more intelligently.
Mark Hackett
mark.hackett at metoffice.gov.uk
Thu Jan 24 11:37:57 CET 2013
On Wednesday 23 Jan 2013, Jerry Hill wrote:
> On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett
> <mark.hackett at metoffice.gov.uk> wrote:
> > I can't see why there would be duplicate column headers for valid reason.
> >
> > Someone may have written their CSV export incorrectly, but that's not
> > actually valid.
>
> Sure it is. Since there is no formal spec for .csv files, having
> multiple columns with the same text in the header is a perfectly valid
> .csv file.
Then you don't want it put in a dictionary, since a dictionary doesn't allow
duplicate keys.
> For what it's worth, the informal spec for csv files seems
> to be "whatever Excel does" and Excel (and every other
> spreadsheet-oriented program) is happy to let you have duplicated
> headers too.
You don't, in Excel, use the name of the column in your calculation, you use
the unique column ID (A, B, C..AA, AB, ...).
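The same positional approach is available in Python through csv.reader, which
yields plain lists. A minimal sketch, with made-up data:

```python
import csv
import io

# Made-up data with a duplicated "value" header.
data = io.StringIO("name,value,value\nx,1,3\n")
reader = csv.reader(data)
header = next(reader)  # the header row, duplicates and all
row = next(reader)     # the first data row

# Position, not header text, identifies the column, so nothing is lost.
first_value, second_value = row[1], row[2]
```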
>
> > It would therefore be arguable for the program to give at least a WARNING
> > that it's throwing data away.
>
> I think the library should give the programmer some sort of indication
> that they are losing data. Personally, I'd prefer an exception which
> can either be caught or not, depending on whether the program is
> designed to handle the situation or not.
>
> > However, Python is mechanising this as a dictionary, and in Python
> > setting A to 1 then setting A to 3 throws away the earlier value
> > for A, so the import function is working AS EXPECTED in Python.
>
> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
> It's not terribly surprising once you sit down and think about it, but
> it's certainly at least a little unexpected to me that data is being
> thrown away with no notice. It's unusual for errors to pass silently
> in python.
>
Python doesn't warn when you assign to a key that already exists in a
dictionary, so, as expected, it isn't warning about duplicate headers now.
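A short sketch of both behaviours side by side (the CSV data is made up):

```python
import csv
import io

# Plain dictionary: the second assignment silently replaces the first.
d = {}
d["A"] = 1
d["A"] = 3

# DictReader with a duplicated "value" header: each row is built as a
# dictionary, so the later column overwrites the earlier one.
data = io.StringIO("name,value,value\nx,1,3\n")
rows = list(csv.DictReader(data))
first_row = dict(rows[0])
```

Here d ends up holding only A = 3, and first_row holds only the last "value"
column; neither step raises or warns.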
Programming languages are hard enough to understand (why does everyone use a
different way of stopping a loop???), so it's not a good idea to have little
codas to the way things are done "oh, unless you're putting it into a
dictionary via this call...".
I can understand the library call doing so, mind, but I can also see the
writer of the library going "You're putting it into a dictionary. Well, you
know what happens when you put duplicate entries in them, right, else you
wouldn't be using this routine that puts csv entries into a dictionary".
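For anyone who does want the exception Jerry describes, it can be done in
user code today. This is only a sketch: strict_dict_rows is a hypothetical
helper name, not part of the csv module, and the choice of ValueError is an
assumption.

```python
import csv
import io
from collections import Counter


def strict_dict_rows(f):
    """Yield one dict per row; raise ValueError on duplicated header names.

    Hypothetical helper for illustration, not a csv module API.
    """
    reader = csv.reader(f)
    try:
        header = next(reader)
    except StopIteration:
        return  # empty input: nothing to yield
    dupes = [name for name, count in Counter(header).items() if count > 1]
    if dupes:
        raise ValueError("duplicate column headers: %r" % (dupes,))
    for row in reader:
        yield dict(zip(header, row))


good = list(strict_dict_rows(io.StringIO("a,b\n1,2\n")))

try:
    list(strict_dict_rows(io.StringIO("a,b,a\n1,2,3\n")))
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

The caller can catch the exception or let it propagate, depending on whether
the program is designed to handle duplicated headers.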