[Python-ideas] csv.DictReader could handle headers more intelligently.

Shane Green shane at umbrellacode.com
Sat Jan 26 12:55:48 CET 2013


Sorry if this is a dupe–it went to the google groups address the first time around, and I think that's different…


> I've been trying to avoid the wrath, but can't any longer.  Let me start by clarifying that I know what a dictionary is, how it works, and what Python is, so we can bypass calling that into question.  I also know what CSV is, and I've dealt with a lot of real-life examples of CSV data: not just exports from Excel, but log data from the energy management space, sensor values, etc., and critical electrical fault data generated by very legacy, stupid equipment.  And while it's true that a dictionary is a dictionary and it works the way it works, the real point that drives home is that it's an inappropriate mechanism for dealing with ordered rows of sequential values.
> 
> Regardless of what choices were made for the implementation, if the module's name is csv, it should be able to do the things it says it does with any legal CSV content without losing information.  Just because it's how a dictionary works doesn't mean column 3's value replacing column 1's value is something other than the loss of data.  One CSV file I worked with had headers for five columns of information, then the header "VALUE" for every 5 minute period in an hour.  Using this CSV parser would leave the client with one sample an hour: how dictionaries work isn't going to bring back 10 values, so information was lost.
> 
> The final point is a simple one: while that CSV file format was stupid, it was perfectly legal.  Something that deals with CSV content should not be losing any of its content.  It also should [not] be barfing or throwing exceptions, by the way.  
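
For anyone who wants to reproduce the data loss described above, here is a minimal sketch. The sample data and the column names (SENSOR, HOUR, VALUE) are made up for illustration and are not taken from the original file.

```python
import csv
import io

# Hypothetical sample: two descriptive columns followed by a repeated
# "VALUE" header, loosely modeled on the file described above.
sample = io.StringIO(
    "SENSOR,HOUR,VALUE,VALUE,VALUE\n"
    "s1,00,1.0,1.1,1.2\n"
)

# DictReader builds one dict per row, so repeated header names collapse
# into a single key and later columns overwrite earlier ones.
rows = list(csv.DictReader(sample))
print(rows[0])

# The plain reader keeps every column in order.
sample.seek(0)
raw_rows = list(csv.reader(sample))
print(raw_rows[1])
```

The dict for the data row has three keys even though the row has five columns, while csv.reader returns all five values.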


> And what about fixing it by implementing a class that does it correctly: one that maps values to column numbers and keeps values as lists, modeled after FieldStorage?  Make iterating it work just like it does now by replacing the values with the last value in each list before returning it, and provide iterator methods for getting at the new functionality, which includes iterating items with repeating header names in order, etc.; and also iter records, or something like that, to iterate the header: [value, …] maps.
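
One rough way the proposal above could look, as a sketch only. The class name MultiValueReader and the method names iter_records and iter_items are hypothetical; nothing like this exists in the csv module. It assumes the first row is the header and that each row is no longer than the header.

```python
import csv
import io


class MultiValueReader:
    """Hypothetical reader that keeps every value for repeated headers."""

    def __init__(self, f, **kwargs):
        self._reader = csv.reader(f, **kwargs)
        # Assumption: the first row holds the header names.
        self.fieldnames = next(self._reader, [])

    def iter_records(self):
        """Yield header -> [value, ...] maps, values in column order."""
        for row in self._reader:
            record = {}
            for name, value in zip(self.fieldnames, row):
                record.setdefault(name, []).append(value)
            yield record

    def iter_items(self):
        """Yield each row as (header, value) pairs in column order."""
        for row in self._reader:
            yield list(zip(self.fieldnames, row))

    def __iter__(self):
        """Mimic DictReader: keep only the last value under each header."""
        for record in self.iter_records():
            yield {name: values[-1] for name, values in record.items()}


sample = "SENSOR,HOUR,VALUE,VALUE,VALUE\ns1,00,1.0,1.1,1.2\n"
records = list(MultiValueReader(io.StringIO(sample)).iter_records())
compat = list(MultiValueReader(io.StringIO(sample)))
print(records[0])
print(compat[0])
```

Plain iteration gives the same last-value-wins dicts that DictReader gives today, while iter_records exposes the values that would otherwise be dropped. All three iterators consume the same underlying reader, so each instance is good for one pass.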



Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 25, 2013, at 8:53 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Friday 25 Jan 2013, Ethan Furman wrote:
>> On 01/25/2013 03:00 AM, Mark Hackett wrote:
>>> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
>>>> - it is less obvious: how does the caller decide that there are too many
>>>>    field names?
>>> 
>>> Additionally, the user of the library now has to read much more about the
>>> library (either code or documentation, which has to track the code too),
>>> to decide what it is going to do.
>>> 
>>> If you have to read the code, then it's not really OO, is it. It's light
>>> grey, not black box.
>> 
>> If you have to read the code, the documentation needs improvement.
>> 
> 
> And if you put your feet too close to the fire, your feet will burn.
> 
> Neither has anything to do with the subject at hand, however.
> 
> Which is: if a dictionary acts a certain way, and you call a routine that 
> creates a dictionary AND WORKS DIFFERENTLY, then why did you use a routine 
> that creates a dictionary?
> 
> You see, the option here is to leave it operating as a dictionary operates. 
> And in that case, you do not need to document anything. The documentation of 
> how it works is already covered by the python basics: "How does a dictionary 
> work in Python?".
> 
> So don't change it, and you don't have to improve the documentation.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
