[Python-ideas] csv.DictReader could handle headers more intelligently.

Wed Jan 23 18:08:32 CET 2013

Hi,

2013/1/23 J. Cliff Dyer <jcd at sdf.lonestar.org>

> On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote:
> > I don't think we should start adding support for every malformed type
> > of csv file that exists. It's easy enough to remove the unnecessary
> > lines yourself before passing them to DictReader:
> >
> >     from csv import DictReader
> >
> >     with open('malformed.csv', newline='') as csvfile:
> >         csvlines = [l for l in csvfile if l.strip()]
> >         csvreader = DictReader(csvlines)
> >
> > Personally, if I was dealing with this as often as you are, I'd
> > probably make a custom context manager instead. The problem lies in
> > the files themselves, not in csv's response to them.
> >
>
> With all due respect, while you make a good point that we don't want to
> start special casing every malformed type of CSV, there is absolutely
> something wrong with DictReader's response to files that have duplicate
> headers. It throws away data silently.
>

That's how Python dictionaries work, by design:
    d = {'a': 1, 'a': 2}
"silently" discards the first value.
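
A minimal sketch of the same effect as seen through DictReader, assuming a tiny in-memory CSV that I made up for illustration (the `sample` data is hypothetical):

```python
import csv
import io

# A plain dict literal with a repeated key keeps only the last value.
d = {'a': 1, 'a': 2}
print(d)

# Hypothetical sample data with a duplicated "a" header.
sample = io.StringIO("a,b,a\n1,2,3\n")

# DictReader builds each row as a dict keyed by header name,
# so the later "a" column overwrites the earlier one.
row = next(csv.DictReader(sample))
print(dict(row))
```

The value from the first "a" column is gone by the time the row reaches the caller, which is the data loss being discussed.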

> If you (and others on this list) aren't in favor of trying to find the
> right header row (which I can understand: "In the face of ambiguity,
> refuse the temptation to guess."), maybe a better solution would be to
> raise a (suppressible) exception if the headers aren't uniquely named.
> ("Errors should never pass silently.  Unless explicitly silenced.")
>

What about a subclass then:

import csv

class CarefulDictReader(csv.DictReader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Accessing .fieldnames reads the header row (None if the file is empty).
        fieldnames = self.fieldnames or []
        if len(fieldnames) != len(set(fieldnames)):
            raise ValueError("Duplicate field names", fieldnames)
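
To cover the "suppressible" part of the request, here is a rough sketch of one possible variant. The `strict` keyword and the sample data are my own assumptions for illustration; nothing like `strict` exists on csv.DictReader itself.

```python
import csv
import io


class CarefulDictReader(csv.DictReader):
    """DictReader that rejects duplicate header names unless strict=False."""

    def __init__(self, *args, strict=True, **kwargs):
        # "strict" is a hypothetical flag, consumed here and not passed on.
        super().__init__(*args, **kwargs)
        # Reading .fieldnames consumes the header row; it is None for empty input.
        fieldnames = self.fieldnames or []
        if strict and len(fieldnames) != len(set(fieldnames)):
            raise ValueError("Duplicate field names", fieldnames)


good = io.StringIO("a,b\n1,2\n")
bad = "a,b,a\n1,2,3\n"

# Unique headers: behaves like the stock DictReader.
print([dict(r) for r in CarefulDictReader(good)])

# Duplicate headers: the error is raised at construction time.
try:
    CarefulDictReader(io.StringIO(bad))
except ValueError as exc:
    print("rejected:", exc.args[1])

# Explicitly silenced: falls back to the stock last-value-wins behaviour.
print(dict(next(CarefulDictReader(io.StringIO(bad), strict=False))))
```

Note that the check runs in `__init__`, so the header row is read eagerly as soon as the reader is created, not on the first iteration.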

-- 
Amaury Forgeot d'Arc