[Python-ideas] csv.DictReader could handle headers more intelligently.
Amaury Forgeot d'Arc
amauryfa at gmail.com
Wed Jan 23 18:08:32 CET 2013
Hi,
2013/1/23 J. Cliff Dyer <jcd at sdf.lonestar.org>
> On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote:
> > I don't think we should start adding support for every malformed type
> > of csv file that exists. It's easy enough to remove the unnecessary
> > lines yourself before passing them to DictReader:
> >
> > from csv import DictReader
> >
> > with open('malformed.csv','rb') as csvfile:
> >     csvlines = list(l for l in csvfile if l.strip())
> >     csvreader = DictReader(csvlines)
> >
> > Personally, if I was dealing with this as often as you are, I'd
> > probably make a custom context manager instead. The problem lies in
> > the files themselves, not in csv's response to them.
>
> With all due respect, while you make a good point that we don't want to
> start special casing every malformed type of CSV, there is absolutely
> something wrong with DictReader's response to files that have duplicate
> headers. It throws away data silently.
>
That's how Python dictionaries work, by design:
    d = {'a': 1, 'a': 2}
"silently" discards the first value.
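
To make the comparison concrete, here is a small sketch showing the same thing happening inside DictReader. The sample data is made up for illustration, and it assumes Python 3 with text-mode input:

```python
import csv
import io

# Hypothetical sample data: the header "a" appears twice.
data = io.StringIO("a,b,a\n1,2,3\n")

reader = csv.DictReader(data)
row = next(reader)

# Each row is built by pairing field names with values, so the later
# "a" column overwrites the earlier one, exactly like the dict literal.
print(reader.fieldnames)
print(dict(row))
```

The header list still contains both "a" entries; only the per-row mapping loses one of them.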
> If you (and others on this list) aren't in favor of trying to find the
> right header row (which I can understand: "In the face of ambiguity,
> refuse the temptation to guess."), maybe a better solution would be to
> raise a (suppressible) exception if the headers aren't uniquely named.
> ("Errors should never pass silently. Unless explicitly silenced.")
>
What about a subclass then:
    import csv

    class CarefulDictReader(csv.DictReader):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Reading self.fieldnames consumes the header row.
            fieldnames = self.fieldnames
            if len(fieldnames) != len(set(fieldnames)):
                raise ValueError("Duplicate field names", fieldnames)
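
If the check should also be suppressible, as suggested above, one possible variant is sketched below. The `allow_duplicates` keyword is a name I made up for this example and is not part of the csv module. The sketch also treats an empty input (where `fieldnames` is None) as having no headers. It assumes Python 3.

```python
import csv
import io


class CarefulDictReader(csv.DictReader):
    """DictReader that rejects duplicate header names unless told otherwise."""

    def __init__(self, *args, allow_duplicates=False, **kwargs):
        # allow_duplicates is a hypothetical flag, not a csv module option.
        super().__init__(*args, **kwargs)
        # Reading self.fieldnames consumes the header row; it is None
        # when the input is empty.
        fieldnames = self.fieldnames or []
        if not allow_duplicates and len(fieldnames) != len(set(fieldnames)):
            raise ValueError("Duplicate field names", fieldnames)


# Made-up sample data with a repeated "a" header.
sample = "a,b,a\n1,2,3\n"

try:
    CarefulDictReader(io.StringIO(sample))
except ValueError as exc:
    print("rejected:", exc.args[1])

# Explicitly silencing the check restores the stock DictReader behaviour.
rows = list(CarefulDictReader(io.StringIO(sample), allow_duplicates=True))
```

Raising from `__init__` means the error shows up when the reader is created, before any row has been processed.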
--
Amaury Forgeot d'Arc