PEP 305 - CSV File API

Dave Cole djc at object-craft.com.au
Sun Feb 2 10:25:38 CET 2003


>>>>> "Roman" == Roman Suzi <rnd at onego.ru> writes:

Roman> On Fri, 31 Jan 2003, Skip Montanaro wrote:
>> * What about conversion to other file formats? Is the list-of-lists
>>   output from the csvreader sufficient to feed into other writers?

Roman> It could also be interesting to be able to use attribute access
Roman> to fields:

Roman> for row in csvreader:
Roman>     ... row.field1 ...
Roman>     ... row.field2 ...

Roman> and a dictionary is sometimes needed, as in this example:

Roman> for row in csvreader:
Roman>     print "%(field1)s - %(field2)s" % row

Roman> But these examples assume I can provide names for the fields.

We plan to include a higher-level module called csvutils.py
which would be the logical place to have things like this.

Hmmm...  I wonder how much of a performance hit there is in doing
something like this:

>>> import csv
>>> csvreader = csv.reader(file("some.csv"))
>>> fieldnames = csvreader.next()     # first record holds the field names
>>> for values in csvreader:
...     row = dict(zip(fieldnames, values))
...     process(row)                    # process() stands for your own handler

If the extension module built the dictionary row itself, you would
avoid the overhead of creating the values tuple, the list returned by
zip(), and the pair tuples inside that list.
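For comparison, here is a minimal sketch of the pure-Python approach above, written in Python 3 and reading from an in-memory string instead of some.csv. The helper name `dict_rows` is hypothetical; csv.DictReader, which ships with the csv module that grew out of PEP 305, implements the same header-to-dict pattern and is shown alongside it.

```python
import csv
import io


def dict_rows(lines):
    """Yield one dict per data record, keyed by the header (hypothetical helper)."""
    reader = csv.reader(lines)
    fieldnames = next(reader)          # first record holds the field names
    for values in reader:
        # zip() pairs each name with its value; dict() builds the row mapping
        yield dict(zip(fieldnames, values))


data = "field1,field2\na,b\nc,d\n"

manual = list(dict_rows(io.StringIO(data)))
# csv.DictReader does the same job; dict(row) normalises its row type
builtin = [dict(row) for row in csv.DictReader(io.StringIO(data))]

print(manual)
print(manual == builtin)
```

One behavioural difference: zip() silently drops values when a record is shorter or longer than the header, whereas DictReader fills missing fields with its restval argument and collects extra values under its restkey argument.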

Roman> (Those examples are from my log parsing code. Logs are kind of
Roman> CSV too ;)

Roman> That is, there is a need for an optional mechanism to name
Roman> fields on reading. However, attribute access is not suitable when
Roman> the names are arbitrary, so row["field1"] notation is the only
Roman> option left.
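To make this point concrete, here is a small Python 3 sketch using the standard library's csv.DictReader on an in-memory string. The header names "first name" and "class" are made-up examples: neither can be written as an attribute (one contains a space, the other is a reserved word), yet both work with key lookup and with %(name)s formatting.

```python
import csv
import io
import keyword

data = "first name,class\nAda,admin\n"
reader = csv.DictReader(io.StringIO(data))
row = next(reader)

# Neither header is usable as an attribute name: one contains a space,
# the other is a reserved word, so row.first name / row.class cannot be written.
for name in reader.fieldnames:
    print(name, name.isidentifier() and not keyword.iskeyword(name))

# Mapping access and %-style named formatting accept arbitrary keys.
print(row["first name"])
print("%(first name)s - %(class)s" % row)
```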

How kosher would it be to do something like this?

>>> class MyRow:
...     def __str__(self):
...         # format the named fields here; the reader would set them as attributes
...         return str(vars(self))
...
>>> csvreader = csv.reader(file("some.csv"), rowclass=MyRow)
>>> for row in csvreader:
...     print row

The rowclass argument would tell the reader that you wished the rows
to come out as instances of the specified class.  Maybe the rowclass
argument could just be a callable which would allow you to do this:

>>> class MyRow:
...     def __init__(self, arg):
...         self.arg = arg
...     def process(self):
...         pass
...
>>> f = lambda: MyRow(42)
>>> csvreader = csv.reader(file("some.csv"), rowclass=f)
>>> for row in csvreader:
...     row.process()

The idea would be to have the csv parser call the rowclass argument to
create some kind of object, then simply place the fields in the object
under the names defined in the first record of the file.
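The csv module has no rowclass argument, so the following is only a minimal pure-Python sketch of the proposed behaviour, written in Python 3. It assumes a hypothetical wrapper `read_objects()` that takes the zero-argument callable described above, calls it once per record, and sets one attribute per header name with setattr(). It reads from an in-memory string instead of some.csv.

```python
import csv
import io


class MyRow:
    def __init__(self, arg):
        self.arg = arg

    def process(self):
        # Fields set by the reader are ordinary instance attributes.
        return "%s: %s" % (self.field1, self.field2)


def read_objects(lines, rowclass):
    """Hypothetical sketch of the proposed rowclass behaviour."""
    reader = csv.reader(lines)
    fieldnames = next(reader)            # names come from the first record
    for values in reader:
        row = rowclass()                 # any zero-argument callable works
        for name, value in zip(fieldnames, values):
            setattr(row, name, value)    # place each field under its header name
        yield row


data = "field1,field2\na,b\nc,d\n"
for row in read_objects(io.StringIO(data), rowclass=lambda: MyRow(42)):
    print(row.arg, row.process())
```

Because the fields become attributes, a header name that matches an existing attribute (here `arg`) would overwrite it on the instance, which is one argument for the mapping-style access Roman describes.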

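Since the extension module would set the fields directly, without the
intermediate tuples and lists, this would probably keep the reader
pretty fast.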

- Dave

-- 
http://www.object-craft.com.au



