csv-0.3 released

Dave Cole djc at object-craft.com.au
Mon Jun 18 20:10:20 EDT 2001


>>>>> "Skip" == Skip Montanaro <skip at pobox.com> writes:

Chris> This appears to mean that the CSV parser is slower than
Chris> string.split, and as far as I can tell, does pretty much the
Chris> same thing. What am I missing?

Skip> CSV files can be syntactically more complex than simply
Skip> inserting commas between fields.  For example, if a field
Skip> contains a comma, it must be quoted:

Skip>     1,2,3,"I think, therefore I am",5,6

Skip> Also, you can quote fields (as above), but the quotes are not
Skip> to be kept in the parsed output.  The above should yield

Skip>     ['1', '2', '3', 'I think, therefore I am', '5', '6']

Skip> Since fields are quoted using quotation marks, you also need a
Skip> way to escape them.  This is usually done by doubling them:

Skip>     1,2,3,"""I see,"" said the blind man","as he picked up his
Skip> hammer and saw"

Skip> There are probably more rules, but the comma and quoting rules
Skip> eliminate simple string.split as a possibility.  I believe the
Skip> author was only using his simple example as a bit of input that
Skip> could be fed to both string.split and csv.parser.

A more accurate explanation is that the author was too lazy/busy to
produce a good example like this.
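
To make Skip's point concrete, here is a minimal sketch comparing a
plain split with a real CSV parse.  It assumes the csv module that
ships with the modern Python standard library (csv.reader), which is
not the csv-0.3 parser discussed in this thread and has a different
API.

```python
import csv

# The two sample lines from Skip's explanation.
quoted_comma = '1,2,3,"I think, therefore I am",5,6'
doubled_quotes = ('1,2,3,"""I see,"" said the blind man",'
                  '"as he picked up his hammer and saw"')

# A plain split cuts the quoted field in two and keeps the quotes.
naive = quoted_comma.split(',')

# Assumption: csv.reader from today's standard library, not csv-0.3.
# It accepts any iterable of lines, so a one-element list is enough.
parsed = next(csv.reader([quoted_comma]))
parsed_doubled = next(csv.reader([doubled_quotes]))

print(naive)           # 7 pieces, quotes still attached
print(parsed)          # 6 fields, quotes removed
print(parsed_doubled)  # doubled quotes collapse to single quotes
```

The split version produces seven pieces instead of six fields, which
is the difference Chris was asking about.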

The other thing that it supports is multi-line records.  When you
export data from Access and Excel you sometimes get files which
look like this:

1,2,3,"""I see,""
said the blind man","as he picked up his
hammer and saw"

That is a single record split over three lines, with text fields
containing embedded newlines.  This is what happens when you pass that
data line by line to the CSV parser:

ferret:/home/djc% python
Python 2.0 (#0, Apr 14 2001, 21:24:22) 
[GCC 2.95.3 20010219 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import csv
>>> p = csv.parser()
>>> p.parse('1,2,3,"""I see,""')
>>> p.parse('said the blind man","as he picked up his')
>>> p.parse('hammer and saw"')
['1', '2', '3', '"I see,"\012said the blind man', 'as he picked up his\012hammer and saw']

Note that the parser only returns a list of fields when the record is
complete.
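
For anyone who wants to experiment with the same "nothing until the
record is complete" behaviour, the following is a rough sketch built
on the modern standard library csv module.  The LineFeeder class and
its quote-counting test are hypothetical, made up for illustration;
they are not how csv-0.3 is implemented, and the heuristic assumes
quotes are escaped only by doubling.

```python
import csv


class LineFeeder:
    """Hypothetical helper: buffer lines until a CSV record is complete."""

    def __init__(self):
        self._pending = []

    def parse(self, line):
        """Return a list of fields, or None if the record is unfinished."""
        self._pending.append(line)
        text = "\n".join(self._pending)
        # Assumption: quotes are escaped by doubling, so a complete
        # record always contains an even number of quote characters.
        if text.count('"') % 2:
            return None
        self._pending = []
        # Keep the line endings so embedded newlines survive in fields.
        return next(csv.reader(text.splitlines(keepends=True)), [])


p = LineFeeder()
first = p.parse('1,2,3,"""I see,""')
second = p.parse('said the blind man","as he picked up his')
third = p.parse('hammer and saw"')
print(first, second)  # both None: the record is still open
print(third)          # the five fields, with embedded newlines
```

The first two calls return None and the third returns the five
fields, with the newlines preserved inside the last two.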

- Dave

-- 
http://www.object-craft.com.au


