csv-0.3 released
Dave Cole
djc at object-craft.com.au
Mon Jun 18 20:10:20 EDT 2001
>>>>> "Skip" == Skip Montanaro <skip at pobox.com> writes:
Chris> This appears to mean that the CSV parser is slower than
Chris> string.split, and as far as I can tell, does pretty much the
Chris> same thing. What am I missing?
Skip> CSV files can be syntactically more complex than simply
Skip> inserting commas between fields. For example, if a field
Skip> contains a comma, it must be quoted:
Skip> 1,2,3,"I think, therefore I am",5,6
Skip> Also, you can quote fields (as above), but the quotes are not to
Skip> be kept in the parsed output. The above should yield
Skip> ['1', '2', '3', 'I think, therefore I am', '5', '6']
Skip> Since fields are quoted using quotation marks, you also need a
Skip> way to escape them. This is usually done by doubling them:
Skip> 1,2,3,"""I see,"" said the blind man","as he picked up his
Skip> hammer and saw"
Skip> There are probably more rules, but the comma and quoting rules
Skip> eliminate simple string.split as a possibility. I believe the
Skip> author was only using his simple example as a bit of input that
Skip> could be fed to both string.split and csv.parser.
A more accurate explanation is that the author was too lazy/busy to
produce a good example like this.
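As a rough illustration of Skip's point, the sketch below runs his first
example through a plain split and through a CSV-aware reader. It assumes
the csv module from the current Python standard library (csv.reader),
which is not the csv-0.3 parser announced in this thread; the variable
names are only for illustration.

```python
import csv

line = '1,2,3,"I think, therefore I am",5,6'

# Naive approach: split on every comma, including the one inside the quotes.
naive = line.split(',')

# CSV-aware approach (assumption: modern stdlib csv.reader, not csv-0.3).
# The reader accepts any iterable of lines, so a one-element list works.
parsed = next(csv.reader([line]))

print(len(naive), naive)    # the quoted field is cut in two, quotes kept
print(len(parsed), parsed)  # the quoted field stays whole, quotes dropped
```

The split version produces seven pieces and leaves the quotation marks in
the data, while the reader returns the six fields Skip lists.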
The other thing that it supports is multi-line records. When you
export data from Access and Excel you sometimes get files which
look like this:
1,2,3,"""I see,""
said the blind man","as he picked up his
hammer and saw"
That is a single record split over three lines with text fields
containing embedded newlines. This is what happens when you pass that
data line by line to the CSV parser.
ferret:/home/djc% python
Python 2.0 (#0, Apr 14 2001, 21:24:22)
[GCC 2.95.3 20010219 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import csv
>>> p = csv.parser()
>>> p.parse('1,2,3,"""I see,""')
>>> p.parse('said the blind man","as he picked up his')
>>> p.parse('hammer and saw"')
['1', '2', '3', '"I see,"\012said the blind man', 'as he picked up his\012hammer and saw']
Note that the parser only returns a list of fields when the record is
complete.
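The same "nothing until the record is complete" behaviour can be sketched
with the current standard library. This is not how csv-0.3 is
implemented; LineParser is a hypothetical class, and it assumes quotes
are escaped only by doubling, so an odd count of quote characters means a
quoted field is still open.

```python
import csv
import io


class LineParser:
    """Hypothetical line-at-a-time wrapper (not the csv-0.3 API)."""

    def __init__(self):
        self._lines = []  # lines of the record collected so far

    def parse(self, line):
        """Return the list of fields once the record is complete, else None."""
        self._lines.append(line)
        text = "\n".join(self._lines)
        # Assumption: quotes are escaped by doubling, so an odd number of
        # quote characters means we are still inside a quoted field.
        if text.count('"') % 2 == 1:
            return None
        self._lines = []
        # newline="" lets csv.reader see the embedded newlines unchanged.
        return next(csv.reader(io.StringIO(text, newline="")), [])


p = LineParser()
first = p.parse('1,2,3,"""I see,""')
second = p.parse('said the blind man","as he picked up his')
record = p.parse('hammer and saw"')
print(first, second)
print(record)
```

The first two calls return None and the third returns the five fields,
with the embedded newlines preserved inside the text fields.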
- Dave
--
http://www.object-craft.com.au