csv-0.4 (John Machin release) released
Dave Cole
djc at object-craft.com.au
Thu Jul 12 10:15:49 EDT 2001
The CSV module provides a fast CSV parser which can split and join CSV
records which have been produced by Microsoft products such as Access
and Excel.
For some reason, on Python 2.0 it now outperforms string.split(). Of
course, the CSV parser can handle much more complex records than
string.split()...
This is a bugfix release.
My thanks to Skip Montanaro for providing most of the following
example:
CSV files can be syntactically more complex than simply inserting
commas between fields. For example, if a field contains a comma, it
must be quoted:
1,2,3,"I think, therefore I am",5,6
The fields returned by this example are:
['1', '2', '3', 'I think, therefore I am', '5', '6']
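For comparison, the `csv` module that Python later added to its standard library splits this record the same way. Its `csv.reader` interface differs from the `csv.parser()` object announced here. The snippet below assumes Python 3 and is only a sketch:

```python
import csv

# csv.reader accepts any iterable of strings; a one-element list is enough here.
line = '1,2,3,"I think, therefore I am",5,6'
fields = next(csv.reader([line]))
print(fields)  # ['1', '2', '3', 'I think, therefore I am', '5', '6']
```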
Since fields are quoted using quotation marks, you also need a way
to escape them. In Microsoft-created CSV files this is done by
doubling them:
1,2,3,"""I see,"" said the blind man","as he picked up his hammer and saw"
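Joining fields into a record applies the same rule in reverse. Any field containing a comma, quotation mark or newline is wrapped in quotes, and each embedded quotation mark is doubled. The following is a minimal Python 3 sketch. `join_record` is a hypothetical helper and is not this module's join function:

```python
def join_record(fields):
    """Hypothetical helper: join fields using Microsoft-style CSV quoting."""
    out = []
    for field in fields:
        if any(ch in field for ch in ',"\r\n'):
            # Double every embedded quote, then wrap the whole field in quotes.
            field = '"' + field.replace('"', '""') + '"'
        out.append(field)
    return ",".join(out)


record = join_record(['1', '2', '3', '"I see," said the blind man',
                      'as he picked up his hammer and saw'])
print(record)
# 1,2,3,"""I see,"" said the blind man",as he picked up his hammer and saw
```

The last field comes out unquoted because it contains no special characters. Quoting it anyway, as in the record above, is equally valid.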
Excel and Access quite reasonably allow you to place newlines in
cell and column data. When this is exported as CSV data the output
file contains fields with embedded newlines.
1,2,3,"""I see,""
said the blind man","as he picked up his
hammer and saw"
A single record is split over three lines with text fields
containing embedded newlines. This is what happens when you pass
that data line by line to the CSV parser.
ferret:/home/djc% python
Python 2.0 (#0, Apr 14 2001, 21:24:22)
[GCC 2.95.3 20010219 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import csv
>>> p = csv.parser()
>>> p.parse('1,2,3,"""I see,""')
>>> p.parse('said the blind man","as he picked up his')
>>> p.parse('hammer and saw"')
['1', '2', '3', '"I see,"\012said the blind man', 'as he picked up his\012hammer and saw']
Note that the parser only returns a list of fields when the record
is complete.
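Python's later standard-library `csv.reader` uses a different API and is shown here only for comparison, assuming Python 3. Instead of accepting one line per call, it is driven by an iterator of lines. It pulls further lines by itself until the open quoted field is closed. In this sketch, `io.StringIO` stands in for a real file; a real file would be opened with `newline=''`:

```python
import csv
import io

data = ('1,2,3,"""I see,""\n'
        'said the blind man","as he picked up his\n'
        'hammer and saw"\n')

# One logical record spans three physical lines; the reader yields it once.
rows = list(csv.reader(io.StringIO(data)))
print(rows)
# [['1', '2', '3', '"I see,"\nsaid the blind man', 'as he picked up his\nhammer and saw']]
```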
The changes in this release are:
1- Exception raising was leaking the error message. Thanks to John
Machin for fixing this.
2- When a parsing exception is raised during parse(), the parser will
automatically call clear() to discard accumulated fields and state the
next time you call parse().
The old behaviour can be restored either by passing zero as the
auto_clear constructor keyword argument, or by setting the
auto_clear parser attribute to zero.
As well as raising an exception, a parsing error will also set the
read-only parser attribute had_parse_error to 1. It is reset the next
time you call parse() or clear().
Thanks again to John Machin for suggesting this.
3- An obscure parsing bug has been fixed.
The old behaviour:
>>> p.parse('12,12,1",')
['12', '12', '1",']
>>>
The new behaviour:
>>> p.parse('12,12,1",')
['12', '12', '1"', '']
>>>
I am still of two minds about whether I should raise an exception
when I encounter text like that...
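To make the calling convention concrete, here is a small pure-Python model of a parser that behaves as described above. `parse()` returns None until a record is complete. A parse error sets `had_parse_error`. With `auto_clear` enabled, the next `parse()` discards the partial record first. Everything here is a hypothetical sketch assuming Python 3, not the module's C implementation. The class name `SketchParser`, its state names and the use of `ValueError` are all assumptions:

```python
# Hypothetical sketch only: models the described behaviour, not csv-0.4's code.
START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED = range(4)


class SketchParser:
    """Hypothetical pure-Python model of an incremental CSV parser."""

    def __init__(self, auto_clear=True):
        self.auto_clear = auto_clear
        self.clear()

    @property
    def had_parse_error(self):
        # Property without a setter, so it is read-only from outside.
        return self._had_parse_error

    def clear(self):
        """Discard accumulated fields and parser state."""
        self.fields, self.field, self.state = [], [], START
        self._had_parse_error = False

    def parse(self, line):
        """Return a list of fields once a record is complete, else None."""
        if self._had_parse_error and self.auto_clear:
            self.clear()  # drop the state left behind by the failed call
        self._had_parse_error = False
        try:
            return self._feed(line)
        except ValueError:  # assumed exception type for this sketch
            self._had_parse_error = True
            raise

    def _end_field(self):
        self.fields.append("".join(self.field))
        self.field, self.state = [], START

    def _feed(self, line):
        if self.state == QUOTED:
            self.field.append("\n")  # the quoted field continued onto this line
        for ch in line:
            if self.state == START:
                if ch == '"':
                    self.state = QUOTED
                elif ch == ",":
                    self._end_field()
                else:
                    self.field.append(ch)
                    self.state = UNQUOTED
            elif self.state == UNQUOTED:
                # A quote inside an unquoted field is kept as ordinary text.
                if ch == ",":
                    self._end_field()
                else:
                    self.field.append(ch)
            elif self.state == QUOTED:
                if ch == '"':
                    self.state = QUOTE_IN_QUOTED
                else:
                    self.field.append(ch)
            else:  # QUOTE_IN_QUOTED
                if ch == '"':
                    self.field.append('"')  # "" is an escaped quote
                    self.state = QUOTED
                elif ch == ",":
                    self._end_field()
                else:
                    raise ValueError("',' expected after closing quote")
        if self.state == QUOTED:
            return None  # quoted field still open: wait for more lines
        self._end_field()
        record, self.fields = self.fields, []
        return record


p = SketchParser()
for text in ['1,2,3,"""I see,""',
             'said the blind man","as he picked up his',
             'hammer and saw"']:
    result = p.parse(text)  # None until the closing quote arrives
print(result)
print(p.parse('12,12,1",'))  # ['12', '12', '1"', '']

q = SketchParser()  # auto_clear defaults to True in this sketch
try:
    q.parse('"abc"x')
except ValueError:
    print(q.had_parse_error)  # True
recovered = q.parse("a,b")  # the broken partial state was discarded first
print(recovered, q.had_parse_error)  # ['a', 'b'] False
```

With `SketchParser(auto_clear=False)`, the failed state would be kept, and the following `parse("a,b")` would fail again. This mirrors the old behaviour described in change 2.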
The module homepage:
http://www.object-craft.com.au/projects/csv/
For people on Windows who do not have a C compiler, I have put a Python
2.1 binary up here:
http://www.object-craft.com.au/projects/csv/csv.pyd
- Dave
--
http://www.object-craft.com.au