csv-0.4 (John Machin release) released

Dave Cole djc@object-craft.com.au
13 Jul 2001 00:10:42 +1000


The CSV module provides a fast CSV parser that can split and join CSV
records produced by Microsoft products such as Access and Excel.

For some reason on Python 2.0, it now outperforms string.split().  Of
course the CSV parser can handle much more complex records than
string.split()...

This is a bugfix release.

My thanks to Skip Montanaro for providing most of the following
example:

   CSV files can be syntactically more complex than simply inserting
   commas between fields. For example, if a field contains a comma, it
   must be quoted:
   
     1,2,3,"I think, therefore I am",5,6

   The fields returned by this example are: 

     ['1', '2', '3', 'I think, therefore I am', '5', '6']

   Since fields are quoted using quotation marks, you also need a way
   to escape them. In Microsoft-created CSV files this is done by
   doubling them:
   
     1,2,3,"""I see,"" said the blind man","as he picked up his hammer and saw"

   Excel and Access quite reasonably allow you to place newlines in
   cell and column data. When this is exported as CSV data, the output
   file contains fields with embedded newlines:
   
     1,2,3,"""I see,""
     said the blind man","as he picked up his
     hammer and saw"

   A single record is split over three lines with text fields
   containing embedded newlines. This is what happens when you pass
   that data line by line to the CSV parser.
   
     ferret:/home/djc% python
     Python 2.0 (#0, Apr 14 2001, 21:24:22) 
     [GCC 2.95.3 20010219 (prerelease)] on linux2
     Type "copyright", "credits" or "license" for more information.
     >>> import csv
     >>> p = csv.parser()
     >>> p.parse('1,2,3,"""I see,""')
     >>> p.parse('said the blind man","as he picked up his')
     >>> p.parse('hammer and saw"')
     ['1', '2', '3', '"I see,"\012said the blind man', 'as he picked up his\012hammer and saw']

   Note that the parser only returns a list of fields when the record
   is complete.
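
To make the line-by-line behaviour concrete, here is a minimal
pure-Python sketch of a parser that keeps state between calls.  It is
not the csv module's C implementation: the class name LineParser and
its internals are hypothetical, and it only handles comma separators
and double-quote quoting as described in the example above.

```python
class LineParser:
    """Toy line-by-line CSV parser (hypothetical, for illustration only)."""

    def __init__(self):
        self.clear()

    def clear(self):
        """Discard any partially accumulated record."""
        self.fields = []        # completed fields of the current record
        self.field = []         # characters of the field being built
        self.in_quotes = False  # True while inside a quoted field

    def parse(self, line):
        """Feed one line (without its newline); return the list of fields
        once the record is complete, otherwise None."""
        if self.in_quotes:
            # The previous line ended inside a quoted field, so the line
            # break belongs to the field data.
            self.field.append('\n')
        i, n = 0, len(line)
        while i < n:
            c = line[i]
            if self.in_quotes:
                if c != '"':
                    self.field.append(c)
                elif i + 1 < n and line[i + 1] == '"':
                    self.field.append('"')  # doubled quote: one literal quote
                    i += 1
                else:
                    self.in_quotes = False  # closing quote
            elif c == '"' and not self.field:
                self.in_quotes = True       # opening quote at start of field
            elif c == ',':
                self.fields.append(''.join(self.field))
                self.field = []
            else:
                self.field.append(c)
            i += 1
        if self.in_quotes:
            return None                     # record continues on the next line
        self.fields.append(''.join(self.field))
        record = self.fields
        self.clear()
        return record


p = LineParser()
results = [p.parse(line) for line in (
    '1,2,3,"""I see,""',
    'said the blind man","as he picked up his',
    'hammer and saw"',
)]
print(results)
```

The first two calls return None because a quoted field is still open
at the end of the line; only the third call returns the five fields.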

The changes in this release are:

1- Raising an exception was leaking the memory used for the error
   message.  Thanks to John Machin for fixing this.

2- When a parsing exception is raised during parse(), the parser will
   automatically call clear() to discard accumulated fields and state
   the next time you call parse().

   The old behaviour can be restored either by passing zero as the
   auto_clear constructor keyword argument, or by setting the
   auto_clear parser attribute to zero.

   As well as raising an exception, a parsing error will also set the
   read-only parser attribute had_parse_error to 1.  This is reset the
   next time you call parse() or clear().

   Thanks again to John Machin for suggesting this.

3- An obscure parsing bug has been fixed.

   The old behaviour:

      >>> p.parse('12,12,1",')
      ['12', '12', '1",']
      >>> 

   The new behaviour:

      >>> p.parse('12,12,1",')
      ['12', '12', '1"', '']
      >>> 

   I am still of two minds about whether I should raise an exception
   when I encounter text like that...
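
For comparison, the same input can be checked against the csv module
that ships in the standard library of current Python versions.  This
is an assumption made for illustration: it uses csv.reader rather than
the parser() API of this release, and the helper name split_record is
hypothetical.  With default settings it treats a quote inside an
unquoted field as ordinary data, which matches the new behaviour.

```python
import csv


def split_record(line):
    """Split one CSV line with the standard-library reader (default dialect)."""
    # csv.reader accepts any iterable of strings; next() pulls one record.
    return next(csv.reader([line]))


fields = split_record('12,12,1",')
print(fields)
```

The trailing comma produces a final empty field, so four fields come
back rather than three.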

The module homepage:

        http://www.object-craft.com.au/projects/csv/

For people who do not have a C compiler on Windows I have put a Python
2.1 binary up here:

        http://www.object-craft.com.au/projects/csv/csv.pyd

- Dave

-- 
http://www.object-craft.com.au