First Cut at CSV PEP

Tue Jan 28 23:48:28 CET 2003

    Cliff> The idea of raising an exception brings up an interesting problem
    Cliff> that I had to deal with in DSV.  I've run across files that were
    Cliff> missing fields and just had a callback so the programmer could
    Cliff> decide how to deal with it.  This can be the result of corrupted
    Cliff> data, but it's also possible for an application to only export
    Cliff> fields that actually contain data, for instance:

    Cliff> 1,2,3,4,5
    Cliff> 1,2,3
    Cliff> 1,2,3,4

    Cliff> This could very well be a valid csv file.  I'm not aware of any
    Cliff> requirement that rows all be the same length.  

In fact, I think Excel itself will generate such files.  As I write this,
XEmacs on the Windows machine is displaying a CSV file I exported with Excel
from an XLS file I got from someone (having nothing to do with the task at
hand).  It has seven rows of actual data, then 147 rows of commas.  The
comma-only rows have 13, 15 or 255 commas, nothing else.  The header line of
the CSV file has 15 fields with data and is terminated by a comma (empty
16th field).
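For reference, the csv module that later shipped with Python (2.3 onward) treats such rows without complaint: a row of n commas parses into n + 1 empty fields, and a trailing comma yields an empty final field. The sketch below (Python 3) builds made-up stand-in data with the same shape as the file described above. The column names are invented, and the real file's contents are not reproduced.

```python
import csv
import io

# Hypothetical stand-in for the Excel dump described above: a 15-field
# header terminated by a comma, then rows consisting only of commas.
header = ",".join(f"col{i}" for i in range(1, 16)) + ","
comma_rows = ["," * n for n in (13, 15, 255)]
text = "\r\n".join([header] + comma_rows) + "\r\n"

rows = list(csv.reader(io.StringIO(text)))
print([len(r) for r in rows])  # -> [16, 14, 16, 256]
```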

In short, I don't think it's an error for CSV files to have rows of
differing lengths.  We just have to return what we are given and expect that the
application is prepared to handle short rows.  We could add more flags, but
I think we should pause before we get too carried away with the flags.
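As a sketch of what "return what we are given" looks like in practice, using the csv module as it eventually shipped: csv.reader hands back Cliff's rows at their natural lengths, and an application that wants fixed columns can use csv.DictReader, whose restval argument fills missing trailing fields. The column names here are hypothetical, chosen only for illustration.

```python
import csv
import io

# Cliff's sample rows from above.
data = "1,2,3,4,5\n1,2,3\n1,2,3,4\n"

# The plain reader returns each row at whatever length it has.
rows = list(csv.reader(io.StringIO(data)))
print([len(r) for r in rows])  # -> [5, 3, 4]

# Hypothetical column names, invented for this illustration.
names = ["a", "b", "c", "d", "e"]

# DictReader fills missing trailing fields with restval ("" here).
reader = csv.DictReader(io.StringIO(data), fieldnames=names, restval="")
records = [dict(r) for r in reader]
print(records[1])  # -> {'a': '1', 'b': '2', 'c': '3', 'd': '', 'e': ''}
```

Because fieldnames is passed explicitly, DictReader does not consume the first row as a header.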

I've added another issue to the proto-PEP:

    - How should rows of different lengths be handled?  The options seem
      to be::

      * raise an exception when a row is encountered whose length differs
        from the previous row

      * silently return short rows

      * allow the caller to specify the desired row length and what to do
        when rows of a different length are encountered: ignore, truncate,
        pad, raise exception, etc.
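The third option could be prototyped as a thin wrapper around any reader. This is only a sketch: the function name fit_rows and its length, policy and fill parameters are invented here and are not part of any proposed API. "ignore" is read as passing the row through unchanged, though it could just as well mean dropping the row.

```python
import csv
import io

def fit_rows(rows, length, policy="pad", fill=""):
    """Hypothetical helper: reconcile each row with a caller-chosen length.

    policy:
      "ignore"   -- yield mismatched rows unchanged
      "truncate" -- cut long rows to `length`; short rows pass through
      "pad"      -- extend short rows with `fill` and cut long ones
      "raise"    -- raise ValueError at the first mismatched row
    """
    if policy not in ("ignore", "truncate", "pad", "raise"):
        raise ValueError(f"unknown policy: {policy!r}")
    return _fit(rows, length, policy, fill)

def _fit(rows, length, policy, fill):
    for index, row in enumerate(rows, start=1):
        if len(row) == length or policy == "ignore":
            yield row
        elif policy == "raise":
            raise ValueError(
                f"row {index} has {len(row)} fields, expected {length}")
        elif policy == "truncate":
            yield row[:length]
        else:  # "pad": extend with fill, then cut any excess
            yield (row + [fill] * (length - len(row)))[:length]

# Cliff's sample again, forced to four fields.
data = "1,2,3,4,5\n1,2,3\n1,2,3,4\n"
print(list(fit_rows(csv.reader(io.StringIO(data)), 4)))
# -> [['1', '2', '3', '4'], ['1', '2', '3', ''], ['1', '2', '3', '4']]
```

Keeping the policy check outside the generator makes an unknown policy fail when fit_rows is called rather than on first iteration.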

I don't think we have to address each and every issue before a first release
is made, BTW.

Skip