First Cut at CSV PEP
Skip Montanaro
skip at pobox.com
Tue Jan 28 23:48:28 CET 2003
Cliff> The idea of raising an exception brings up an interesting problem
Cliff> that I had to deal with in DSV. I've run across files that were
Cliff> missing fields and just had a callback so the programmer could
Cliff> decide how to deal with it. This can be the result of corrupted
Cliff> data, but it's also possible for an application to only export
Cliff> fields that actually contain data, for instance:
Cliff> 1,2,3,4,5
Cliff> 1,2,3
Cliff> 1,2,3,4
Cliff> This could very well be a valid csv file. I'm not aware of any
Cliff> requirement that rows all be the same length.
In fact, I think Excel itself will generate such files. As I write this,
XEmacs on the Windows machine is displaying a CSV file I dumped in Excel
from an XLS file I got from someone (having nothing to do with the task at
hand). It has seven rows of actual data, then 147 rows of commas. The
comma-only rows have 13, 15 or 255 commas, nothing else. The header line of
the CSV file has 15 fields with data and is terminated by a comma (empty
16th field).
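Those field counts follow a simple rule: a line with n delimiters parses to n + 1 fields. The 13-, 15- and 255-comma rows therefore come back as 14, 16 and 256 empty fields, and the header's trailing comma becomes the empty 16th field. Below is a minimal sketch that checks this with the csv module from Python's standard library (modern Python 3, which postdates this thread). The column names are made up for illustration.

```python
import csv
import io

# Hypothetical header: 15 named fields followed by a trailing comma.
header = ",".join(f"col{i}" for i in range(1, 16)) + ","
# Comma-only rows like the ones in the Excel dump: 13, 15 and 255 commas.
filler = ["," * n for n in (13, 15, 255)]

text = "\n".join([header] + filler) + "\n"
rows = list(csv.reader(io.StringIO(text)))

# n delimiters always yield n + 1 fields, so a trailing comma adds an empty field.
for row in rows:
    print(len(row), row[-1] == "")
```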
In short, I don't think it's an error for CSV files to have rows of
differing lengths. We just have to return what we are given and expect the
application to be prepared to handle short rows. We could add more flags, but
I think we should pause before we get too carried away with them.
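The csv module in today's Python 3 standard library does exactly this "return what we are given" thing. csv.reader neither pads nor rejects short rows, so each record comes back as a list of whatever length it had. Here is a minimal check against Cliff's example rows:

```python
import csv
import io

# Cliff's example: rows of 5, 3 and 4 fields.
data = "1,2,3,4,5\n1,2,3\n1,2,3,4\n"

rows = list(csv.reader(io.StringIO(data)))
for row in rows:
    print(len(row), row)
```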
I've added another issue to the proto-PEP:
- How should rows of different lengths be handled?  The options seem
  to be::

    * raise an exception when a row is encountered whose length differs
      from the previous row

    * silently return short rows

    * allow the caller to specify the desired row length and what to do
      when rows of a different length are encountered: ignore, truncate,
      pad, raise exception, etc.
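The third option could look something like the sketch below. The helper name normalize_rows, its policy strings and RowLengthError are all hypothetical, invented here for illustration. They are not part of any existing or proposed csv API. The helper wraps any iterable of rows, such as a csv.reader, and applies one policy per call.

```python
import csv
import io


class RowLengthError(ValueError):
    """Hypothetical error raised when a row has an unexpected length."""


POLICIES = ("ignore", "truncate", "pad", "raise")


def normalize_rows(rows, rowlen, policy="ignore", fill=""):
    """Yield each row adjusted to *rowlen* fields according to *policy*.

    Hypothetical helper, not part of the csv module:
      "ignore"   - pass every row through unchanged
      "truncate" - drop fields past rowlen; short rows pass through
      "pad"      - extend short rows with *fill*; long rows are truncated
      "raise"    - raise RowLengthError at the first mismatched row
    """
    # Checked when iteration starts, since this is a generator.
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    for rownum, row in enumerate(rows, start=1):
        if len(row) == rowlen or policy == "ignore":
            yield row
        elif policy == "truncate":
            yield row[:rowlen]
        elif policy == "pad":
            # A negative repeat count gives [], so long rows are only sliced.
            yield (row + [fill] * (rowlen - len(row)))[:rowlen]
        else:  # policy == "raise"
            raise RowLengthError(
                f"row {rownum}: expected {rowlen} fields, got {len(row)}"
            )


# Usage with the short rows Cliff quoted earlier in this thread.
data = "1,2,3,4,5\n1,2,3\n1,2,3,4\n"
padded = list(normalize_rows(csv.reader(io.StringIO(data)), 5, policy="pad"))
truncated = list(normalize_rows(csv.reader(io.StringIO(data)), 3, policy="truncate"))
for row in padded:
    print(row)
```

Padding and truncating only make sense once the caller supplies a row length. That is why the default policy here simply passes rows through, which matches the "return what we are given" approach.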
I don't think we have to address each and every issue before a first release
is made, BTW.
Skip