First Cut at CSV PEP

Dave Cole djc at object-craft.com.au
Wed Jan 29 00:08:17 CET 2003


>>>>> "Cliff" == Cliff Wells <LogiplexSoftware at earthlink.net> writes:

Cliff> But then MS isn't the only potential target, just our initial
Cliff> (and primary) target.  foobar87 may allow export of escaped
Cliff> newlines and put an extraneous space after every delimiter and
Cliff> we don't want someone to have to write another csv importer to
Cliff> deal with it.

I agree.  Excel compatibility is very important, but it is not the
only format we should be supporting.

>> The universal newline support in Python 2.3 may impact the use of
>> a file reader/writer when processing different text files, but
>> would returns or newlines within a field be impacted? Should the
>> PEP and API specify that the record delimiter can be either CR, LF,
>> or CR/LF, but use of those characters inside a field requires the
>> field to be quoted or an exception will be thrown?

Interesting point.  I think that newlines inside records are going to
be the same as those separating records.  Anything else would be very
bizarre.
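
A minimal sketch of the quoted-newline case discussed above. It assumes
Python 3 and the csv module as it later shipped in the standard library
(csv.reader), not the draft API under discussion here; the sample data is
made up for illustration.

```python
import csv
import io

# Illustrative data: the second field of the first record contains a
# newline, so it is quoted. Records are separated by the same character.
data = 'a,"line one\nline two",c\nd,e,f\n'

# newline="" keeps the text layer from translating line endings, so the
# parser sees them exactly as written.
rows = list(csv.reader(io.StringIO(data, newline="")))

for row in rows:
    print(row)
```

The newline inside the quoted field stays part of the field, while the
unquoted newlines end the records.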

Cliff> The idea of raising an exception brings up an interesting
Cliff> problem that I had to deal with in DSV.  I've run across files
Cliff> that were missing fields and just had a callback so the
Cliff> programmer could decide how to deal with it.  This can be the
Cliff> result of corrupted data, but it's also possible for an
Cliff> application to only export fields that actually contain data,
Cliff> for instance:

Cliff> 1,2,3,4,5
Cliff> 1,2,3
Cliff> 1,2,3,4

I think that this is something which should be a layer above the CSV
parser.  The technique for reading a CSV (from the PEP) looks like
this:

    csvreader = csv.parser(file("some.csv"))
    for row in csvreader:
        process(row)

Then any constraints on the content and structure of the records sit
logically in the process() function.
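
As a rough illustration of that split, the sketch below puts a row-length
rule inside process() and leaves the parser untouched. It assumes Python 3
and csv.reader from the standard library in place of the draft csv.parser
spelling; EXPECTED_FIELDS and the padding policy are hypothetical choices
an application might make, not anything the PEP specifies.

```python
import csv
import io

EXPECTED_FIELDS = 5  # hypothetical constraint chosen by the application


def process(row):
    """Pad short rows with empty strings; reject rows that are too long."""
    if len(row) > EXPECTED_FIELDS:
        raise ValueError(
            f"expected at most {EXPECTED_FIELDS} fields, got {len(row)}"
        )
    return row + [""] * (EXPECTED_FIELDS - len(row))


# Cliff's ragged example, read from memory instead of a file.
sample = io.StringIO("1,2,3,4,5\n1,2,3\n1,2,3,4\n", newline="")
padded = [process(row) for row in csv.reader(sample)]

for row in padded:
    print(row)
```

The parser hands back rows of whatever length it finds; whether a short
row is an error or just sparse data is decided entirely in process().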

Cliff> This could very well be a valid csv file.  I'm not aware of any
Cliff> requirement that rows all be the same length.  We'll need to
Cliff> have some fairly flexible error-handling to allow for this type
Cliff> of thing when required or raise an exception when it indicates
Cliff> corrupt/invalid data.  In DSV I allowed custom error-handlers
Cliff> so the programmer could indicate whether to process the line as
Cliff> normal, discard it, etc.

I am convinced that this does not belong in the parser.

We can always keep going up in layers and build a csvutils module on
top of the parser.
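
One possible shape for such a layer, sketched under assumptions: the
function name checked_rows, its on_error callback, and the discard handler
are all hypothetical, and the code uses Python 3 with csv.reader from the
standard library. The callback plays the role of the custom error handlers
Cliff describes for DSV, but it wraps the parser instead of living in it.

```python
import csv
import io


def checked_rows(rows, width, on_error):
    """Yield rows of the expected width; pass the others to on_error.

    on_error(row) returns a replacement row, or None to discard the row.
    """
    for row in rows:
        if len(row) != width:
            row = on_error(row)
            if row is None:
                continue
        yield row


def discard(row):
    """Handler that drops any row of the wrong width."""
    return None


sample = io.StringIO("1,2,3,4,5\n1,2,3\n1,2,3,4\n", newline="")
kept = list(checked_rows(csv.reader(sample), 5, discard))
print(kept)
```

A handler that raises an exception or returns a padded row would cover
the other policies without any change to the parser.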

- Dave

-- 
http://www.object-craft.com.au



