First Cut at CSV PEP
Dave Cole
djc at object-craft.com.au
Wed Jan 29 00:08:17 CET 2003
>>>>> "Cliff" == Cliff Wells <LogiplexSoftware at earthlink.net> writes:
Cliff> But then MS isn't the only potential target, just our initial
Cliff> (and primary) target. foobar87 may allow export of escaped
Cliff> newlines and put an extraneous space after every delimiter and
Cliff> we don't want someone to have to write another csv importer to
Cliff> deal with it.
I agree. Excel compatibility is very important, but it is not the
only format we should be supporting.
>> The universal newline support in Python 2.3 may impact the use of
>> a file reader/writer when processing different text files, but
>> would returns or newlines within a field be impacted? Should the
>> PEP and API specify that the record delimiter can be either CR, LF,
>> or CR/LF, but use of those characters inside a field requires the
>> field to be quoted or an exception will be thrown?
Interesting point. I think that newlines inside records are going to
be the same as those separating records. Anything else would be very
bizarre.
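To make the quoted-newline case concrete, here is a minimal sketch. It assumes the csv.reader interface of the csv module as it later shipped in the standard library, not the csv.parser spelling in the draft PEP, and the sample data is made up for illustration.

```python
import csv
import io

# Hypothetical sample: the second field of the first record contains a
# newline, so it has to be quoted. Records are terminated with CR/LF.
data = 'a,"line one\nline two",c\r\nd,e,f\r\n'

# newline="" stops the file layer from translating line endings, so the
# reader sees them unchanged and can tell a quoted newline from a record end.
rows = list(csv.reader(io.StringIO(data, newline="")))

for row in rows:
    print(row)
```

The quoted field comes back as a single value with the newline still inside it, and the two records are still split correctly.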
Cliff> The idea of raising an exception brings up an interesting
Cliff> problem that I had to deal with in DSV. I've run across files
Cliff> that were missing fields and just had a callback so the
Cliff> programmer could decide how to deal with it. This can be the
Cliff> result of corrupted data, but it's also possible for an
Cliff> application to only export fields that actually contain data,
Cliff> for instance:
Cliff> 1,2,3,4,5
Cliff> 1,2,3
Cliff> 1,2,3,4
I think that this is something which should be a layer above the CSV
parser. The technique for reading a CSV (from the PEP) looks like
this:
    csvreader = csv.parser(file("some.csv"))
    for row in csvreader:
        process(row)
Then any constraints on the content and structure of the records sit
logically in the process() function.
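As a rough sketch of that layering, assuming the csv.reader interface of the later standard library module: EXPECTED_FIELDS and the padding policy are hypothetical choices an application might make, not anything the PEP specifies.

```python
import csv
import io

EXPECTED_FIELDS = 5  # hypothetical constraint chosen by the application


def process(row):
    """Application-level policy: pad short rows, reject over-long ones."""
    if len(row) > EXPECTED_FIELDS:
        raise ValueError("too many fields: %r" % (row,))
    return row + [""] * (EXPECTED_FIELDS - len(row))


# Cliff's example: rows that only carry the fields which contain data.
data = "1,2,3,4,5\n1,2,3\n1,2,3,4\n"

# The parser only splits records into fields; process() decides what is valid.
rows = [process(row) for row in csv.reader(io.StringIO(data))]
```

The parser hands back rows of three, four and five fields without complaint. Whether a short row gets padded, skipped, or treated as corrupt is decided entirely in process().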
Cliff> This could very well be a valid csv file. I'm not aware of any
Cliff> requirement that rows all be the same length. We'll need to
Cliff> have some fairly flexible error-handling to allow for this type
Cliff> of thing when required or raise an exception when it indicates
Cliff> corrupt/invalid data. In DSV I allowed custom error-handlers
Cliff> so the programmer could indicate whether to process the line as
Cliff> normal, discard it, etc.
I am convinced that this does not belong in the parser.
We can always keep going up in layers and build a csvutils module on
top of the parser.
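One possible shape for such a helper, sketched on the assumption that it wraps any iterable of rows (here the standard library's csv.reader). The name checked_rows and the on_error callback convention are hypothetical, loosely modelled on the custom error handlers Cliff describes for DSV.

```python
import csv
import io


def checked_rows(reader, nfields, on_error):
    """Yield rows with exactly nfields fields; pass the rest to on_error.

    on_error(row) returns a replacement row to yield, or None to discard
    the row. It may also raise to abort on data it considers corrupt.
    """
    for row in reader:
        if len(row) != nfields:
            row = on_error(row)
            if row is None:
                continue
        yield row


data = "1,2,3,4,5\n1,2,3\n1,2,3,4\n"

# Policy chosen by the caller: silently discard rows of the wrong length.
reader = csv.reader(io.StringIO(data))
kept = list(checked_rows(reader, 5, lambda row: None))
```

The parser stays unaware of the policy. A different caller could pass a handler that pads the row or raises instead.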
- Dave
--
http://www.object-craft.com.au