[Csv] bugs in parsing csv?
sjmachin at lexicon.net
Sat Jan 22 00:06:57 CET 2005
I came across this example in the online version of "Programming in Lua" by Roberto
Ierusalimschy:
>>> weird = '"hello "" hello", "",""\r\n'
This is not, IMHO, a correctly formed CSV string; csv.writer would not produce it.
However, csv.reader accepts it without complaint:
>>> import csv
>>> rdr = csv.reader([weird])
>>> weird2 = rdr.next()
>>> weird2
['hello " hello', ' ""', '']
>>> wtr = csv.writer(file('weird2.csv', 'wb'))
>>> wtr.writerow(weird2)
>>> del wtr
>>> file('weird2.csv', 'rb').read()
'"hello "" hello"," """"",\r\n'
# correctly quoted.
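The round trip above suggests one way to flag input that csv.writer would not have produced: parse the line, write the fields back out, and compare with the original. What follows is a minimal sketch in present-day Python 3 (the session above is Python 2). The helper name `roundtrips` is made up for illustration, and the check assumes the default dialect, so it will also flag harmless input such as fields that were quoted unnecessarily.

```python
import csv
import io


def roundtrips(line):
    """Return True if csv.writer reproduces `line` exactly from its parsed fields."""
    fields = next(csv.reader([line]))
    buf = io.StringIO(newline='')      # keep the writer's '\r\n' untouched
    csv.writer(buf).writerow(fields)   # default dialect: minimal quoting, '\r\n'
    return buf.getvalue() == line


weird = '"hello "" hello", "",""\r\n'
print(roundtrips(weird))  # False
```

A False result only says the line is not in the writer's canonical form; whether that counts as an error is a judgment call for the caller.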
Here are some more examples:
>>> csv.reader([' "\r\n']).next()
[' "']
>>> csv.reader([' ""\r\n']).next()
[' ""']
>>> csv.reader(['x ""\r\n']).next()
['x ""']
>>> csv.reader(['x "\r\n']).next()
['x "']
Looks like we don't give a damn if the field doesn't start with a quote. In the real world
this result might be OK for a field like 'Pat O"Brien' but it does indicate that the data
source is probably _NOT_ quoting at all.
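If the source is known not to quote at all, one option is to tell the reader so, which makes a quote character plain data wherever it appears. Here is a small Python 3 sketch under that assumption; the sample lines are invented for illustration.

```python
import csv

# Hypothetical input from a source that never quotes its fields.
lines = ['Pat O"Brien,42\r\n', '"x",y\r\n']

# QUOTE_NONE tells the reader to do no special processing of quote characters.
rows = list(csv.reader(lines, quoting=csv.QUOTE_NONE))
# rows == [['Pat O"Brien', '42'], ['"x"', 'y']]
```

This only helps when the no-quoting assumption really holds; a genuinely quoted field would come back with its quotes still attached, as the second row shows.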
However, a not-infrequent mistake made by people generating what they call CSV files is
to wrap quotes around some or all fields without doubling any pre-existing quotes:
>>> csv.reader(['"Pat O"Brien"\r\n']).next()
['Pat OBrien"'] <<<<<<<<<<<============== aarrbejaysus!!!
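For this particular case, the csv module has a `strict` dialect parameter, documented as raising csv.Error on bad CSV input. It may not be available in the Python version used in the session above. A Python 3 sketch follows; `parse_strict` is a hypothetical helper name, not part of the module.

```python
import csv


def parse_strict(line):
    """Parse one CSV line; return its fields, or None if the reader rejects it."""
    try:
        return next(csv.reader([line], strict=True))
    except csv.Error:
        return None


print(parse_strict('"Pat O"Brien"\r\n'))    # None
print(parse_strict('"Pat O""Brien"\r\n'))   # ['Pat O"Brien']
```

As far as I can tell, `strict` only objects to trouble inside a field that began with a quote. A quote appearing in a field that did not start with one still passes through silently.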
Further examples of where the data source needs head alignment and csv.reader
doesn't complain, giving an unfortunate result:
>>> csv.reader(['spot",the",mistake"\r\n']).next()
['spot"', 'the"', 'mistake"']
>>> csv.reader(['"attempt", "at", "pretty", "formatting"\r\n']).next()
['attempt', ' "at"', ' "pretty"', ' "formatting"']
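The "pretty formatting" case can be read as the producer probably intended by using the `skipinitialspace` dialect parameter, which ignores spaces immediately after a delimiter. A short Python 3 sketch, assuming the spaces really are decoration and not data:

```python
import csv

line = '"attempt", "at", "pretty", "formatting"\r\n'

# With skipinitialspace=True the space after each comma is dropped, so the
# following quote is seen at the start of the field and treated as quoting.
fields = next(csv.reader([line], skipinitialspace=True))
# fields == ['attempt', 'at', 'pretty', 'formatting']
```

This does nothing for the 'spot",the",mistake"' line, where no field starts with a quote.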