Unexpected behaviour of csv module
sjmachin at lexicon.net
Mon Sep 25 15:57:32 CEST 2006
skip at pobox.com wrote:
> One could argue that your CSV file is broken.
His CSV file is mildly broken. The examples that I gave are even more
broken, and are typical of real world files created by clueless
developers from databases which contain quotes and commas in the data
(e.g. addresses). The brokenness is not the point at issue. The point
is that the csv module is weakly silent about the brokenness and in
some cases munges the data even further.
> Of course, since CSV is a
> poorly specified format, that's a pretty weak statement.
It would help if the csv module docs did specify what format it
expects/allows on reading, and what it does on writing. How to quote a
field properly isn't all that mindbogglingly difficult (leaving out
options like escapechar and more-than-minimal quoting):
    qc = quotechar
    if qc in field:
        out = qc + field.replace(qc, qc+qc) + qc
    elif delimiter in field or '\n' in field or '\r' in field:
        out = qc + field + qc
    else:
        out = field
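For anyone who wants to try the rule out, here is the same logic wrapped up as a runnable function. This is only a sketch: the names quote_field and write_record are made up for illustration, and the defaults (comma delimiter, double-quote quotechar) are assumptions, not anything the csv module promises.

```python
def quote_field(field, delimiter=',', quotechar='"'):
    """Apply minimal quoting to one field (hypothetical helper, not a csv API)."""
    qc = quotechar
    if qc in field:
        # Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return qc + field.replace(qc, qc + qc) + qc
    if delimiter in field or '\n' in field or '\r' in field:
        # Other special characters only need the surrounding quotes.
        return qc + field + qc
    # Nothing special in the field: emit it untouched.
    return field


def write_record(fields, delimiter=',', quotechar='"'):
    """Join already-quoted fields into one output line (hypothetical helper)."""
    return delimiter.join(quote_field(f, delimiter, quotechar) for f in fields)


line = write_record(['12 "The Elms", High St', 'a,b', 'plain'])
```

Feeding a line produced this way back through csv.reader should give the original fields, which is a cheap check that the quoting is sound.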
Notice how if there are any special characters in the input, the output
has a quotechar at each end. If not, it's broken, and detectably so:
___^ unexpected quote inside unquoted field
_____^ after quote, expected quote, delimiter, or end-of-line
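To show that both conditions really are detectable, here is a rough sketch of a checker that walks one record and reports the first offence with its position. The function name find_quoting_error and the sample inputs are hypothetical, and it assumes a record that fits on one line; a quoted field that runs on to the next line is deliberately not reported.

```python
def find_quoting_error(line, delimiter=',', quotechar='"'):
    """Return (position, message) for the first quoting error, else None.

    Hypothetical helper: checks a single-line record against the minimal
    quoting rule (no escapechar, no embedded newlines).
    """
    START, UNQUOTED, QUOTED, AFTER_QUOTE = range(4)
    state = START
    for pos, ch in enumerate(line.rstrip('\r\n')):
        if state == START:
            if ch == quotechar:
                state = QUOTED          # field opens with a quote
            elif ch != delimiter:
                state = UNQUOTED        # ordinary unquoted field
        elif state == UNQUOTED:
            if ch == quotechar:
                return pos, "unexpected quote inside unquoted field"
            if ch == delimiter:
                state = START
        elif state == QUOTED:
            if ch == quotechar:
                state = AFTER_QUOTE     # closing quote, or first half of a pair
        else:  # AFTER_QUOTE
            if ch == quotechar:
                state = QUOTED          # doubled quote is a literal quote
            elif ch == delimiter:
                state = START
            else:
                return pos, "after quote, expected quote, delimiter, or end-of-line"
    # An unterminated quoted field may continue on the next line; not checked here.
    return None


first = find_quoting_error('abc"def,ghi')
second = find_quoting_error('"abc" ,def')
```

With these made-up inputs, first flags the quote in the middle of an unquoted field and second flags the space that follows a closing quote.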
> I don't remember
> just what your original problem was, but it concerned itself with white
> space as I recall. Have you tried setting the skipinitialspace parameter in
> your call to create a reader object?
The problem has nothing to do with *initial* spaces; the OP's problem
cases involved *trailing* spaces. And that's only a subset of the real
problem: casual attitude towards fields that contain quotes but don't
start and/or end with quotes i.e. they have *not* been created by
applying the usual quoting algorithm to raw data.
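A small experiment along these lines, for what it's worth. The input line is made up, the parse helper is hypothetical, and the behaviour described is what a current CPython csv module does with default settings; other versions may differ. The strict option used here is a real dialect parameter, but it is off by default.

```python
import csv


def parse(line, **kwargs):
    """Parse a single line with csv.reader (hypothetical convenience wrapper)."""
    return next(csv.reader([line], **kwargs))


bad = '"abc" ,def'   # space after the closing quote: not produced by proper quoting

# Default settings: no complaint, the quotes vanish and the space is kept.
lenient = parse(bad)

# skipinitialspace only affects spaces right after a delimiter, so no help here.
skipped = parse(bad, skipinitialspace=True)

# strict=True asks the reader to raise csv.Error on bad input instead.
try:
    parse(bad, strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)
```

Here lenient and skipped both come back as ['abc ', 'def'] with no warning, while the strict call raises csv.Error.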