Unexpected behaviour of csv module
andrew-news at andros.org.uk
Mon Sep 25 01:40:40 CEST 2006
I have a bunch of csv files that have the following characteristics:
- field delimiter is a comma
- all fields quoted with double quotes
- lines terminated by a *space* followed by a newline
What surprised me was that the csv reader included the trailing space in
the final field value returned, even though it is outside of the quotes.
I've produced a test program (see below) that demonstrates this. There
is a workaround, which is to not pass the csv reader the file iterator,
but rather a generator that returns lines from the file with the
trailing space stripped.
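The behaviour and the workaround can be reproduced without touching the disk. What follows is only a minimal sketch, written for Python 3, with a list of strings standing in for the file; the sample record and the variable names are illustrative assumptions, not part of the original test program.

```python
import csv

# In-memory stand-in for the file: one record, with a space before the newline.
lines = ['"a","b","c" \n']

# Plain reader: the space after the closing quote ends up in the last field.
plain = list(csv.reader(lines))

# Workaround: strip trailing whitespace from each line before the reader sees it.
stripped = list(csv.reader(line.rstrip() for line in lines))

print(plain)
print(stripped)
```

The only difference between the two results is the trailing space on the last field of the unstripped version.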
Interestingly, the same behaviour is seen if there are spaces before the
field separator. They are also included in the preceding field value,
even if they are outside the quotations. My workaround wouldn't help here.
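One possible way to cover spaces before the separator as well is to clean up the fields after parsing rather than the lines before it. The sketch below is written for Python 3 and makes two assumptions: the helper names `read_rstripped` and `is_strictly_valid` are hypothetical, and it is acceptable to lose trailing whitespace that was deliberately placed inside the quotes, because after parsing the two cases can no longer be told apart. The second helper uses the reader's `strict=True` option, which rejects such input instead of silently absorbing the stray characters.

```python
import csv

def read_rstripped(lines):
    """Yield rows with trailing whitespace removed from every field.

    Caveat: whitespace that was inside the quotes is removed as well.
    """
    for row in csv.reader(lines):
        yield [field.rstrip() for field in row]

def is_strictly_valid(lines):
    """Return True if the strict parser accepts the input, False otherwise."""
    try:
        for _ in csv.reader(lines, strict=True):
            pass
    except csv.Error:
        return False
    return True

sample = ['"d" ,"e","f" \n']
rows = list(read_rstripped(sample))
print(rows)
print(is_strictly_valid(sample))
```

Which of the two is more appropriate depends on whether the stray spaces should be tolerated or reported.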
Anyway, is this a bug or a feature? If it is a feature, then I'm curious
as to why it is considered desirable behaviour.
import csv
filename = "test_data.csv"
# Generate a test file - note the spaces before the newlines
fout = open(filename, "wb")
fout.write('"d" ,"e","f" \n')
fout.close()
# Function to test a reader
def test(reader):
    for line in reader:
        print ",".join(['"%s"' % field for field in line])
# Read the test file - and print the output
reader = csv.reader(open(filename, "rb"))
test(reader)
# Now the workaround: a generator to strip the strings before the reader
def stripped(input):
    for line in input:
        yield line.rstrip()
reader = csv.reader(stripped(open(filename, "rb")))
test(reader)
# Try using lineterminator instead - it doesn't work
reader = csv.reader(open(filename, "rb"), lineterminator=" \r\n")
test(reader)
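The lineterminator result is consistent with the csv module documentation, which notes that the reader is hard-coded to recognise either '\r' or '\n' as end-of-line and ignores lineterminator; the option only affects the writer. A small Python 3 sketch, again using an in-memory list rather than the test file (the variable names are illustrative), shows that passing the option changes nothing:

```python
import csv

# Same record as in the test file, held in memory.
lines = ['"d" ,"e","f" \n']

# The reader ignores lineterminator, so both calls parse identically.
default_rows = list(csv.reader(lines))
custom_rows = list(csv.reader(lines, lineterminator=" \r\n"))

print(default_rows)
print(custom_rows)
```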