csv.reader has trouble with comma inside quotes inside brackets
Terry Reedy
tjreedy at udel.edu
Tue Jun 9 17:27:13 EDT 2009
Bret wrote:
> i have a csv file like so:
> row1,field1,[field2][text in field2 "quote, quote"],field3,field
> row2,field1,[field2]text in field2 "quote, quote",field3,field
>
> using csv.reader to read the file, the first row is broken into two
> fields:
> [field2][text in field2 "quote
> and
> quote"
>
> while the second row is read correctly with:
> [field2]text in field2 "quote, quote"
> being one field.
>
> any ideas how to make csv.reader work correctly for the first case?
> the problem is the comma inside the quote inside the brackets, ie:
> [","]
When posting, give the version, the minimum code that shows the problem,
and the actual output. Cut and paste the latter two. Reports are less
credible otherwise.
Using 3.1rc1
txt = [
'''row1,field1,[field2][text in field2 "quote, quote"],field3,field''',
'''row2,field1,[field2] text in field2 "quote, quote", field3,field''',
'''row2,field1, field2 text in field2 "quote, quote", field3,field''',
]
import csv
for row in csv.reader(txt): print(len(row),row)
produces
6 ['row1', 'field1', '[field2][text in field2 "quote', ' quote"]', 'field3', 'field']
6 ['row2', 'field1', '[field2] text in field2 "quote', ' quote"', ' field3', 'field']
6 ['row2', 'field1', ' field2 text in field2 "quote', ' quote"', ' field3', 'field']
In 3.1 at least, the presence or absence of brackets is irrelevant, as I
expected it to be. For double quotes to protect the comma delimiter,
the *entire field* must be quoted, not just part of it.
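As a minimal sketch of that rule (the row below is my own variation on the first sample line, not something from the original file): the whole third field is wrapped in double quotes, and the embedded double quotes are doubled, which is what the default dialect expects.

```python
import csv

# Hypothetical input: same data as row1, but the entire third field is
# quoted and each embedded double quote is written twice.
txt = [
    'row1,field1,"[field2][text in field2 ""quote, quote""]",field3,field',
]

rows = list(csv.reader(txt))
for row in rows:
    # The comma inside the fully quoted field is no longer a delimiter.
    print(len(row), row)
```

With the field quoted this way, the reader should return five fields, and the third one keeps its comma and its inner quotes.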
If you want to escape the delimiter without quoting entire fields, use
an escape char and change the dialect. For example
txt = [
'''row1,field1,[field2][text in field2 "quote`, quote"],field3,field''',
'''row2,field1,[field2] text in field2 "quote`, quote", field3,field''',
'''row2,field1, field2 text in field2 "quote`, quote", field3,field''',
]
import csv
for row in csv.reader(txt, quoting=csv.QUOTE_NONE, escapechar='`'):
    print(len(row), row)
produces what you desire
5 ['row1', 'field1', '[field2][text in field2 "quote, quote"]', 'field3', 'field']
5 ['row2', 'field1', '[field2] text in field2 "quote, quote"', ' field3', 'field']
5 ['row2', 'field1', ' field2 text in field2 "quote, quote"', ' field3', 'field']
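If the same settings are needed in more than one place, they can also be registered once as a named dialect instead of being passed as keyword arguments each time. A rough sketch, where the dialect name 'backtick' is just a name I made up for illustration:

```python
import csv

# 'backtick' is a hypothetical dialect name; the settings are the same
# ones passed as keyword arguments in the example above.
csv.register_dialect('backtick', quoting=csv.QUOTE_NONE, escapechar='`')

txt = [
    'row1,field1,[field2][text in field2 "quote`, quote"],field3,field',
]

rows = list(csv.reader(txt, dialect='backtick'))
for row in rows:
    # The escaped comma stays inside the third field.
    print(len(row), row)
```

This should behave the same as passing quoting and escapechar directly; the only difference is that the settings now live under one name.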
Terry Jan Reedy