[New-bugs-announce] [issue30034] csv reader chokes on bad quoting in large files

Keith Erskine report at bugs.python.org
Mon Apr 10 17:50:25 EDT 2017


New submission from Keith Erskine:

If a csv file has a quote character at the beginning of a field but no closing quote, the csv module will keep reading the file until the very end in an attempt to close out the field.  It's true this situation occurs only when the quoting in a csv file is incorrect, but it would be extremely helpful if the csv reader could be told to stop reading each row of fields when it encounters a newline character, even if it is within a quoted field at the time.  At the moment, with large files, the csv reader will typically error out in this situation once the runaway field exceeds the field size limit (see csv.field_size_limit(), 131072 characters by default).  Furthermore, this is not an easy situation to trap with custom code.

Here's an example of what I'm talking about.  For a csv file with the following content:
a,b,c
d,"e,f
g,h,i

This code:

    import csv
    with open('file.txt') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

returns:
['a', 'b', 'c']
['d', 'e,f\ng,h,i\n']

Note that everything after the opening quote, including delimiters and newlines, has been added to the second field on the second line. This is correct csv behavior but is very unhelpful to me in this situation.
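
For anyone who wants to reproduce this without creating a file, here is a minimal sketch that feeds the same content through io.StringIO.  The helper name read_rows is only illustrative.  It also shows that, as far as I can tell, passing strict=True turns the silent run-on into a csv.Error at end of data, which detects the problem but does not recover the remaining rows.

```python
import csv
import io

# Same content as the example file: the second row opens a quote
# and never closes it.
DATA = 'a,b,c\nd,"e,f\ng,h,i\n'


def read_rows(text, **fmtparams):
    """Parse text with csv.reader and return all rows as a list."""
    return list(csv.reader(io.StringIO(text), **fmtparams))


# Default (non-strict) parsing: the unclosed quote swallows the rest of the data.
rows = read_rows(DATA)
print(rows)

# Strict parsing: the reader raises csv.Error when the data ends
# inside a quoted field.
strict_error = None
try:
    read_rows(DATA, strict=True)
except csv.Error as exc:
    strict_error = exc
    print('strict mode raised:', exc)
```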

On the grounds that most csv files do not have multiline values within them, perhaps a new dialect attribute called "multiline" could be added to the csv module, that defaults to True for backwards compatibility.  It would indicate whether the csv file has any field values within it that span more than one line.  If multiline is False, then the "parse_process_char" function in "_csv" would always close out a row of fields when it encounters a newline character.  It might be best if this multiline attribute were taken into account only when "strict" is False.
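
The effect I have in mind for multiline=False can be approximated today in pure Python by giving each physical line its own reader.  This is only a sketch of the intended behavior, not the proposed C change: read_single_line_rows is a hypothetical helper, and it assumes no field legitimately spans more than one line.

```python
import csv


def read_single_line_rows(lines, **fmtparams):
    """Yield one parsed row per physical line.

    Hypothetical helper, not part of the csv module. A quoted field
    can never run past the end of its own line.
    """
    for line in lines:
        # Drop the line terminator so it does not end up inside an
        # unclosed quoted field.
        stripped = line.rstrip('\r\n')
        # A fresh reader per line sees end of data at the end of the line,
        # so an unclosed quote is closed out there (non-strict mode).
        for row in csv.reader([stripped], **fmtparams):
            yield row


example = ['a,b,c\n', 'd,"e,f\n', 'g,h,i\n']
parsed = list(read_single_line_rows(example))
print(parsed)
```

With the example data this keeps the third line as its own row instead of folding it into the broken field.  Note that passing strict=True through fmtparams would make the broken line raise csv.Error rather than be closed out.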

Right now, I do get badly-formatted files like this, and I cannot ask the source for a new file.  I have to manually correct the file using a mixture of custom scripts and vi before the csv module will read it. It would be very helpful if csv would handle this directly.
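
As an illustration of the kind of custom script this currently requires, here is a rough heuristic for locating the offending lines before handing the file to csv.  The function name find_unbalanced_lines is made up for this example, and the check assumes that no field is meant to span lines and that embedded quotes are escaped by doubling, so every well-formed line contains an even number of quote characters.

```python
def find_unbalanced_lines(lines, quotechar='"'):
    """Return 1-based numbers of lines with an odd count of quotechar.

    Hypothetical pre-check, only valid when no field is meant to span
    more than one line.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.count(quotechar) % 2 == 1
    ]


sample = ['a,b,c\n', 'd,"e,f\n', 'g,h,i\n']
suspect = find_unbalanced_lines(sample)
print(suspect)
```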

----------
messages: 291453
nosy: keef604
priority: normal
severity: normal
status: open
title: csv reader chokes on bad quoting in large files
type: enhancement
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30034>
_______________________________________

