[issue4847] csv fails when file is opened in binary mode

John Machin report at bugs.python.org
Tue Feb 24 08:25:43 CET 2009


John Machin <sjmachin at users.sourceforge.net> added the comment:

Sorry, folks, we've got an understanding problem here. CSV files are
typically NOT created by text editors. They are created e.g. by "save as
csv" from a spreadsheet program, or as an output option by some database
query program. They can have just about any character in a field,
including \r and \n. Fields containing those characters should be quoted
(just like a comma) by the csv file producer. A csv reader should be
capable of reproducing the original field division. Here for example is
a dump of a little file I just created using Excel 2003:

C:\devel\csv>\python26\python -c "print repr(open('book1.csv','rb').read())"
'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

Inserting \n into a text field in Excel (using Alt-Enter) is a
well-known user trick.

Here's what we get from Python 2.6.1:
C:\devel\csv>\python26\python -c "import csv; print repr(list(csv.reader(open('book1.csv','rb'))))"
[['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]
and the same by design all the way back to Python 2.3's csv module and
its ancestor, the ObjectCraft csv module.

However with Python 3.0.1 we get:
C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book1.csv','rb')))))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
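
For comparison, here is a minimal sketch of reading the same bytes in Python 3 through a text-mode file opened with newline='', which is what later versions of the csv documentation recommend. It assumes a later Python 3 release than the 3.0.1 shown above; the helper name read_rows, the temporary file, and the ascii encoding are only illustrative.

```python
import csv
import os
import tempfile

# Bytes copied from the Excel 2003 dump shown earlier in this message.
RAW = b'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'


def read_rows(raw):
    """Write raw bytes to a temporary file and parse it with csv.reader.

    read_rows is a hypothetical helper for this sketch, not part of csv.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # newline='' disables newline translation, so the csv module sees
        # the \n inside the quoted field and the \r\n row endings as stored.
        with open(path, "r", newline="", encoding="ascii") as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


rows = read_rows(RAW)
print(rows)
```

With newline='' the embedded \n characters stay inside the quoted field and the \r\n sequences are treated as row terminators, so the field division matches the 2.6.1 result.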

This sentence in the documentation is NOT an error: """If csvfile is a
file object, it must be opened with the ‘b’ flag on platforms where that
makes a difference."""

The problem *IS* a "biggie".

This paragraph in the documentation (evidently introduced in 2.5) is
rather confusing: """The parser is quite strict with respect to
multi-line quoted fields. Previously, if a line ended within a quoted
field without a terminating newline character, a newline would be
inserted into the returned field. This behavior caused problems when
reading files which contained carriage return characters within fields.
The behavior was changed to return the field without inserting newlines.
As a consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner which preserves the newline
characters.""" Some examples of what it is talking about would be a very
good idea.
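
As one possible illustration of what that paragraph describes, the sketch below feeds csv.reader a list of lines rather than a file, once with the line endings stripped and once with them kept. It is written for Python 3, and the sample text and variable names are made up for the example.

```python
import csv

# One record whose second field contains an embedded newline.
text = 'a,"b\nc",d\n'

# Splitting without keeping line endings loses the embedded newline:
# the reader just joins the two halves of the quoted field.
stripped_lines = text.splitlines()        # ['a,"b', 'c",d']
rows_stripped = list(csv.reader(stripped_lines))

# Splitting in a way that preserves the newline characters keeps it.
kept_lines = text.splitlines(True)        # ['a,"b\n', 'c",d\n']
rows_kept = list(csv.reader(kept_lines))

print(rows_stripped)  # [['a', 'bc', 'd']]
print(rows_kept)      # [['a', 'b\nc', 'd']]
```

The first result is the "return the field without inserting newlines" behavior; the second is what you get when the input is split "in a manner which preserves the newline characters".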

----------
nosy: +sjmachin

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4847>
_______________________________________

