[issue4847] csv fails when file is opened in binary mode

Mon Mar 9 12:32:15 CET 2009

John Machin <sjmachin at users.sourceforge.net> added the comment:

pitrou> Please look at the doc for open() and io.TextIOWrapper. The
`newline` parameter defaults to None, which means universal newlines
with newline translation. Setting to '' (yes, the empty string) enables
universal newlines but disables newline translation ...

I had already read it. I gave it a prize for "least intuitive arg in the
language". So you plan to use that, reading "lines" instead of blocks?
You'll still have to examine which CRs and LFs are embedded and which
are line terminators. You might just as well use f.read(BLOCKSZ) and
avoid having to insist that the user explicitly write ", newline=''".
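
To make the difference between the two `newline` settings concrete, here is a minimal sketch. It assumes a small in-memory sample (the `raw` bytes below are invented for illustration) wrapped in io.TextIOWrapper rather than a real file. It shows what happens to a CRLF embedded in a quoted field under each setting.

```python
import csv
import io

# Invented sample: CRLF line terminators plus a CRLF embedded in a quoted field.
raw = b'id,note\r\n1,"line one\r\nline two"\r\n'

# newline=None (the default): universal newlines WITH translation,
# so the embedded \r\n is silently rewritten to \n before csv sees it.
translated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii")
rows_translated = list(csv.reader(translated))

# newline='': line endings are still recognised, but passed through untouched,
# so the embedded \r\n survives inside the field.
untranslated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="")
rows_untranslated = list(csv.reader(untranslated))

print(repr(rows_translated[1][1]))
print(repr(rows_untranslated[1][1]))
```

In both cases the reader still has to decide which CRs and LFs are terminators and which are data; the setting only controls whether the data ones arrive intact.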

pitrou> However, I think csv should accept files opened in binary mode
and be able to deal with line endings itself. How am I supposed to know
the encoding of a CSV file? Surely Excel uses a defined, default
encoding when exporting to CSV... that knowledge should be embedded in
the csv module.

Excel has no default, because the user has no option -- the defined
encoding is "cp" + str(codepage_number_derived_from_locale), e.g.
"cp1252". Likewise other software writing delimited data to text files
will use (one of) the local legacy encoding(s).
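
As a rough illustration of that naming scheme, the sketch below builds the codec name from a code page number and round-trips a sample line through it. The value 1252 and the sample text are assumptions chosen for the example; on a real machine the number would come from the locale.

```python
import codecs

# Assumed value for illustration; in practice it is derived from the locale.
codepage_number_derived_from_locale = 1252
encoding = "cp" + str(codepage_number_derived_from_locale)

# Raises LookupError if Python ships no codec under that name.
codecs.lookup(encoding)

# Invented sample line containing characters outside ASCII.
data = "café,£5\r\n".encode(encoding)
text = data.decode(encoding)

# Guessing a different encoding for the same bytes does not work here:
# these cp1252 bytes are not valid UTF-8.
try:
    data.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

print(encoding, data, wrong_guess_failed)
```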

So: (i) mode='rb' and no encoding => caller gets bytes back and needs to
do own decoding or (ii) mode='rb' and an encoding [which looks rather
daft and is currently not possible] and the caller gets str objects.
Both of these are ugly -- hence my preference for the mode="rt" variety
of solution. Do we really want the double hassle of both a str csv
implementation and a bytes csv implementation?
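
For comparison, a minimal sketch of the mode="rt" variety: the caller supplies the encoding when opening the file and gets str objects back. The helper name `read_rows`, the sample data, and the choice of cp1252 are all hypothetical, introduced only for this example.

```python
import csv
import os
import tempfile


def read_rows(path, encoding):
    """Hypothetical helper: text mode, explicit encoding, no newline translation."""
    with open(path, "rt", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Write a small cp1252 sample file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write("name,city\r\nZoë,Köln\r\n".encode("cp1252"))

try:
    rows = read_rows(path, "cp1252")
finally:
    os.remove(path)

print(rows)
```

The decoding happens once, in open(), so only a str csv implementation is needed.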

----------
message_count: 13.0 -> 14.0

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4847>
_______________________________________