[Python-Dev] CSV, bytes and encodings
solipsis at pitrou.net
Wed Apr 1 12:07:15 CEST 2009
R. David Murray <rdmurray <at> bitdance.com> writes:
> Having read through the ticket, it seems that a CSV file must be (and
> 2.6 was) treated as a binary file, and part of the CSV module's job
> is to convert that binary data to and from strings.
IMO this interpretation is flawed.
In 2.6 there is no tangible difference between "binary" and "text" files, except
for newline handling. In fact, if you want the 2.x CSV module to read a file
with Windows line endings, you have to open the file in "rU" mode (that is, the
closest thing 2.x has to a moral equivalent of the 3.x text files).
Therefore, I don't think 2.x is of any guidance to us for what 3.x should do.
I see three possible practical cases that, ideally, the 3.x CSV module should be
able to handle:
1. be handed a binary file (yielding bytes) without an encoding: in this case,
the CSV module should return lists of bytes objects
2. be handed a text file (yielding str) without an encoding: in this case, the
CSV module should return lists of str objects
3. be handed a binary file (yielding bytes) with an encoding: in this case, the
CSV module should also return lists of str objects
I think 2 and 3 both /should/ be supported (for 3, it's probably enough to wrap
the binary file in a TextIOWrapper ;-)). 1 would be convenient too, but perhaps
more work than it deserves (since it means the CSV module must be able to deal
internally with two different datatypes: bytes and str).
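To make cases 2 and 3 concrete, here is a rough sketch, not a proposal for the actual implementation. It assumes the 3.x csv.reader accepts any file-like object yielding str lines, and it uses in-memory io.StringIO / io.BytesIO objects as stand-ins for real files. The sample data and the Latin-1 encoding are made up for illustration.

```python
import csv
import io

# Hypothetical sample data, encoded as Latin-1 purely for illustration.
raw = "name,city\r\nJosé,Málaga\r\n".encode("latin-1")

# Case 2: a text file (yielding str); no encoding is passed to csv.
text_file = io.StringIO(raw.decode("latin-1"), newline="")
rows_from_text = list(csv.reader(text_file))

# Case 3: a binary file plus an encoding, wrapped in a TextIOWrapper
# so that the csv module only ever sees str.
binary_file = io.BytesIO(raw)
wrapped = io.TextIOWrapper(binary_file, encoding="latin-1", newline="")
rows_from_binary = list(csv.reader(wrapped))

print(rows_from_text)
print(rows_from_binary)
```

Both paths should produce the same lists of str objects, which is why wrapping looks sufficient for case 3. Case 1 (bytes in, bytes out) is not covered by this sketch.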
> The documentation says "If csvfile is a file object, it must be opened
> with the ‘b’ flag on platforms where that makes a difference."
The documentation is, IMO, wrong even in 2.x. Just yesterday I had to open a CSV
file in 'rU' mode because it had Windows line endings and I'm on Linux.
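For comparison, a minimal sketch of reading such a file as a 3.x text file. It assumes the file is opened with newline="" so that the csv module sees the raw line endings itself; the temporary file and its contents are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical CSV content with Windows (CRLF) line endings.
data = b"a,b\r\n1,2\r\n"

# Write the bytes to a temporary file so we have something to open.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Text mode; newline="" leaves the line endings for csv to handle.
    with open(path, "r", encoding="ascii", newline="") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

print(rows)
```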