codecs.open() doesn't handle platform-specific line terminator

John Machin sjmachin at lexicon.net
Mon May 9 22:27:26 EDT 2011


According to the 3.2 docs
(http://docs.python.org/py3k/library/codecs.html#codecs.open),

"""Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of b'\n' is done on
reading and writing."""

The first point is that one would NOT expect "conversion of b'\n'" anyway.
One expects '\n' -> os.linesep.encode(the_encoding) on writing and vice
versa on reading, i.e. translation at the character level, before encoding
and after decoding.
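
To illustrate the difference, here is a minimal sketch, assuming a
Windows-style '\r\n' terminator and UTF-16LE (the variable names are
illustrative only). It contrasts translating the newline character before
encoding with naively replacing the byte b'\n' after encoding:

```python
encoding = "utf-16-le"
text = "x\u0a0a\n"   # U+0A0A encodes to the two bytes 0x0A 0x0A in UTF-16LE
linesep = "\r\n"     # assumed terminator; os.linesep gives the platform's own

# Character-level translation: replace the newline character, then encode.
char_level = text.replace("\n", linesep).encode(encoding)

# Byte-level translation: encode first, then replace every 0x0A byte.
byte_level = text.encode(encoding).replace(b"\n", b"\r\n")

print(char_level)  # still valid UTF-16LE
print(byte_level)  # odd number of bytes, no longer valid UTF-16LE

try:
    byte_level.decode(encoding)
except UnicodeDecodeError as exc:
    print("byte-level result cannot be decoded:", exc)
```

The byte-level version mangles both the real newline and the unrelated
character U+0A0A, which is why nobody would want "conversion of b'\n'" for
a multi-byte encoding in the first place.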

The second point is that there is no such restriction with the built-in
open(), which appears to work as expected: on Windows with UTF-16LE, for
example, it does '\n' -> b'\r\x00\n\x00' when writing and vice versa on
reading, and it doesn't strike out when thrown curve balls like '\u0a0a'
(whose UTF-16 encoding consists of two 0x0A bytes).
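
A small script along these lines shows the contrast. It is only a sketch:
newline='\r\n' is passed to the built-in open() so that the Windows
behaviour can be reproduced on any platform, and the file names are
arbitrary.

```python
import codecs
import os
import tempfile
import warnings

text = "a\nb\u0a0a\n"

with tempfile.TemporaryDirectory() as tmp:
    path_builtin = os.path.join(tmp, "builtin.txt")
    path_codecs = os.path.join(tmp, "codecs.txt")

    # Built-in open(): newline="\r\n" forces Windows-style translation.
    with open(path_builtin, "w", encoding="utf-16-le", newline="\r\n") as f:
        f.write(text)

    # codecs.open(): the file is opened in binary mode, so '\n' is written
    # untranslated. Newer Python versions may emit a DeprecationWarning
    # for codecs.open(); it is silenced here.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path_codecs, "w", encoding="utf-16-le") as f:
            f.write(text)

    # Inspect the raw bytes each approach produced.
    with open(path_builtin, "rb") as f:
        raw_builtin = f.read()
    with open(path_codecs, "rb") as f:
        raw_codecs = f.read()

    # Read back with universal newlines: '\r\n' becomes '\n' again.
    with open(path_builtin, "r", encoding="utf-16-le", newline=None) as f:
        round_trip = f.read()

print(raw_builtin)
print(raw_codecs)
print(round_trip == text)
```

In the built-in open() output, each '\n' becomes b'\r\x00\n\x00' while the
two 0x0A bytes of '\u0a0a' are left alone; the codecs.open() output contains
no b'\r' at all.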

Why is codecs.open() different? What does "encodings using 8-bit values"
mean? What data loss?





