codecs.open on Win32 -- converting my newlines to CR+LF

Ryan McGuire usenet at enigmacurry.com
Wed Aug 26 22:52:19 EDT 2009


I've got a UTF-8 encoded text file from Linux with standard newlines
("\n").

I'm reading this file on Win32 with Python 2.6:

codecs.open("whatever.txt","r","utf-8").read()

Inexplicably, all the newlines ("\n") are replaced with CR+LF
("\r\n") ... Why?

As a workaround I'm having to do this:

open("whatever.txt","r").read().decode("utf-8")

which appropriately does not alter my newlines.
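
One way to check what is actually stored on disk is to read the file in
binary mode and count the line endings before drawing conclusions. The
sketch below is only illustrative: the helper names read_utf8_raw and
count_line_endings and the sample data are hypothetical, and a temporary
file stands in for whatever.txt. Note that mode "r" is text mode on
Win32, where CR+LF is translated to "\n" on read, so only "rb" shows the
bytes unmodified.

```python
import os
import tempfile


def read_utf8_raw(path):
    """Read a file in binary mode and decode it as UTF-8.

    Binary mode ("rb") means the platform does no newline translation,
    so the returned text has exactly the line endings stored on disk.
    """
    with open(path, "rb") as f:
        data = f.read()
    return data.decode("utf-8")


def count_line_endings(text):
    """Return (crlf_count, bare_lf_count) for the decoded text."""
    crlf = text.count(u"\r\n")
    bare_lf = text.count(u"\n") - crlf
    return crlf, bare_lf


# Hypothetical sample data standing in for whatever.txt (assumption:
# a UTF-8 file with plain "\n" line endings, as described above).
sample = u"caf\u00e9\nsecond line\n".encode("utf-8")
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(sample)
    text = read_utf8_raw(path)
    print(count_line_endings(text))
finally:
    os.remove(path)
```

If the first count is non-zero for the real file, the CR+LF pairs are
already present on disk (for example, added during transfer) rather
than introduced by the read.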

What really gets me confused though is the Python docs for
codecs.open:

"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done on
reading and writing."

The way I read that, codecs.open should not touch my newlines. What am
I doing wrong? Is this a bug in Python, or in the docs, or both?


