[Python-Dev] str object going in Py3K

Guido van Rossum guido at python.org
Wed Feb 15 22:37:52 CET 2006


On 2/15/06, Bill Janssen <janssen at parc.com> wrote:
> Well, I probably am, but that's not the reason.  Reading has nothing
> to do with it.

Actually if you read binary data in text mode on Windows you also get
corrupt (and often truncated) data, unless you're lucky enough that
the binary data contains neither ^Z (EOF) nor CRLF.
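To make the read-side corruption concrete, here is a simplified model of what text-mode input in the Microsoft C runtime does to a byte stream: ^Z (0x1A) ends the read, and each CR LF pair collapses to a single LF. The function `simulate_windows_text_read` is hypothetical and only for illustration; the real runtime also handles buffering edge cases that this ignores. The PNG file signature is a handy test case because it deliberately contains both CRLF and ^Z.

```python
# Hypothetical, simplified model of Windows C-runtime text-mode input.
# It is not a real API; it only mimics the two translations described above.
def simulate_windows_text_read(data: bytes) -> bytes:
    # ^Z (0x1A) is treated as end-of-file, so everything after it is lost.
    eof = data.find(b"\x1a")
    if eof != -1:
        data = data[:eof]
    # Each CR LF pair is collapsed into a single LF.
    return data.replace(b"\r\n", b"\n")


# The 8-byte PNG signature contains both CRLF and ^Z.
png_signature = b"\x89PNG\r\n\x1a\n"
print(simulate_windows_text_read(png_signature))  # b'\x89PNG\n'
```

Eight bytes go in and five come out: the data is both altered (CRLF to LF) and truncated (at ^Z).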

> The default mode (text) corrupts data on write on a certain platform
> (Windows) by inserting extra bytes in the data stream.  This bug
> particularly exhibits itself when programs developed on Linux or Mac
> OS X are then run on a Windows platform.  I think it's a bug to
> default to a mode which modifies the data stream.  The default mode
> should be 'binary'; people interested in exploiting the obsolete
> Windows distinction between "text" and "binary" should have to use a
> mode switch (I suggest "t") to put a file stream in 'text' mode.
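The write-side problem can be reproduced on any platform with the `newline` parameter of `open()` in current Python 3 releases. In text mode, `newline="\r\n"` translates every `"\n"` written into CR LF, which is what Windows does by default. `newline=""` disables translation. The helper name `bytes_on_disk` is an assumption made for this sketch.

```python
import os
import tempfile


def bytes_on_disk(text: str, newline) -> bytes:
    """Write `text` in text mode with the given newline setting and
    return the bytes that actually ended up in the file (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        # Read back in binary mode so no translation hides what was stored.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


data = "a\nb\n"                      # 4 characters
print(bytes_on_disk(data, "\r\n"))   # b'a\r\nb\r\n' (6 bytes: extra CRs inserted)
print(bytes_on_disk(data, ""))       # b'a\nb\n' (no translation)
```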

This might have been a possibility in Python 2.x, where binary reads
return strings. In Python 3000 binary files will return bytes objects
while text files will return strings (which are decoded into unicode
using an encoding that's determined when the file is opened, taking
into account system and user settings as well as possible overrides
passed to open()). I expect that the APIs for reading and writing
binary data will be sufficiently different from those for
reading/writing text that even staunch Unix programmers won't make the
mistake of using the text API for creating binary files.
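As a sketch of how this split looks in current Python 3 releases (not necessarily the API as it was envisioned in this 2006 message): a file opened with `"rb"` yields `bytes` untouched, a file opened with `"r"` and an explicit `encoding` yields `str` with newlines translated, and a binary file refuses `str` outright. The helper names `read_both_ways` and `binary_rejects_str` are hypothetical.

```python
import os
import tempfile


def read_both_ways(payload: bytes):
    """Store `payload` in a file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(payload)
        # Binary mode: bytes in, identical bytes out.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode: bytes are decoded to str with the given encoding, and
        # universal newlines (the default) turn "\r\n" into "\n".
        # The io module does not treat ^Z (0x1A) as end-of-file.
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        return raw, text
    finally:
        os.remove(path)


def binary_rejects_str() -> bool:
    """Writing a str to a binary file raises TypeError instead of guessing."""
    with tempfile.TemporaryFile() as f:  # default mode is "w+b"
        try:
            f.write("not bytes")
        except TypeError:
            return True
    return False


raw, text = read_both_ways(b"line one\r\nline two\x1a\r\n")
print(type(raw).__name__, type(text).__name__)  # bytes str
print(binary_rejects_str())                      # True
```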

I realize that's not the answer you're looking for, but for backwards
compatibility we can't change the default on Windows in Python 2.x, so
the point is moot until 3.0 or until a new binary file API is added to
2.x.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

