[Python-Dev] New lines, carriage returns, and Windows

Sat Sep 29 20:48:20 CEST 2007

"Guido van Rossum" <guido at python.org> wrote:
> 
> Have you looked at Py3k at all, especially PEP 3116 (new I/O)?

No.

> Python *does* have its own I/O model. There are binary files and text
> files. For binary files, you write bytes and the semantic model is
> that of an array of bytes; byte indices are seek positions.

That is the same model as C and Unix.  It is text files that we are
discussing.
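
For reference, the binary model described in the quoted paragraph can be sketched as follows. This is a minimal illustration, assuming the `io` module as it later shipped with Python 3 (the PEP was still a draft at the time of this message); the sample bytes and variable names are made up for the example.

```python
import io

# An in-memory binary file; a file opened with mode "rb" behaves the same way.
buf = io.BytesIO(b"hello\r\nworld")

buf.seek(5)            # byte index 5 is the seek position of the "\r"
pair = buf.read(2)     # bytes come back exactly as stored, with no translation
position = buf.tell()  # the position has advanced by exactly two bytes
```

Nothing here depends on line endings or encodings, which is why this layer matches the C and Unix model.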

> For text files, the contents are considered to be Unicode, encoded as
> bytes in a binary file. So a text file always has an underlying binary
> file. Two translations take place, both of which have defaults varying
> by platform. One translation is encoding Unicode text into bytes upon
> output, and decoding bytes to Unicode text upon input. This can use
> any encoding supported by the encodings package.

The character code isn't the issue here, and is almost completely
irrelevant.
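
As a rough sketch of the layering in the quoted paragraph (a text file wrapping an underlying binary file), again assuming the `io` module as it later shipped with Python 3. The UTF-8 encoding and the sample string are choices made for illustration, and newline translation is switched off here so that only the encoding step is visible.

```python
import io

raw = io.BytesIO()  # stands in for the underlying binary file

# newline="" disables line-ending translation, isolating the encoding step.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
text.write("caf\u00e9\n")  # Unicode text goes in ...
text.flush()

encoded = raw.getvalue()   # ... and encoded bytes land in the binary layer
```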

> The other translation deals with line endings. Upon input, any of
> \r\n, \r, or \n is translated to a single \n by default (this is the
> "universal newlines" algorithm from Python 2.x). This can be tweaked
> or disabled. Upon output, \n is translated into a platform specific
> string chosen from \r\n, \r, or \n. This can also be disabled or
> overridden. Note that \r, when written, is never treated specially; if
> you want special processing for \r on output, you can write your own
> translation layer.

Grrk.  That's the problem.  You don't get back what you have written,
for a start, which isn't nice.  There are other issues, too.
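
The round-trip point can be illustrated with a small sketch, assuming the `io.TextIOWrapper` behaviour that later shipped with Python 3. The helper name `round_trip` is hypothetical and exists only for this example.

```python
import io

def round_trip(text, newline):
    """Write text through a text layer, then read it back with the same setting."""
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer.write(text)
    writer.flush()
    data = raw.getvalue()

    reader = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=newline)
    return reader.read()

original = "a\rb\n"

# Default (newline=None): the lone "\r" is written untouched, but universal
# newlines turns it into "\n" on input, so the text read back differs.
default_result = round_trip(original, None)

# newline="" disables translation in both directions, so the text survives.
untranslated_result = round_trip(original, "")
```

With the defaults, the string read back is not the string written; passing `newline=""` on both sides is one way to get an exact round trip, at the cost of handling line endings yourself.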

> That's all. There is nothing unimplementable or confusing in these
> specifications.

Nothing unimplementable, I agree.  Nothing confusing?  Not in the
experience of the users I have dealt with.

> Python doesn't care about record I/O on legacy OSes; it does care
> about variability found in practice between popular OSes.

As a short-term solution, that is fine.  But I have seen the wheel
turn a couple of times in 40 years, and expect it to continue after
I am safely 6' under ....

> Note that \r, \n and friends in Python 3000 are either ASCII (in bytes
> literals) or Unicode (in text literals). Again, no support for legacy
> systems that don't use ASCII or a superset.

That's not a problem.  I don't see that changing in the foreseeable
future.

> Legacy OSes are called that for a reason.

Well, I remember when the text I/O model that C, Unix and Python
use WAS a feature of legacy OSs :-)

Seriously.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679