[Python-Dev] New lines, carriage returns, and Windows
Guido van Rossum
guido at python.org
Sat Sep 29 17:07:18 CEST 2007
On 9/29/07, Nick Maclaren <nmm1 at cus.cam.ac.uk> wrote:
> "Paul Moore" <p.f.moore at gmail.com> wrote:
> >
> > OK, so far so good - although I'm not *quite* sure there's a
> > self-consistent definition of "code that only uses \n". I'll assume
> > you mean code that has a concept of lines, that lines never contain
> > anything other than text (specifically, neither \r nor \n can appear in
> > a line; I'll punt on whether other weird stuff like form feed is
> > legal), and that whenever your code needs to write data to a file, it
> > writes lines with \n alone between them.
>
> I won't. There are a few of us still left who know how this started,
> and here is a simplified description.
>
> Unix was a computer scientist's workbench, and made no attempt to be
> general. In particular, its text datastream model was appropriate
> for the important devices of the day - teletypes and similar. So
> far, so good. But what was forgotten later is that the model does
> NOT extend to other systems and, in particular, made no sense on the
> record-oriented models generally used by mainframes (see Fortran for
> an example).
>
> When C was standardised, this was fudged. I tried to get it improved,
> but it is one of the many things I failed to do. The handling of
> ALL of the control characters in text I/O is non-portable (even \t,
> despite what the standard says), and you have to follow the system's
> constraints if things are to work. Unfortunately, the kludging that
> the compiler does to map C to the operating system confuses things
> still further - though it is essential.
>
> Now, BCPL was an ancestor of C, but always was a more portable
> language (i.e. it didn't start with a specific operating system in
> mind), and used/uses a rather better model. In this, line separators
> are atomic - e.g. '\f' is newline-with-form-feed and '\r' is
> "newline-with-overprinting". Now, THAT model is more generic.
> Not fully generic, of course, but it would cater for all of Unix,
> CPM and its derivatives (yes, Microsoft), MacOS and most mainframes
> (with some reservations).
>
> So, until and unless Python chooses to define its own I/O model,
> these problems will continue to arise. Whether this one is a simple
> bug or an avoidable feature, I can't say without looking harder,
> but bugs are often caused by attempting to implement impossible
> or confusing specifications.
Have you looked at Py3k at all, especially PEP 3116 (new I/O)?
Python *does* have its own I/O model. There are binary files and text
files. For binary files, you write bytes and the semantic model is
that of an array of bytes; byte indices are seek positions.
For text files, the contents are considered to be Unicode, encoded as
bytes in a binary file. So a text file always has an underlying binary
file. Two translations take place, both of which have defaults varying
by platform. One translation is encoding Unicode text into bytes upon
output, and decoding bytes to Unicode text upon input. This can use
any encoding supported by the encodings package.
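The layering described above (a text file sitting on a binary file, with encoding on output and decoding on input) can be sketched with the `io` module as it eventually shipped in Python 3. This message predates the 3.0 release, so treat this as an illustration under that assumption rather than the PEP 3116 text itself; the variable names are hypothetical.

```python
import io

# Binary layer: an in-memory array of bytes; byte offsets are seek positions.
raw = io.BytesIO()

# Text layer on top of it. encoding= controls the str <-> bytes translation;
# newline="\n" switches off line-ending translation so only encoding is shown.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
text.write("caf\u00e9\n")   # five characters of Unicode text
text.flush()                  # push the encoded bytes down to the binary layer

# The binary layer now holds six bytes, because U+00E9 takes two in UTF-8.
encoded = raw.getvalue()

# Seeking on the binary layer counts bytes: offset 3 is the first byte of the e-acute.
raw.seek(3)
e_bytes = raw.read(2)

# Input runs the other way: bytes are decoded back into Unicode text.
decoded = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-8").read()
```

Here `encoded` is `b'caf\xc3\xa9\n'` and `decoded` round-trips to the original string; the text layer is reachable from the wrapper as `text.buffer`.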
The other translation deals with line endings. Upon input, any of
\r\n, \r, or \n is translated to a single \n by default (this is the
"universal newlines" algorithm from Python 2.x). This can be tweaked
or disabled. Upon output, \n is translated into a platform specific
string chosen from \r\n, \r, or \n. This can also be disabled or
overridden. Note that \r, when written, is never treated specially; if
you want special processing for \r on output, you can write your own
translation layer.
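A minimal sketch of the line-ending translation, again assuming the `newline` parameter of `io.TextIOWrapper` as it shipped in Python 3 (the same parameter `open()` accepts). On input the default `newline=None` enables universal newlines and `newline=""` disables the translation; on output `newline=None` maps `\n` to `os.linesep`, while an explicit value overrides it.

```python
import io

data = b"one\r\ntwo\rthree\n"   # all three line-ending conventions in one stream

# Input, default newline=None: universal newlines, every ending becomes "\n".
universal_text = io.TextIOWrapper(io.BytesIO(data), encoding="ascii").read()

# Input, newline="": lines are still split on any ending, but left untranslated.
raw_lines = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline="").readlines()

# Output, newline="\r\n": each "\n" written becomes "\r\n";
# a bare "\r" is written through untouched, as described above.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="ascii", newline="\r\n")
out.write("a\nb\rc")
out.flush()
written = sink.getvalue()
```

After running it, `universal_text` is `'one\ntwo\nthree\n'`, `raw_lines` is `['one\r\n', 'two\r', 'three\n']`, and `written` is `b'a\r\nb\rc'`. Any special handling of `\r` on output would have to be a separate layer written by the user.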
That's all. There is nothing unimplementable or confusing in these
specifications.
Python doesn't care about record I/O on legacy OSes; it does care
about variability found in practice between popular OSes.
Note that \r, \n and friends in Python 3000 are either ASCII (in bytes
literals) or Unicode (in text literals). Again, no support for legacy
systems that don't use ASCII or a superset.
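To make the literal distinction concrete, a short sketch assuming Python 3 semantics: the same escapes denote ASCII byte values in a bytes literal and Unicode code points in a text literal, and moving between the two is always an explicit encode or decode. The names are illustrative only.

```python
# In a bytes literal, \r\n is two ASCII byte values: 13 and 10.
crlf_bytes = b"\r\n"

# In a text (str) literal, \r\n is two Unicode code points: U+000D and U+000A.
crlf_text = "\r\n"

# The two literal kinds are distinct types.
assert isinstance(crlf_bytes, bytes) and isinstance(crlf_text, str)

# Crossing between them takes an explicit encoding step (ASCII here).
assert crlf_text.encode("ascii") == crlf_bytes
```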
Legacy OSes are called that for a reason.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)