[Python-Dev] [python] Re: New lines, carriage returns, and Windows

Terry Reedy tjreedy at udel.edu
Sat Sep 29 20:30:59 CEST 2007

"Michael Foord" <fuzzyman at voidspace.org.uk> wrote in message 
news:46FE6F92.40601 at voidspace.org.uk...
| Guido van Rossum wrote:

[snip first part of nice summary of Python i/o model]

| > The other translation deals with line endings. Upon input, any of
| > \r\n, \r, or \n is translated to a single \n by default (this is nhe 
| > "universal newlines" algorithm from Python 2.x). This can be tweaked
| > or disabled. Upon output, \n is translated into a platform specific
| > string chosen from \r\n, \r, or \n. This can also be disabled or
| > overridden. Note that \r, when written, is never treated specially; if
| > you want special processing for \r on output, you can write your own
| > translation layer.

| So the question is, that when a string containing '\r\n' is written to a
| file in text mode on a Windows platform, should it be written with the
| encoded representation of '\r\n' or '\r\r\n'?

I think Guido pretty clearly said that on output, the default behavior is 
that \r is nothing special.  If you want a special case exception, write a 
special case translator. +1 from me.

To propose otherwise is to propose that the default semantic meaning of 
Python text objects depend on the platform that it might be 
output-translated for.  I believe the point of universal newline support 
was to get away from this.

| Purity would dictate the latter and practicality the former (IMO)...

I disagree.  Special case exceptions complicate both learnability and code 
readability and maintainability.  Simplicity is practicality.  The symmetry 
of 'platform-line-endings =input> \n =output> plaform-line-endings' is both 
pure and practical.

| However, that would mean that round tripping a string would change it
| ('\r\n' would be written as '\r\n' and then read as '\n')

Whereas \r\r\n would be read back as \r\n, which is what should happen. 
Round-trip-ability is practical to me.

| - on the other
| hand (particularly given that we are treating the data as text and not a
| binary blob) I don't see how writing '\r\r\n' would ever actually be
| useful in text.

There are two normal ways for internal Python text to have \r\n:
1. Read from a file with \r\r\n.  Then \r\r\n is correct output (on the 
same platform).
2. Intentially put there by a programmer.  If s/he also chooses default \n 
translation on output, \r<translation of \n> is correct.

The leaves
1. Bugs due to ignorance or accident.  These should be repaired.
2. Other special situations, which can be handled by disabling, overriding, 
and layering the defaults.  This seems enough flexibility to me.

Terry Jan Reedy

More information about the Python-Dev mailing list