[Python-3000] Proposed new language for newline parameter to TextIOBase

Guido van Rossum guido at python.org
Wed Aug 15 19:28:00 CEST 2007


On 8/14/07, Guido van Rossum <guido at python.org> wrote:
> I thought some more about the universal newlines situation, and I
> think I can handle all the use cases with a single 'newline'
> parameter. The use cases are:
>
> (A) input use cases:
>
> (1) newline=None: input with default universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are translated to \n.
>
> (2) newline='': input with untranslated universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are returned untranslated.
>
> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
> with the given character(s), and these are translated to \n.
>
> (B) output use cases:
>
> (1) newline=None: every \n written is translated to os.linesep.
>
> (2) newline='': no translation takes place.
>
> (3) newline='\r', newline='\n', newline='\r\n': every \n written is
> translated to the value of newline.

I'm going to respond to several replies in one email. Warning:
bikeshedding ahead!

On 8/15/07, Brett Cannon <brett at python.org> wrote:
> I like the options, but I would swap the meaning of None and the empty
> string.  My reasoning for this is that for option 3 it says to me
> "here is a string representing EOL, and make it \n".  So I would think
> of the empty string as, "I don't know what EOL is, but I want it
> translated to \n".  Then None means, "I don't want any translation
> done" by the fact that the argument is not a string.  In other words,
> the existence of a string argument means you want EOL translated to
> \n, and the specific value of 'newline' specifying how to determine
> what EOL is.

I see it differently. None is the natural default, which means universal
newlines mode with translation on input, and translation of \n to
os.linesep on output. On input, all the other forms mean "no
translation", and the value is the character string that ends a line
(leaving the door open for a future extension to arbitrary record
separators, either as an eventual standard feature, or as a compatible
user-defined variant). If it is empty, that is clearly an exception
(since io.py is not able to paranormally guess when a line ends
without searching for a character), so we give that the special
meaning "disable translation, but use the default line ending
separators".
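To make the input side concrete, here is a minimal sketch. It assumes
the behavior of the io module as it later shipped in Python 3, which as
far as I know matches the description above; the sample bytes and the
helper name read_lines are made up for illustration.

```python
import io

# Sample input mixing all three line endings (illustrative data only).
RAW = b"one\rtwo\nthree\r\nfour"


def read_lines(newline):
    """Decode RAW with the given newline setting and return the lines."""
    with io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii",
                          newline=newline) as f:
        return f.readlines()


# newline=None: universal newlines, every ending translated to \n.
translated = read_lines(None)
# -> ['one\n', 'two\n', 'three\n', 'four']

# newline='': universal newlines, endings returned untranslated.
untranslated = read_lines("")
# -> ['one\r', 'two\n', 'three\r\n', 'four']

# newline='\r\n': only \r\n ends a line; lone \r and \n are ordinary
# characters, and nothing is translated.
crlf_only = read_lines("\r\n")
# -> ['one\rtwo\nthree\r\n', 'four']
```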

On output, the situation isn't quite symmetrical, since the use cases
are different: the natural default is to translate \n to os.linesep,
and the most common other choices are probably to translate \n to a
specific line ending (this helps keep the line ending choice separate
from the code that produces the output). Again, translating \n to the
empty string makes no sense, so the empty string can be used for
another special case; again, it is the "give the app the most
control" case.
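The output side can be sketched the same way. This again assumes the
io module as it later shipped in Python 3; written_bytes is a
hypothetical helper, not part of any proposal.

```python
import io
import os


def written_bytes(newline, text="a\nb\n"):
    """Write text through a text layer and return the raw bytes produced."""
    buf = io.BytesIO()
    with io.TextIOWrapper(buf, encoding="ascii", newline=newline) as f:
        f.write(text)
        f.flush()
        # Read the buffer before the wrapper closes it.
        return buf.getvalue()


default_out = written_bytes(None)    # \n becomes os.linesep
raw_out = written_bytes("")          # no translation: b'a\nb\n'
crlf_out = written_bytes("\r\n")     # \n becomes \r\n: b'a\r\nb\r\n'
cr_out = written_bytes("\r")         # \n becomes \r: b'a\rb\r'
```

Only the first result depends on the platform, which is the point of
keeping the line ending choice out of the code that produces the text.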

Note that translation on input when a specific line ending is given
doesn't make much sense, and can even create ambiguities -- e.g. if
the line ending is \r\n, an input line of the form XXX\nYYY\r\n would
be translated to XXX\nYYY\n, and then one would wonder why it wasn't
split at the first \n. (If you want translation, you're apparently not
all that interested in the details, so the default is best for you.)
For output, it's different: *not* translating on output doesn't
require one to specify a line ending when opening the file, so a
specific value is free to mean "translate \n to this string".
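The ambiguity is easy to reproduce in a few lines. This is a sketch of
the hypothetical "split on \r\n, then translate" behavior being argued
against, not of anything io.py does; split_keepends is a made-up helper.

```python
def split_keepends(text, sep):
    """Split text into lines terminated by sep, keeping the separator."""
    parts = text.split(sep)
    lines = [part + sep for part in parts[:-1]]
    if parts[-1]:
        # Trailing text without a terminator is still a (partial) line.
        lines.append(parts[-1])
    return lines


data = "XXX\nYYY\r\n"

# With \r\n as the only line ending, this is a single line.
lines = split_keepends(data, "\r\n")
# -> ['XXX\nYYY\r\n']

# Hypothetical translation of the terminator to \n.
translated = [line[:-2] + "\n" if line.endswith("\r\n") else line
              for line in lines]
# -> ['XXX\nYYY\n']: one "line" that now contains two \n characters,
# with nothing left to say which of them ended the line.
```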

Here are a few complete scenarios:

- Copy a file (perhaps changing the encoding) while keeping line
endings the same: specify newline="" on input and output.

- Copy a file translating line endings to the platform default:
specify newline=None on input and output.

- Copy a file translating line endings to a specific string: specify
newline=None on input and newline="<string>" on output.

- Read a Windows file the way it would be interpreted by certain tools
on Windows: set newline="\r\n" (this treats a lone \n or \r as a
regular character).

On 8/15/07, Christian Heimes <lists at cheimes.de> wrote:
> I like to propose some constants which should be used instead of the
> strings:
>
> MAC = '\r'
> UNIX = '\n'
> WINDOWS = '\r\n'
> UNIVERSAL = ''
> NOTRANSLATE = None
>
> I think that open(filename, newline=io.UNIVERSAL) or open(filename,
> newline=io.WINDOWS) is much more readable than open(filename,
> newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g*

I find named constants unpythonic; taken to the extreme you'd also
want to define names for modes like "r" and "w+b". I also think it's a
bad idea to use platform names -- lots of places besides Windows use
\r\n (e.g. most standard internet protocols), and most modern Mac
applications use \n, not \r.

On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> I'd use None and "\r"/... as proposed, but "U" instead of the empty string
> for universal newline mode. "U" already has that established meaning, and
> you don't have to remember the difference between the two (false) values ""
> and None.

But it would close off the possible extension to other separators I
mentioned above.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

