[Python-3000] Proposed new language for newline parameter to TextIOBase
Guido van Rossum
guido at python.org
Wed Aug 15 06:56:23 CEST 2007
I thought some more about the universal newlines situation, and I
think I can handle all the use cases with a single 'newline'
parameter. The use cases are:
(A) input use cases:
(1) newline=None: input with default universal newlines mode; lines
may end in \r, \n, or \r\n, and these are translated to \n.
(2) newline='': input with untranslated universal newlines mode; lines
may end in \r, \n, or \r\n, and these are returned untranslated.
(3) newline='\r', newline='\n', newline='\r\n': input lines must end
with the given character(s), and these are translated to \n.
(B) output use cases:
(1) newline=None: every \n written is translated to os.linesep.
(2) newline='': no translation takes place.
(3) newline='\r', newline='\n', newline='\r\n': every \n written is
translated to the value of newline.
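The six cases above can be modelled as two pure functions on strings. This is only a sketch of the rules as stated in this message, not the actual implementation; the names `translate_input`, `translate_output` and `LEGAL_NEWLINES` are made up for illustration.

```python
import os

# Hypothetical constant: the only values the proposal allows for 'newline'.
LEGAL_NEWLINES = (None, "", "\n", "\r", "\r\n")


def translate_input(text, newline=None):
    # Model of input cases (A1)-(A3); 'text' is already-decoded input.
    if newline not in LEGAL_NEWLINES:
        raise ValueError("illegal newline value: %r" % (newline,))
    if newline is None:
        # (A1) universal newlines, translated; handle '\r\n' before lone '\r'.
        return text.replace("\r\n", "\n").replace("\r", "\n")
    if newline == "":
        # (A2) universal newlines, untranslated.
        return text
    # (A3) only the given string ends a line, and it is translated to '\n'.
    return text.replace(newline, "\n")


def translate_output(text, newline=None):
    # Model of output cases (B1)-(B3); 'text' is what the caller writes.
    if newline not in LEGAL_NEWLINES:
        raise ValueError("illegal newline value: %r" % (newline,))
    if newline is None:
        # (B1) translate to the platform default.
        newline = os.linesep
    if newline == "":
        # (B2) no translation.
        return text
    # (B3) translate every '\n' to the requested string.
    return text.replace("\n", newline)
```

For example, `translate_input("a\r\nb\rc", "\r\n")` gives `"a\nb\rc"`: with an explicit terminator, the lone `'\r'` is ordinary data.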
Note that the two cases numbered (2) are new, and case (3) changes from
the current PEP and/or from the current implementation (which seems to
deviate from the PEP).
Also note that it doesn't matter whether .readline(), .read() or
.read(N) is used. The PEP is currently unclear on this and the
implementation is wrong.
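A quick way to check this property is to read the same bytes three ways and compare. The sketch below assumes the `io` module as shipped in later Python 3 releases and uses the default `newline=None`; the helper names are made up for illustration.

```python
import io

DATA = b"a\r\nb\rc\n"  # mixes all three line-ending styles


def open_text():
    # Fresh wrapper over the same bytes, universal newlines with translation.
    return io.TextIOWrapper(io.BytesIO(DATA), encoding="ascii", newline=None)


def via_read():
    return open_text().read()


def via_readline():
    f = open_text()
    parts = []
    while True:
        line = f.readline()
        if not line:  # empty string signals end of file
            break
        parts.append(line)
    return "".join(parts)


def via_read_n(n):
    f = open_text()
    parts = []
    while True:
        chunk = f.read(n)
        if not chunk:
            break
        parts.append(chunk)
    return "".join(parts)


# All three access patterns should see the same translated text.
assert via_read() == via_readline() == via_read_n(1) == via_read_n(2)
```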
Proposed language for the PEP:
``.__init__(self, buffer, encoding=None, newline=None)``
``buffer`` is a reference to the ``BufferedIOBase`` object to
be wrapped with the ``TextIOWrapper``.
``encoding`` refers to an encoding to be used for translating
between the byte-representation and character-representation.
If it is ``None``, then the system's locale setting will be
used as the default.
``newline`` can be ``None``, ``''``, ``'\n'``, ``'\r'``, or
``'\r\n'``; all other values are illegal. It controls the
handling of line endings. It works as follows:
* On input, if ``newline`` is ``None``, universal newlines
mode is enabled. Lines in the input can end in ``'\n'``,
``'\r'``, or ``'\r\n'``, and these are translated into
``'\n'`` before being returned to the caller. If it is
``''``, universal newline mode is enabled, but line endings
are returned to the caller untranslated. If it has any of
the other legal values, input lines are only terminated by
the given string, and the line ending is returned to the
caller translated to ``'\n'``.
* On output, if ``newline`` is ``None``, any ``'\n'``
characters written are translated to the system default
line separator, ``os.linesep``. If ``newline`` is ``''``,
no translation takes place. If ``newline`` is any of the
other legal values, any ``'\n'`` characters written are
translated to the given string.
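As an illustration of the output rules and of untranslated universal newlines on input, here is a small sketch that assumes the `io.TextIOWrapper` class as shipped in later Python 3 releases. It sticks to cases where the shipped behaviour matches the wording above; the helper names `written_bytes` and `lines_read` are made up for illustration.

```python
import io
import os


def written_bytes(text, newline):
    # Write 'text' through a wrapper and return the bytes that reach the buffer.
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()  # push pending text down to the BytesIO
    return raw.getvalue()


def lines_read(data, newline):
    # Read 'data' through a wrapper and return the list of lines.
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline)
    return wrapper.readlines()


# Output: '\n' becomes the requested string, or is left alone for ''.
assert written_bytes("a\nb\n", "\r\n") == b"a\r\nb\r\n"
assert written_bytes("a\nb\n", "\r") == b"a\rb\r"
assert written_bytes("a\nb\n", "") == b"a\nb\n"
assert written_bytes("a\n", None) == ("a" + os.linesep).encode("ascii")

# Input with newline='': every style ends a line, and none is translated.
assert lines_read(b"a\r\nb\rc\n", "") == ["a\r\n", "b\r", "c\n"]
```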
Further notes on the ``newline`` parameter:
* ``'\r'`` support is still needed for some OSX applications
that produce files using ``'\r'`` line endings; Excel (when
exporting to text) and Adobe Illustrator EPS files are the
most common examples.
* If translation is enabled, it happens regardless of which
method is called for reading or writing. For example,
``f.read()`` will always produce the same result as
``''.join(f.readlines())``.
* When universal newlines without translation are requested on
input (i.e. ``newline=''``) and a system read operation
returns a buffer ending in ``'\r'``, another system read
operation is done to determine whether it is followed by
``'\n'`` or not. In universal newlines mode with
translation, the second system read operation may be
postponed until the next read request, and if the following
system read operation returns a buffer starting with
``'\n'``, that character is simply discarded.
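The postponed check in translating mode can be sketched as a tiny stateful translator that is fed one system-read buffer at a time. This is a minimal sketch of the idea described above, not the real implementation; the class name `ChunkTranslator` and its `feed` method are hypothetical.

```python
class ChunkTranslator:
    # Translates '\r', '\n' and '\r\n' to '\n', one chunk at a time.

    def __init__(self):
        # True when the previous chunk ended in '\r' (already emitted as '\n').
        self._last_was_cr = False

    def feed(self, chunk):
        if not chunk:
            return chunk  # an empty read does not change the state
        if self._last_was_cr and chunk.startswith("\n"):
            # Second half of a '\r\n' split across two reads: discard it.
            chunk = chunk[1:]
        self._last_was_cr = chunk.endswith("\r")
        # Handle '\r\n' first so it yields one '\n', then any lone '\r'.
        return chunk.replace("\r\n", "\n").replace("\r", "\n")


# A '\r\n' pair split across two buffers yields a single '\n'.
t = ChunkTranslator()
assert t.feed("a\r") + t.feed("\nb") == "a\nb"

# A lone '\r' at the end of a buffer is still a line ending on its own.
t = ChunkTranslator()
assert t.feed("a\r") + t.feed("b") == "a\nb"
```

In the untranslated mode (``newline=''``) this trick is not available, because the caller must be handed either ``'\r'`` or ``'\r\n'`` exactly as it appeared, which is why the extra read is needed there.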
--
--Guido van Rossum (home page: http://www.python.org/~guido/)