[Python-3000] Proposed new language for newline parameter to TextIOBase

Guido van Rossum guido at python.org
Wed Aug 15 06:56:23 CEST 2007


I thought some more about the universal newlines situation, and I
think I can handle all the use cases with a single 'newline'
parameter. The use cases are:

(A) input use cases:

(1) newline=None: input with default universal newlines mode; lines
may end in \r, \n, or \r\n, and these are translated to \n.

(2) newline='': input with untranslated universal newlines mode; lines
may end in \r, \n, or \r\n, and these are returned untranslated.

(3) newline='\r', newline='\n', newline='\r\n': input lines must end
with the given character(s), and these are translated to \n.

(B) output use cases:

(1) newline=None: every \n written is translated to os.linesep.

(2) newline='': no translation takes place.

(3) newline='\r', newline='\n', newline='\r\n': every \n written is
translated to the value of newline.

Note that cases (2) are new, and cases (3) change from the current PEP
and/or from the current implementation (which seems to deviate from
the PEP).
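
The six cases above can be sketched in pure Python. This is only an
illustration of the rules as listed, not the io implementation: the
helper names translate_input and translate_output are hypothetical,
and they work on a complete, already decoded string rather than on a
stream.

```python
import os

LEGAL_NEWLINES = (None, "", "\n", "\r", "\r\n")


def translate_input(text, newline=None):
    """Hypothetical helper: apply input cases (A1)-(A3) to a decoded string."""
    if newline not in LEGAL_NEWLINES:
        raise ValueError("illegal newline value: %r" % (newline,))
    if newline is None:
        # (A1) universal newlines, translated: handle \r\n before lone \r.
        return text.replace("\r\n", "\n").replace("\r", "\n")
    if newline == "":
        # (A2) universal newlines, untranslated: data passes through as-is.
        return text
    # (A3) only the given terminator ends a line; it is translated to \n.
    return text.replace(newline, "\n")


def translate_output(text, newline=None):
    """Hypothetical helper: apply output cases (B1)-(B3) to a string."""
    if newline not in LEGAL_NEWLINES:
        raise ValueError("illegal newline value: %r" % (newline,))
    if newline is None:
        # (B1) every \n written becomes the platform line separator.
        return text.replace("\n", os.linesep)
    if newline == "":
        # (B2) no translation at all.
        return text
    # (B3) every \n written becomes the requested terminator.
    return text.replace("\n", newline)
```

For example, translate_input("a\r\nb\rc\n") gives "a\nb\nc\n", while
translate_input("a\r\nb\rc\n", "") returns the text unchanged.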

Also note that the translation applies in the same way whether
.readline(), .read() or .read(N) is used. The PEP is currently
unclear on this and the implementation is wrong.
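
A small check of that property, assuming an io module that behaves
like the one in released Python 3 versions (an assumption relative to
the implementation discussed here). It is restricted to newline=None
and newline='' on input; the sample bytes and the helper name
read_three_ways are made up for the demonstration.

```python
import io

RAW = b"one\r\ntwo\rthree\n"  # mixed line endings, made up for the demo


def read_three_ways(newline):
    """Read the same bytes via read(), readlines() and read(2)."""
    def wrap():
        # A fresh wrapper each time so every method starts at offset 0.
        return io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii",
                                newline=newline)

    whole = wrap().read()
    by_lines = "".join(wrap().readlines())
    f = wrap()
    # Call f.read(2) repeatedly until it returns the empty string.
    by_chunks = "".join(iter(lambda: f.read(2), ""))
    return whole, by_lines, by_chunks


for newline in (None, ""):
    whole, by_lines, by_chunks = read_three_ways(newline)
    # All three ways of reading must agree on the resulting text.
    assert whole == by_lines == by_chunks
```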

Proposed language for the PEP:


    ``.__init__(self, buffer, encoding=None, newline=None)``

        ``buffer`` is a reference to the ``BufferedIOBase`` object to
        be wrapped with the ``TextIOWrapper``.

        ``encoding`` refers to an encoding to be used for translating
        between the byte-representation and character-representation.
        If it is ``None``, then the system's locale setting will be
        used as the default.

        ``newline`` can be ``None``, ``''``, ``'\n'``, ``'\r'``, or
        ``'\r\n'``; all other values are illegal.  It controls the
        handling of line endings.  It works as follows:

        * On input, if ``newline`` is ``None``, universal newlines
          mode is enabled.  Lines in the input can end in ``'\n'``,
          ``'\r'``, or ``'\r\n'``, and these are translated into
          ``'\n'`` before being returned to the caller.  If it is
          ``''``, universal newlines mode is enabled, but line endings
          are returned to the caller untranslated.  If it has any of
          the other legal values, input lines are only terminated by
          the given string, and the line ending is returned to the
          caller translated to ``'\n'``.

        * On output, if ``newline`` is ``None``, any ``'\n'``
          characters written are translated to the system default
          line separator, ``os.linesep``.  If ``newline`` is ``''``,
          no translation takes place.  If ``newline`` is any of the
          other legal values, any ``'\n'`` characters written are
          translated to the given string.

        Further notes on the ``newline`` parameter:

        * ``'\r'`` support is still needed for some OSX applications
          that produce files using ``'\r'`` line endings; Excel (when
          exporting to text) and Adobe Illustrator EPS files are the
          most common examples.

        * If translation is enabled, it happens regardless of which
          method is called for reading or writing.  For example,
          ``f.read()`` will always produce the same result as
          ``''.join(f.readlines())``.

        * If universal newlines without translation are requested on
          input (i.e. ``newline=''``) and a system read operation
          returns a buffer ending in ``'\r'``, another system read
          operation is done to determine whether it is followed by
          ``'\n'`` or not.  In universal newlines mode with
          translation, the second system read operation may be
          postponed until the next read request, and if the following
          system read operation returns a buffer starting with
          ``'\n'``, that character is simply discarded.
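
The postponed-read behaviour in the last note can be sketched as
follows. This is a minimal illustration of the translated mode only
(newline=None), assuming the input has already been decoded to str;
the class name UniversalNewlineTranslator and its feed() method are
hypothetical, not part of the proposed API.

```python
class UniversalNewlineTranslator:
    """Hypothetical helper: translate CR, LF and CRLF to LF across chunks."""

    def __init__(self):
        # True when the previous non-empty chunk ended in a carriage return.
        self.skip_lf = False

    def feed(self, chunk):
        """Translate one chunk, as returned by one system read."""
        if self.skip_lf and chunk:
            self.skip_lf = False
            # The trailing CR was already returned as a newline, so a
            # leading LF is the second half of a CRLF pair: discard it.
            if chunk.startswith("\n"):
                chunk = chunk[1:]
        if chunk.endswith("\r"):
            # Translate now, and decide about a following LF on the
            # next call instead of doing another read immediately.
            self.skip_lf = True
        return chunk.replace("\r\n", "\n").replace("\r", "\n")
```

Feeding "abc\r" and then "\ndef" yields "abc\n" followed by "def", so
a "\r\n" pair split across two reads still counts as a single line
ending.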


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

