[Python-3000] Universal newlines support in Python 3.0

Guido van Rossum guido at python.org
Sat Aug 11 19:29:38 CEST 2007


On 8/11/07, Tony Lownds <tony at pagedna.com> wrote:
>
> On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote:
>
> > Python 3.0 currently has limited universal newlines support: by
> > default, \r\n is translated into \n for text files, but this can be
> > controlled by the newline= keyword parameter. For details on how, see
> > PEP 3116. The PEP prescribes that a lone \r must also be translated,
> > though this hasn't been implemented yet (any volunteers?).
> >
>
> I'm working on this, but now I'm not sure how the file is supposed to
> be read when the newline parameter is \r or \r\n. Here's the PEP
> language:
>
>    buffer is a reference to the BufferedIOBase object to be wrapped
>    with the TextIOWrapper.
>    encoding refers to an encoding to be used for translating between
>    the byte-representation and character-representation. If it is
>    None, then the system's locale setting will be used as the default.
>    newline can be None, '\n', '\r', or '\r\n' (all other values are
>    illegal); it indicates the translation for '\n' characters written.
>    If None, a system-specific default is chosen, i.e., '\r\n' on
>    Windows and '\n' on Unix/Linux. Setting newline='\n' on input means
>    that no CRLF translation is done; lines ending in '\r\n' will be
>    returned as '\r\n'. ('\r' support is still needed for some OSX
>    applications that produce files using '\r' line endings; Excel
>    (when exporting to text) and Adobe Illustrator EPS files are the
>    most common examples.)
>
> Is this ok: when newline='\r\n' or newline='\r' is passed, only that
> string is used to determine the end of lines. No translation to '\n'
> is done.

I *think* it would be more useful if it always returned lines ending
in \n (not \r\n or \r). Wouldn't it? Although this is not how it
currently behaves; when you set newline='\r\n', it returns the \r\n
unchanged, so it would make sense to do this too when newline='\r'.
Caveat user I guess.
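
For illustration, here is a minimal sketch of the behaviour discussed
above, written against the io module as it exists in released Python 3
(an assumption: the code in the tree at the time of this message may
have behaved differently). The sample bytes and the read_lines helper
are hypothetical names used only for this example.

```python
import io

# Sample input mixing all three line-ending styles (made up for this sketch).
data = b"one\r\ntwo\rthree\n"

def read_lines(raw, newline):
    # Wrap an in-memory byte buffer the same way open() wraps a real file.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return text.readlines()

# newline=None: universal newlines, every ending is translated to '\n'.
universal = read_lines(data, None)

# newline='\r': only '\r' ends a line, and it is returned untranslated.
only_cr = read_lines(data, "\r")

# newline='\r\n': only '\r\n' ends a line, and it is returned untranslated.
only_crlf = read_lines(data, "\r\n")

print(universal)
print(only_cr)
print(only_crlf)
```

With an explicit newline='\r' or newline='\r\n', the other endings are
treated as ordinary characters inside a line, which matches the "no
translation" reading of the question.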

> > However, the old universal newlines feature also set an attribute named
> > 'newlines' on the file object to a tuple of up to three elements
> > giving the actual line endings that were observed on the file so far
> > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> > implemented. I'm tempted to kill it. Does anyone have a use case for
> > this? Has anyone even ever used this?
> >
>
> This strikes me as a pragmatic feature, making it easy to read a file
> and write back the same line ending. I can include it in the patch.

OK, if you think you can, that's good. It's not always sufficient (not
if there was a mix of line endings) but it's a start.
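
As a rough sketch of that use case, assuming the 'newlines' attribute as
it ended up in released Python 3 (None, a single string, or a tuple when
several endings were seen): detect_newline and write_back are
hypothetical helper names, and falling back to '\n' for mixed input is
just one possible policy, not something decided in this thread.

```python
import io

def detect_newline(raw, default="\n"):
    # Read in universal-newlines mode so that 'newlines' gets populated.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
    content = text.read()
    seen = text.newlines
    # A single str means one consistent ending; None or a tuple (mixed
    # endings) cannot be round-tripped exactly, so use the default.
    ending = seen if isinstance(seen, str) else default
    return content, ending

def write_back(content, ending):
    # On output, newline=ending translates each '\n' written to that string.
    out = io.BytesIO()
    text = io.TextIOWrapper(out, encoding="ascii", newline=ending)
    text.write(content)
    text.flush()
    return out.getvalue()

content, ending = detect_newline(b"a\r\nb\r\n")
print(repr(ending))
print(write_back(content, ending))
```

A file with a mix of endings comes back as a tuple in 'newlines', which
is the case where this approach is not sufficient on its own.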

> http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22
> +show:cz2Fhijwr3s:yutdXigOmYY:YDns9IyEkLQ&sa=N&cd=12&ct=rc&cs_p=http://f
> tp.gnome.org/pub/gnome/sources/meld/1.0/
> meld-1.0.0.tar.bz2&cs_f=meld-1.0.0/filediff.py#a0
>
> http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22
> +show:SLyZnjuFadw:kOTmKU8aU2I:VX_dFr3mrWw&sa=N&cd=37&ct=rc&cs_p=http://s
> vn.python.org/projects/ctypes/trunk&cs_f=ctypeslib/ctypeslib/
> dynamic_module.py#a0

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

