[Patches] Parsing strings with \r\n or \r

Tue, 30 May 2000 16:35:22 +0200

Guido van Rossum wrote:
> 
> [Tim]
> > > IMO there should be *no* #ifdefs in any of this logic:  regardless of
> > > platform, and regardless of program source (be it file, string, file-like
> > > object, ...), Python tokenizers should recognize all of \r\n, \n, and \r as
> > > terminating a line (with \r\n viewed as a single line terminator when it
> > > appears, rather than as \r first and then an empty line ending with \n).
> > > Note that Java compilers are required to do this for Java source code, and
> > > it's a workaround that actually works.  Somebody else will have to consider
> > > the 3,017 other ways Unicode spells end-of-line <0.9 wink>.
> >
> > I'm with little Timmy on this one. Just make all platforms deal with all
> > newline types.
> >
> > And then, yah: make that available to the string-based parsing, too.
> 
> +2 here.

FYI, strings and Unicode objects have a method called .splitlines()
which breaks lines at any of \r, \r\n or \n, treating \r\n as a
single line terminator (the Unicode version recognizes some
additional line break characters).

>>> "abc\rdef\r\nghi\rzzz".splitlines()
['abc', 'def', 'ghi', 'zzz']
>>> "abc\rdef\r\nghi\rzzz".splitlines(1)
['abc\015', 'def\015\012', 'ghi\015', 'zzz']

and

>>> u"abc\rdef\r\nghi\rzzz".splitlines()
[u'abc', u'def', u'ghi', u'zzz']
>>> u"abc\rdef\r\nghi\rzzz".splitlines(1)
[u'abc\015', u'def\015\012', u'ghi\015', u'zzz']
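
As a rough sketch of how .splitlines() could be applied to the
string-based parsing case discussed above: the helper below rewrites
every line ending as \n before the source is handed to compile().
The name normalize_newlines is made up for illustration and is not
an existing API. Note that it is only an approximation, since
.splitlines() may also split on other line break characters that
could legitimately appear inside a string literal.

```python
def normalize_newlines(text):
    """Return text with every line ending rewritten as a single \\n.

    Hypothetical helper, not part of Python. Because splitlines()
    treats \\r\\n as one terminator, no spurious empty lines appear.
    A trailing \\n is added even if the last line had none.
    """
    return "".join(line + "\n" for line in text.splitlines())


# Source text mixing DOS (\r\n), old Mac (\r) and Unix (\n) line endings.
mixed = "x = 1\r\ny = 2\rz = x + y\n"
clean = normalize_newlines(mixed)

# Compile and run the normalized source in a scratch namespace.
namespace = {}
exec(compile(clean, "<string>", "exec"), namespace)
```

After this runs, clean contains only \n line endings and
namespace["z"] holds the value computed by the normalized source.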

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/