[docs] [issue22232] str.splitlines splitting on non-\r\n characters

Fri Oct 5 04:17:42 EDT 2018

Marc-Andre Lemburg <mal at egenix.com> added the comment:

I am -1 on changing the default behavior. The Unicode standard defines what a linebreak code point is (all code points with character properties Zl or bidirectional property B) and we adhere to that. This may confuse parsers coming from the ASCII world, but that's really a problem with those parsers assuming that .splitlines() only splits on ASCII line breaks, i.e. they are not written in a Unicode compatible way.

As mentioned in https://bugs.python.org/issue18291 we could add a parameter to .splitlines(), but this would render the method not much faster than re.split().

Using re.split() is not a work-around in his case, it's an explicit form  of defining the character you want to split lines on, if the standards defining your file format as only accepting ASCII line break characters.

Since there are many such file formats, perhaps adding a parameter asciionly=True/False would make sense. .splitlines() could then be made to only split on ASCII linebreak characters. This new parameter would then have to default to False to maintain compatibility with Unicode and all previous releases.

----------
nosy: +lemburg

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue22232>
_______________________________________