On Thu, Dec 2, 2010 at 1:55 PM, Antoine Pitrou <solipsis@pitrou.net> wrote: ..
> I don't think so. str.split() and str.splitlines() are also defined in conformance to the SPEC, AFAIK. They certainly try to.
You are joking, right? Where exactly does Unicode specify something like this:
>>> ''.join('πππ'.split('\udf00\ud800'))
'ππ'

?
OK, splitting on a given separator has very little to do with Unicode or the UCD, but str.splitlines() makes absolutely no attempt to conform to Unicode Standard Annex #14 ("Unicode line breaking algorithm"). Wait, UAX #14 is actually relevant to the textwrap module, which has seen very little change since the 2.x days.

So, what exactly does str.splitlines() do? And which part of the Unicode standard defines how it is different from str.split(.., '\n')? The reference manual does not help me here either:

"""
str.splitlines([keepends])

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
"""

http://docs.python.org/dev/library/stdtypes.html#str.splitlines
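
To make the question concrete, here is a minimal sketch comparing the two methods on a few separators. It assumes a current CPython 3 interpreter; the compare() helper and the sample strings are made up for illustration, and the set of characters treated as line boundaries is observed behaviour, not something the docstring quoted above spells out.

```python
# Hypothetical helper: run both methods on the same text so the
# results can be looked at side by side.
def compare(text):
    return text.splitlines(), text.split('\n')

# Illustrative samples only; not an exhaustive list of line boundaries.
samples = {
    'LF': 'a\nb',
    'CRLF': 'a\r\nb',
    'CR': 'a\rb',
    'NEL (U+0085)': 'a\x85b',
    'LINE SEPARATOR (U+2028)': 'a\u2028b',
    'trailing LF': 'a\n',
}

for name, text in samples.items():
    by_lines, by_newline = compare(text)
    print('%-24s splitlines=%r split=%r' % (name, by_lines, by_newline))
```

On such an interpreter, only the plain LF case gives the same answer from both methods: splitlines() also breaks at CR, CRLF, U+0085 and U+2028, and it does not produce a trailing empty string for text ending in a newline. None of this involves the UAX #14 line breaking algorithm, which deals with where a line may be wrapped and not with where it ends.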