python regex: misbehaviour with "\r" (0x0D) as Newline character in Unicode Mode

Fredrik Lundh fredrik at pythonware.com
Sun Jan 27 13:27:19 EST 2008


Arian Sanusi wrote:

 > according to Unicode, "\n", "\r" and "\r\n" (0x000A, 0x000D and
 > 0x000D+0x000A) should be treated as newline characters

the link says that your application should treat them as line terminators, 
not that they should all be equal to a newline character.

to split on Unicode line endings, use the splitlines method.  for the 
specific characters you mention, you can also read the file in universal 
newlines mode.
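
a minimal sketch of both suggestions, assuming Python 3 (where text streams 
opened with newline=None use universal newlines, rather than the "U" mode 
flag of older versions).  the sample strings and variable names are made up 
for illustration, and an in-memory stream stands in for a real file:

```python
import io

# Sample text mixing "\r", "\r\n", "\n" and U+2028 (LINE SEPARATOR).
text = "one\rtwo\r\nthree\nfour\u2028five"

# str.splitlines() splits on every line boundary it recognises,
# including the three terminators from the question.
lines = text.splitlines()

# Universal newlines: with newline=None, "\r" and "\r\n" are translated
# to "\n" while reading. Here the "file" is just bytes held in memory.
raw = "one\rtwo\r\nthree\n".encode("utf-8")
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None)
translated = stream.read()

print(lines)
print(repr(translated))
```

once the text has been normalised this way, "$" and "^" in re.MULTILINE mode 
only have to deal with "\n".  note that universal newlines covers just "\r", 
"\n" and "\r\n"; other Unicode separators such as U+2028 are left alone, so 
splitlines is the broader of the two.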

</F>



