UTF16, BOM, and Windows Line endings
Neil Hodgson
nyamatongwe+thunder at gmail.com
Mon Feb 6 19:46:47 EST 2006
Fuzzyman:
> Thanks - so I need to decode to unicode and *then* split on line
> endings. Problem is, that means I can't use Python to handle line
> endings where I don't know the encoding in advance.
>
> In another thread I've posted a small function that *guesses* line
> endings in use.
You can normalise line endings:
>>> x = "a\r\nb\rc\nd\n\re"
>>> y = x.replace("\r\n", "\n").replace("\r","\n")
>>> y
'a\nb\nc\nd\n\ne'
>>> print y
a
b
c
d

e
The empty line is there because "\n\r" is two line ends: a "\n" followed by a lone "\r".
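For the UTF-16 case, the same replacement can be applied after decoding. What follows is only a minimal sketch in current Python 3 syntax, assuming the encoding is already known to be UTF-16 with a BOM; the names normalise_newlines and decode_and_split are made up for illustration and are not from any library.

```python
def normalise_newlines(text):
    """Map "\r\n" and lone "\r" to "\n" in an already-decoded string."""
    # Replace the two-character sequence first, so that "\r\n"
    # is not turned into two separate line ends.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def decode_and_split(raw, encoding="utf-16"):
    """Decode bytes to text, then split on normalised line ends.

    Assumption: the caller knows the encoding. The "utf-16" codec
    reads the BOM to pick the byte order and strips it from the result.
    """
    # Decode before splitting: in UTF-16 each line-end character is
    # two bytes, so splitting the raw bytes on b"\n" would cut
    # through code units.
    text = raw.decode(encoding)
    return normalise_newlines(text).split("\n")


# Sample data: the same string as in the session above, encoded
# as UTF-16 (the codec writes a BOM when encoding).
raw = "a\r\nb\rc\nd\n\re".encode("utf-16")
lines = decode_and_split(raw)
print(lines)
```

The order of the two replace calls matters: doing the lone "\r" replacement first would turn every "\r\n" into "\n\n" and double the line count for Windows files.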
Neil