UTF16, BOM, and Windows Line endings
Fuzzyman
fuzzyman at gmail.com
Tue Feb 7 04:23:29 EST 2006
Neil Hodgson wrote:
> Fuzzyman:
>
> > Thanks - so I need to decode to unicode and *then* split on line
> > endings. Problem is, that means I can't use Python to handle line
> > endings where I don't know the encoding in advance.
> >
> > In another thread I've posted a small function that *guesses* line
> > endings in use.
>
> You can normalise line endings:
>
> >>> x = "a\r\nb\rc\nd\n\re"
> >>> y = x.replace("\r\n", "\n").replace("\r","\n")
> >>> y
> 'a\nb\nc\nd\n\ne'
> >>> print(y)
> a
> b
> c
> d
>
> e
>
> The empty line is because "\n\r" is 2 line ends.
>
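The decode-first advice above can be combined with normalisation and a simple count of which endings occurred. This matters for UTF-16 because splitting the raw bytes does not work: in UTF-16-LE "\n" is stored as b"\n\x00" (and as b"\x00\n" in UTF-16-BE), so a byte-level split leaves stray NUL bytes behind. The following is a minimal sketch in modern Python 3, not the original poster's code. It assumes the input is UTF-16 with a BOM, and the helper name decode_and_normalise is hypothetical. Like the replace() chain, it still treats every lone '\r' as a line end.

```python
import re

# One line ending; "\r\n" comes first so it wins over a lone "\r".
_EOL = re.compile(r"\r\n|\r|\n")

def decode_and_normalise(raw, encoding="utf-16"):
    """Hypothetical helper: decode bytes, count ending styles, normalise to LF."""
    # The "utf-16" codec uses a leading BOM to choose the byte order
    # and strips the BOM from the decoded text.
    text = raw.decode(encoding)
    counts = {"\r\n": 0, "\r": 0, "\n": 0}
    for m in _EOL.finditer(text):
        counts[m.group()] += 1
    # Replace every ending (CRLF, lone CR, lone LF) with a single LF.
    return _EOL.sub("\n", text), counts

raw = "a\r\nb\rc\nd\n\re".encode("utf-16")  # BOM + native byte order
text, counts = decode_and_normalise(raw)
print(repr(text))  # 'a\nb\nc\nd\n\ne'
print(counts)      # {'\r\n': 1, '\r': 2, '\n': 2}
```

The counts dictionary gives a rough answer to "which line ending was used", because the most frequent key is usually the file's convention.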
Thanks - that works, but it replaces *all* instances of '\r' with '\n',
even where they aren't used as line terminators (unlikely, perhaps). It
also doesn't tell me which line ending was used.
Apparently files opened in universal newlines mode ('rU') have a
'newlines' attribute that records the line endings seen. That makes it
a bit easier. :-)
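In Python 3, 'U' mode was deprecated and has been removed since Python 3.11. Universal newlines is now the default for text files (newline=None), and the file object still exposes a newlines attribute. That attribute is None until something has been read. After reading, it holds one string, or a tuple if several styles were seen. The sketch below is a minimal illustration that assumes a UTF-16 file. The helper name detect_newlines and the temporary demo file are hypothetical.

```python
import os
import tempfile

def detect_newlines(path, encoding="utf-16"):
    """Hypothetical helper: report the line-ending style(s) seen in a file."""
    # newline=None (the default) is universal newlines mode: "\r\n", "\r"
    # and "\n" are all read back as "\n", and the styles actually
    # encountered are recorded in the file object's newlines attribute.
    with open(path, encoding=encoding, newline=None) as f:
        f.read()           # newlines stays None until data has been read
        return f.newlines  # None, a single string, or a tuple of strings

# Demo: write a small UTF-16 file (BOM included) with Windows line endings.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write("one\r\ntwo\r\n".encode("utf-16"))
    print(repr(detect_newlines(path)))  # '\r\n'
finally:
    os.remove(path)
```

Because decoding happens before newline translation, this avoids the byte-level splitting problem, and it also answers which ending was used without rewriting stray '\r' characters yourself.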
Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
> Neil