UTF16, BOM, and Windows Line endings

Neil Hodgson nyamatongwe+thunder at gmail.com
Mon Feb 6 17:19:30 EST 2006


Fuzzyman:

> How should I handle line-endings for UTF16 ? Is it possible that other
> programs (on windows) will have line endings as u'\r\n' ? 

    Yes, try Notepad and save as Unicode. For the text

Fuzzy
End of lines

 >>> contents = open("C:\\fuzzy.txt", "rb").read()
 >>> contents
'\xff\xfeF\x00u\x00z\x00z\x00y\x00\r\x00\n\x00E\x00n\x00d\x00 \x00o\x00f\x00 \x00l\x00i\x00n\x00e\x00s\x00'
 >>>

    The '\r\x00\n\x00' is u'\r\n' encoded as little-endian UTF-16
(the leading '\xff\xfe' is the byte order mark).
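
    To check this, a minimal sketch along the following lines decodes the same bytes and splits on u'\r\n'. It is written in Python 3 syntax (bytes literals) rather than the Python 2 session above, and the byte string is copied from that session instead of being read from C:\fuzzy.txt, so treat it as an illustration, not as part of the original test.

```python
import codecs

# Bytes copied from the session above: Notepad's "Unicode" format,
# i.e. UTF-16 little-endian preceded by a byte order mark (BOM).
contents = (b'\xff\xfeF\x00u\x00z\x00z\x00y\x00\r\x00\n\x00'
            b'E\x00n\x00d\x00 \x00o\x00f\x00 \x00l\x00i\x00n\x00e\x00s\x00')

# The first two bytes are the little-endian BOM.
has_le_bom = contents.startswith(codecs.BOM_UTF16_LE)

# The generic "utf-16" codec reads the BOM, picks the byte order,
# and leaves the BOM out of the decoded text.
text = contents.decode("utf-16")

# Splitting on the two-character line end recovers the original lines.
lines = text.split(u"\r\n")
print(has_le_bom, lines)
```

    Decoding with "utf-16" rather than "utf-16-le" is a deliberate choice here: the former consumes the BOM, while the latter would leave u'\ufeff' at the start of the text.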

> When saving
> files for that platform should I make the line endings u'\r\n' ? (This
> sequence obviously encodes to four bytes in UTF16). I would only do
> this to ensure compatibility with other programs the user may use to
> create the text files.

    Notepad will read u'\r\n'. It doesn't like '\n' or u'\n'. Some
applications are OK with other line ends, but '\r\n' and u'\r\n' are
safest on Windows.
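
    For the saving side, one possible approach is sketched below, again in Python 3 syntax. The helper name encode_for_notepad is hypothetical, not something from the original thread or the standard library. It assumes the text may arrive with mixed line ends, normalises them, and then produces the same byte layout Notepad wrote above: a little-endian BOM followed by UTF-16 text with u'\r\n' line ends.

```python
import codecs

def encode_for_notepad(text):
    # Hypothetical helper: returns bytes laid out the way Notepad's
    # "Unicode" save format is shown above (BOM + UTF-16 LE + CR LF).
    # First collapse any existing line ends to a single newline ...
    normalised = text.replace(u"\r\n", u"\n").replace(u"\r", u"\n")
    # ... then expand every newline to the Windows two-character line end.
    windows_text = normalised.replace(u"\n", u"\r\n")
    # "utf-16-le" does not emit a BOM, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + windows_text.encode("utf-16-le")

data = encode_for_notepad(u"Fuzzy\nEnd of lines")
print(repr(data))
```

    The result would be written with the file opened in binary mode ("wb"), so that no further newline translation is applied on top of the explicit u'\r\n'.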

    Neil


