[Tutor] unicode utf-16 and readlines

Poor Yorick gp@pooryorick.com
Sat Jan 4 09:22:01 2003


On Windows 2000 with Python 2.2.1, calling readlines() on a file
returned by open() seems to split lines incorrectly when the file is
encoded as UTF-16.  For example:

 >>> fh = open('0022data2.txt')
 >>> a = fh.readlines()
 >>> print a
['\xff\xfe\xfaQ\r\x00\n', '\x00']

In this example, Python seems to have incorrectly parsed the \r\n
characters at the end of the line: the split falls in the middle of
the two-byte newline, so its trailing \x00 ends up in the next list
element.  It's an error that one can work around by slicing off the
last three characters of every other list element, but it makes
working with UTF-16 files non-intuitive, especially for beginners.
Or am I missing something?
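
To make the byte-level picture concrete, here is a minimal sketch. It
assumes a current Python 3 interpreter rather than the 2.2.1 used
above, and the variable names are made up for illustration.  The
sample bytes are copied from the output above.  It shows that
splitting on the single byte \n reproduces the broken list, while
decoding the bytes as UTF-16 before splitting keeps the line whole:

```python
import io

# Bytes copied from the readlines() output above: a little-endian BOM,
# one two-byte character, then CR and LF as two bytes each.
raw = b'\xff\xfe\xfaQ\r\x00\n\x00'

# Splitting the raw bytes on the single byte 0x0A cuts the two-byte
# LF (0x0A 0x00) in half, which is the behaviour seen above.
byte_lines = io.BytesIO(raw).readlines()
print(byte_lines)

# Decoding first turns the bytes into characters (the BOM selects the
# byte order and is consumed), so the line ending is seen as one unit.
text = raw.decode('utf-16')
text_lines = text.splitlines(True)  # True keeps the line endings
print(text_lines)
```

For a real file, the same idea would presumably mean reading in binary
mode and decoding, or using a decoding reader such as codecs.open()
with an encoding argument, instead of slicing the list elements by
hand.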

Poor Yorick
gp@pooryorick.com