universal newlines and utf-16

Baz Walter bazwal at ftml.net
Sun Apr 11 10:12:14 EDT 2010


i am using python 2.6 on a linux box and i have some utf-16 encoded 
files with crlf line-endings which i would like to open with universal 
newlines.

so far, i have been unable to get this to work correctly.

for example:

>>> open('test.txt', 'w').write(u'a\r\nb\r\n'.encode('utf-16'))
>>> repr(open('test.txt', 'rbU').read().decode('utf-16'))
"u'a\\n\\nb\\n\\n'"
>>> import codecs
>>> repr(codecs.open('test.txt', 'rbU', 'utf-16').read())
"u'a\\n\\nb\\n\\n'"

of course, the output i want is:

"u'a\\nb\\n'"

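i suppose it's not too surprising that the built-in open translates the
line endings on the raw bytes, before any decoding happens (in utf-16
each '\r' byte is followed by a '\x00' byte rather than by '\n', so it
is treated as a lone '\r' and becomes '\n'), but it surprised me that
codecs.open does this as well.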
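
the byte-level effect can be reproduced without touching a file. this is
only a sketch under stated assumptions: translate_newlines_bytes is a
hypothetical helper standing in for what a 'U'-mode file object does to
the raw bytes (it is not a real library function), and 'utf-16-le' is
used instead of 'utf-16' so the bytes do not depend on the machine's
byte order or on a BOM.

```python
import re

text = u'a\r\nb\r\n'
raw = text.encode('utf-16-le')  # explicit byte order, no BOM

def translate_newlines_bytes(data):
    # hypothetical stand-in for universal newlines applied to raw bytes:
    # '\r\n' and lone '\r' both become '\n'
    return re.sub(b'\r\n|\r', b'\n', data)

# translating first and decoding second: every '\r' byte is followed by
# '\x00', never by '\n', so each one is rewritten as a lone '\r'
wrong = translate_newlines_bytes(raw).decode('utf-16-le')

# decoding first and translating second: '\r\n' is seen as one line ending
right = re.sub(u'\r\n|\r', u'\n', raw.decode('utf-16-le'))

print(repr(raw))
print(repr(wrong))  # a, LF, LF, b, LF, LF
print(repr(right))  # a, LF, b, LF
```

the only difference between the two results is the order of the decode
and translate steps, which is why the translation has to happen on the
decoded text rather than on the bytes.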

is there a way to get universal newlines to work properly with utf-16 files?

(nb: i'm not interested in other methods of converting line endings - 
just whether universal newlines can be made to work correctly).


