[Tutor] unicode utf-16 and readlines [using the 'codecs' unicode file reading module]
Poor Yorick
gp@pooryorick.com
Tue Jan 7 19:39:05 2003
Danny Yoo wrote:
>I have to admit I'm a bit confused; there shouldn't be any automatic
>handling of newlines when we use read(), since read() sucks all the text
>out of a file.
>
>Can you explain more what you mean by automatic newline handling? Do you
>mean a conversion of '\r\n' to '\n'?
>
As you mentioned, strip() works correctly on the list items returned by
readlines() on a file opened with codecs.open(), so my problem is
entirely resolved. Yes, I meant that readlines() on such a file returns
lines ending in '\r\n', where a standard file object returns just '\n':
>>> import codecs
>>> fh = codecs.open('0022data2.txt', 'r', 'utf-16')
>>> a = fh.readlines()
>>> a
[u'\u51fa\r\n']
>>> fh = open('test1.txt', 'r')
>>> a = fh.readlines()
>>> a
['hello\n', 'goodbye\n', 'where\n', 'how\n', 'when']
>>>
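For anyone who wants to reproduce the difference, here is a minimal sketch written for a current Python 3 interpreter rather than the Python 2 session above. It relies on the documented behaviour that codecs readers work on a file opened in binary mode, so no newline translation happens, while a text-mode open() translates '\r\n' to '\n' by default. The helper names (read_lines_via_codecs, read_lines_text_mode, strip_line_endings, demo) and the temporary file are made up for illustration and are not part of the original exchange.

```python
import codecs
import os
import tempfile


def read_lines_via_codecs(path, encoding="utf-16"):
    """Read lines through a codecs StreamReader (hypothetical helper).

    The underlying file is opened in binary mode, so the line endings
    come back exactly as they are stored in the file.
    """
    with open(path, "rb") as raw:
        reader = codecs.getreader(encoding)(raw)
        return reader.readlines()


def read_lines_text_mode(path, encoding="utf-16"):
    """Read lines through the built-in open() in text mode (hypothetical helper).

    With the default newline handling, '\r\n' is translated to '\n'.
    """
    with open(path, "r", encoding=encoding) as fh:
        return fh.readlines()


def strip_line_endings(lines):
    """Drop trailing '\r' and '\n' so both results compare equal."""
    return [line.rstrip("\r\n") for line in lines]


def demo():
    """Write a small UTF-16 file with Windows line endings and read it both ways."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write("\u51fa\r\nhello\r\n".encode("utf-16"))
        return read_lines_via_codecs(path), read_lines_text_mode(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    via_codecs, via_text = demo()
    print(repr(via_codecs))
    print(repr(via_text))
    print(strip_line_endings(via_codecs) == strip_line_endings(via_text))
```

Stripping the endings explicitly, as in strip_line_endings, keeps the rest of a program independent of which of the two readers produced the lines.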
Perhaps you could tell me whether this inconsistency has any
implications for the Python programmer.
Poor Yorick
gp@pooryorick.com