read a unicode file
Alan Kennedy
alanmk at hotmail.com
Mon Jun 9 15:01:03 EDT 2003
Alan Kennedy wrote:
> Also, the standard iterator interface seems to be supported by the codecs.open
> method as well, e.g.
>
> for line in codecs.open(filename, mode, encoding):
>     print line
But, of course, it isn't supported all the time :-)
If you do this with a UTF-16 file, you get the exception
"NotImplementedError: .readline() is not implemented for UTF-16",
which I found surprising.
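Until the codec's .readline() works, one workaround is to decode the whole file first and only then split the decoded text into lines, so no line boundary is ever chosen at the byte level. This is a minimal sketch, assuming the file fits comfortably in memory and a modern Python 3 interpreter; `read_utf16_lines` and the demo file name are hypothetical, made up for illustration:

```python
import codecs
import os
import tempfile


def read_utf16_lines(filename):
    """Return the lines of a UTF-16 file, each keeping its line ending.

    Hypothetical helper: decode everything first, then split the decoded
    text, so lines are never cut in the middle of a 2-byte code unit.
    """
    f = codecs.open(filename, "rb", "utf-16")
    try:
        text = f.read()           # read() decodes the whole file at once
    finally:
        f.close()
    return text.splitlines(True)  # True keeps the "\n" on each line


# Demo: write five lines as UTF-16 (the codec adds a BOM), then read them back.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "utf-16-demo.txt")
out = codecs.open(path, "wb", "utf-16")
out.write(u"Hello world\n" * 5)
out.close()

lines = read_utf16_lines(path)
for line in lines:
    print(repr(line))

os.remove(path)
os.rmdir(tmpdir)
```

The trade-off is memory: the whole file is decoded in one go, which is fine for modest files but not for very large ones.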
I looked at some other encodings to see which others exhibit this behaviour (it
would be really nice to have a list of supported encodings ;-). I wrote this
code to check:
#---------------------------------
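import codecs

s = u"Hello world\n" * 5
encodings = ['iso-8859-1',
             'ascii',
             'utf-16',
             'utf-8',
             'mac-cyrillic',
             'mbcs',   # only available on Windows
             ]

# Write the same text out once per encoding. codecs.open() always opens
# the underlying file in binary mode itself, so no 't' flag is needed.
for e in encodings:
    f = codecs.open("%s.txt" % e, "w", e)
    f.write(s)
    f.close()

# Iterating over the file calls .readline() on the codec's StreamReader.
for e in encodings:
    try:
        for line in codecs.open("%s.txt" % e, "r", e):
            pass
        print "%s does support readline" % e
    except NotImplementedError:
        print "%s does not support readline" % e
    except IOError:
        print "%s gave an IOError. You're running python < 2.3 maybe?" % e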
#---------------------------------
Of the encodings I selected, only 'utf-16' didn't support .readline(). (BTW, I
picked some of the values above, like 'mac-cyrillic', because I was looking for
a multi-byte character set to see how it behaved. As it turns out, mac-cyrillic
is a single-byte encoding; you'll have to excuse my ignorance of Cyrillic
character sets.)
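For anyone repeating the experiment on a current Python 3 interpreter, here is a rough port of the check. It is a sketch, not a drop-in replacement: print becomes a function, the mode strings carry no 't' flag (Python 3's codecs.open() appends 'b' itself, and the built-in open() refuses a mode that asks for both text and binary), and LookupError is caught because 'mbcs' only exists on Windows. The `check_readline` helper and the temporary directory are my own additions. As far as I know, the later StreamReader.readline() decodes in chunks instead of asking the byte stream for a line, so on Python 3 I would expect 'utf-16' to report as supported.

```python
import codecs
import os
import shutil
import tempfile

text = u"Hello world\n" * 5
encodings = ['iso-8859-1', 'ascii', 'utf-16', 'utf-8', 'mac-cyrillic', 'mbcs']


def check_readline(encodings, directory):
    """Map each encoding name to a short verdict (hypothetical helper)."""
    results = {}
    for e in encodings:
        path = os.path.join(directory, "%s.txt" % e)
        try:
            # codecs.open() opens the file in binary mode on its own.
            with codecs.open(path, "w", e) as f:
                f.write(text)
            with codecs.open(path, "r", e) as f:
                for line in f:  # iteration calls the StreamReader's readline()
                    pass
            results[e] = "does support readline"
        except LookupError:
            results[e] = "is not available on this platform"
        except NotImplementedError:
            results[e] = "does not support readline"
    return results


tmpdir = tempfile.mkdtemp()
results = check_readline(encodings, tmpdir)
for e in encodings:
    print("%s %s" % (e, results[e]))
shutil.rmtree(tmpdir)
```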
So, a question: Why would the 'utf-16' codec not support readline? Looking at
the 'Lib\encodings\utf_16.py' module gives no hints. Is there a problem with
knowing what constitutes a line ending in that encoding?
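One plausible explanation is exactly the line ending. If .readline() simply asked the underlying byte stream for a line, the cut would fall after the first 0x0A byte, but in UTF-16 a newline is two bytes, and other characters can contain a 0x0A byte too. The sketch below mimics such a byte-level readline on UTF-16-LE data; `split_after_0a` is a purely illustrative helper, not code taken from any codec:

```python
def split_after_0a(data):
    """Cut raw bytes after every 0x0A byte, as a byte-oriented readline() would.

    Illustrative only; this is not how any codec is actually implemented.
    """
    pieces = []
    start = 0
    while start < len(data):
        end = data.find(b"\n", start)
        if end == -1:
            pieces.append(data[start:])
            break
        pieces.append(data[start:end + 1])
        start = end + 1
    return pieces


# U+0A05 is a letter from the Gurmukhi block, not a line break,
# yet its UTF-16-LE encoding (05 0A) contains a 0x0A byte.
text = u"Hi\n\u0a05\n"
data = text.encode("utf-16-le")  # 48 00 69 00 0A 00 05 0A 0A 00

pieces = split_after_0a(data)
print(pieces)

# Every piece has an odd length, so none is a whole number of 2-byte
# code units and none decodes on its own...
for p in pieces:
    try:
        p.decode("utf-16-le")
    except UnicodeDecodeError:
        print("cannot decode %r" % (p,))

# ...whereas splitting *after* decoding gives the two real lines.
print(data.decode("utf-16-le").splitlines(True))
```

Two lines of text come back as four byte fragments, which suggests why a correct UTF-16 readline has to work on decoded characters rather than raw bytes.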
BTW, thanks to all the encoding people for doing a great job. I especially like
the .encode() method on strings, because it makes constructs such as this easy
to express:
import sys
s = u"Hello world"
local_s = s.encode(sys.getdefaultencoding())
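The matching .decode() method goes the other way. As a small, hedged illustration (written for Python 3, where u"..." is simply a str): the plain 'utf-16' codec prepends a byte order mark, one more thing a line-oriented reader has to cope with, while the explicit 'utf-16-le' variant does not.

```python
import codecs

s = u"Hello world"

with_bom = s.encode("utf-16")   # BOM first, then native byte order
no_bom = s.encode("utf-16-le")  # fixed little-endian order, no BOM

print(len(s), len(with_bom), len(no_bom))  # prints: 11 24 22
print(with_bom[:2] in (codecs.BOM_LE, codecs.BOM_BE))

# Decoding reverses the encoding; the 'utf-16' codec uses the BOM
# to pick the byte order and strips it from the result.
assert with_bom.decode("utf-16") == s
assert no_bom.decode("utf-16-le") == s
```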
regards,
--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan