read a unicode file

Alan Kennedy alanmk at hotmail.com
Mon Jun 9 15:01:03 EDT 2003


Alan Kennedy wrote:

> Also, the standard iterator interface seems to be supported by the codecs.open
> method as well, e.g.
> 
> for line in codecs.open(filename, mode, encoding):
>     print line

But, of course, it isn't supported all the time :-)

If you do this with a utf-16 file, you get the exception

"NotImplementedError: .readline() is not implemented for UTF-16"

Which I found surprising.
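
One way to sidestep the exception is to avoid .readline() altogether: read the raw bytes, decode them in one go, and split the decoded text into lines. The following is only a sketch under that assumption. The helper name read_lines and the temporary file are made up for illustration, and the approach is only suitable for files that fit in memory.

```python
import os
import tempfile


def read_lines(filename, encoding):
    """Return the decoded lines of a file without calling .readline()."""
    f = open(filename, "rb")
    try:
        data = f.read()              # raw bytes, no decoding yet
    finally:
        f.close()
    text = data.decode(encoding)     # decode the whole file at once
    return text.splitlines(True)     # True keeps the line endings


# Write a small utf-16 test file (hypothetical path in a temp directory).
path = os.path.join(tempfile.mkdtemp(), "utf-16.txt")
f = open(path, "wb")
f.write((u"Hello world\n" * 5).encode("utf-16"))
f.close()

lines = read_lines(path, "utf-16")
```

Note that splitlines() on decoded text treats a few characters besides "\n" as line boundaries, so the result can differ slightly from what a file-style readline would give.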

I looked at some other encodings (it would be really nice to have a list of
supported encodings ;-) to see which others exhibit this behaviour. I wrote this
code to check:

#---------------------------------

import codecs

s = "Hello world\n" * 5

encodings = [ 'iso-8859-1',
    'ascii',
    'utf-16',
    'utf-8',
    'mac-cyrillic',
    'mbcs'  # Windows only; raises LookupError on other platforms
]

for e in encodings:
    f = codecs.open("%s.txt" % e, "wt", e)
    f.write(s)
    f.close()

for e in encodings:
    try:
        for line in codecs.open("%s.txt" % e, "rt", e):
            pass
        print "%s does support readline" % e
    except NotImplementedError:
        print "%s does not support readline" % e
    except IOError:
        print "%s gave an IOError. You're running python < 2.3 maybe?" % e

#---------------------------------

And I saw that, of the encodings I selected, only 'utf-16' didn't support
.readline(). (BTW, the reason I selected some of the values above, like
'mac-cyrillic', was because I was looking for a multiple-byte character set, to
see its behaviour. You'll have to excuse my ignorance of cyrillic character
sets.)
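
On the wish for a list of supported encodings: the standard library's alias table can be used to approximate one. This is a rough sketch, not an official listing. It assumes that the values of encodings.aliases.aliases are a reasonable set of candidate names, and it keeps only those that codecs.lookup() accepts on the running platform. The variable names are illustrative.

```python
import codecs
from encodings import aliases

# Canonical codec names known to the alias table, e.g. 'utf_16'.
candidates = sorted(set(aliases.aliases.values()))

available = []
for name in candidates:
    try:
        codecs.lookup(name)      # raises LookupError if not usable here
    except LookupError:
        continue                 # skip codecs missing on this platform
    available.append(name)
```

The result will miss any codec that has no alias entry, and it includes a few codecs that are not character encodings, so treat it as a starting point rather than a definitive list.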

So, a question: Why would the 'utf-16' codec not support readline? Looking at
the 'Lib\encodings\utf_16.py' module gives no hints. Is there a problem with
knowing what constitutes a line ending in that encoding?
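
One plausible explanation, offered as a guess rather than anything the codec source confirms, is that a line ending in UTF-16 is two bytes wide, so a readline that looks for the single byte "\n" in the underlying stream would cut a character in half. The sketch below shows the effect on raw bytes. It uses 'utf-16-le' so that no byte order mark gets in the way, and the variable names are illustrative.

```python
text = u"Hello\nworld\n"
data = text.encode("utf-16-le")     # two bytes per character for this text

# Splitting the raw bytes at the newline byte cuts a code unit in half:
# the newline is b"\n\x00", so each later piece starts with a stray b"\x00".
byte_pieces = data.split(b"\n")

# Decoding first and splitting afterwards gives the expected lines.
decoded_lines = data.decode("utf-16-le").splitlines(True)
```

Here byte_pieces[1] has an odd number of bytes and can no longer be decoded as UTF-16 on its own. The byte 0x0A can also occur inside other characters (U+010A, for example), which makes byte-level line splitting unreliable for this encoding.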

BTW, thanks to all the encoding people, for doing a great job. I especially like
the .encode() method on strings, because it makes constructs such as this easy
to express:

import sys

s = "Hello world"
local_s = s.encode(sys.getdefaultencoding())

regards,

-- 
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan



