Reading Unicode Files

Erno Kuusela erno-news at erno.iki.fi
Wed Dec 27 18:21:20 EST 2000


"Gerson" == Gerson Kurz <gerson.kurz at t-online.de> writes:

| The Windows 2000 registry editor creates .REG files in UNICODE by
| default. Is there an easy way to use something like readlines() for
| it, that is, a list of u-strings ?

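you can use the unicode() built-in function to convert the data
into a unicode string, if you know what encoding it is in.
regedit's "unicode" export should be utf-16 (little-endian, with a
byte-order mark), so something like
>>> unicode(open('f.reg', 'rb').read(), 'utf-16')
should work (as long as you get the codec right). calling
.splitlines() on the result then gives you the list of u-strings.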
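the read-and-decode idea can be sketched as a small helper. this is only an illustration under a few assumptions: it targets a python newer than the one discussed here (it uses the .decode() method on the raw bytes rather than unicode()), the helper name read_unicode_lines is made up, and the sample file contents are a stand-in written by the example itself rather than a real registry export.

```python
import os
import tempfile

def read_unicode_lines(path, encoding="utf-16"):
    """Return the file's contents as a list of decoded text lines.

    Hypothetical helper; "utf-16" is an assumed default for regedit exports.
    """
    # Binary mode, so the platform does not alter the raw bytes.
    with open(path, "rb") as f:
        raw = f.read()
    # The "utf-16" codec consumes a leading byte-order mark if one is present.
    return raw.decode(encoding).splitlines()

# Write a small stand-in file (made-up contents) so the example runs anywhere.
sample = u"Windows Registry Editor Version 5.00\r\n\r\n[HKEY_CURRENT_USER\\Software\\Example]\r\n"
fd, path = tempfile.mkstemp(suffix=".reg")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-16"))

lines = read_unicode_lines(path)
os.remove(path)
print(lines)
```

reading everything at once is fine for .REG-sized files; for very large files a streaming reader is the better fit.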

or, you can make a file object workalike that gives you unicode
data. something like

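>>> import codecs
>>> # regedit's unicode export is utf-16, so look up that codec
>>> encode, decode, reader, writer = codecs.lookup('utf-16')
>>> # binary mode, so the raw bytes reach the decoder untouched
>>> f = reader(open('f.reg', 'rb'))
>>> lines = f.readlines()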

except with my copy of python, f.readlines() doesn't seem
to work quite like i'd expect it to. but readline() seems to work.
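a line-at-a-time variant of the reader approach might look like the sketch below. assumptions: a newer python where codecs.getreader() is available as a shortcut for the reader returned by codecs.lookup(), a made-up helper name open_unicode, and a stand-in sample file written by the example itself.

```python
import codecs
import os
import tempfile

def open_unicode(path, encoding="utf-16"):
    """Wrap a binary file in a codec StreamReader that yields decoded text.

    Hypothetical helper; the encoding default is an assumption.
    """
    return codecs.getreader(encoding)(open(path, "rb"))

# Stand-in data (made up) so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".reg")
with os.fdopen(fd, "wb") as f:
    f.write(u"first\r\nsecond\r\n".encode("utf-16"))

reader = open_unicode(path)
try:
    # Iterating the reader decodes one line per step; strip the line endings.
    reg_lines = [line.rstrip(u"\r\n") for line in reader]
finally:
    reader.close()
os.remove(path)
print(reg_lines)
```

iterating the reader goes through readline() one line at a time, so it does not depend on how readlines() behaves.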

  -- erno


