Trouble with unicode
charlie at begeistert.org
Tue May 15 15:02:46 CEST 2001
>First you should check which encoding your Unicode file uses
>(e.g. sometimes Unicode refers to UTF-16 or just UTF-16-LE). Then
>you should read the file using codecs.open():
Actually, I now know that it is Latin-1.
># replace encoding with 'utf-16' or 'utf-16-le' or 'utf-16-be'
>f = codecs.open(filename, 'rb', encoding)
>contents = f.read()
This is exactly what I was looking for. The only surprise is having to use the
codecs module to read the file. I had expected something like:
f = open(filename, "r")
contents = f.read()
contents = codecs.decode(contents, encoding)
Or should I expect to start opening files with "rb" and an encoding argument in the
future? I like the way Python encourages a standard way of doing things.
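
For comparison, here is a minimal sketch of the two approaches side by side, assuming a current Python 3 interpreter rather than the Python of this thread. The file name and sample text are made up for illustration. The third variant, the built-in open() with an encoding argument, is a Python 3 feature and not something suggested in the quoted message.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: a small Latin-1 file with non-ASCII characters.
text = "Gr\u00fc\u00dfe aus K\u00f6ln"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(text.encode("latin-1"))

# Approach 1: codecs.open() decodes while reading (as in the quoted advice).
with codecs.open(path, "rb", "latin-1") as f:
    via_codecs = f.read()

# Approach 2: read the raw bytes first, then decode them explicitly.
with open(path, "rb") as f:
    raw = f.read()
via_decode = codecs.decode(raw, "latin-1")

# Approach 3 (Python 3 only): built-in open() takes the encoding directly.
with open(path, "r", encoding="latin-1") as f:
    via_open = f.read()

print(via_codecs == via_decode == via_open)
```

All three variants should give the same decoded string. Note that the explicit decode in approach 2 needs the file opened in binary mode, since in Python 3 a text-mode read already returns decoded text.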
>Now you can convert the Unicode object contents into a plain
>string using some other encoding, e.g. Latin-1, and then
>write it back to a text file:
would do, if that was all I was doing with it. But it works fine as it is.
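
For reference, the step described in the quote (encode the Unicode text as Latin-1 and write it back to a file) might look roughly like this in current Python 3. This is only a sketch: the sample string and output path are invented here and are not from the original message.

```python
import os
import tempfile

# Hypothetical decoded text standing in for the "contents" read earlier.
contents = "na\u00efve caf\u00e9"
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Encode explicitly to Latin-1 bytes and write them in binary mode.
with open(out_path, "wb") as f:
    f.write(contents.encode("latin-1"))

# Read the raw bytes back to check what ended up on disk.
with open(out_path, "rb") as f:
    written = f.read()
```

Each character in the sample fits in a single Latin-1 byte, so the file is ten bytes long. Text containing characters outside Latin-1 would raise UnicodeEncodeError unless an errors argument is passed to encode().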
Thanx a lot!!!