Python and UTF-8
mhuening at zedat.fu-berlin.de
Thu Jan 3 19:46:06 CET 2002
Martin von Loewis <loewis at informatik.hu-berlin.de> wrote in
news:j4itajb9jx.fsf at informatik.hu-berlin.de:
> There is no such thing as a Unicode file. Files are byte-oriented on
> all systems I know. So when opening a file, you need to specify the
> encoding. You can use codecs.open to read from a file and get Unicode
> strings out of it.
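A minimal sketch of the codecs.open approach described in the quote, written for current Python 3 (where decoded text is a str). The sample word, file names and the helper read_unicode are hypothetical illustrations. The same text is saved once as Latin-1 and once as UTF-8, and each file is read back by naming its encoding explicitly. In Python 3 the built-in open(path, encoding=...) does the same job, and newer releases prefer it over codecs.open.

```python
import codecs
import os
import tempfile

def read_unicode(path, encoding):
    """Hypothetical helper: read a whole file and return decoded text."""
    # codecs.open wraps the byte-oriented file in a decoder for `encoding`.
    with codecs.open(path, "r", encoding=encoding) as f:
        return f.read()

# Hypothetical sample: the same word saved in two different encodings.
text = "Grüße"
tmpdir = tempfile.mkdtemp()
paths = {}
for enc in ("latin-1", "utf-8"):
    p = os.path.join(tmpdir, "sample-" + enc + ".txt")
    with open(p, "wb") as f:          # write raw bytes, no implicit encoding
        f.write(text.encode(enc))
    paths[enc] = p

# The files differ on disk: ü and ß take one byte each in Latin-1, two in UTF-8.
print(os.path.getsize(paths["latin-1"]), os.path.getsize(paths["utf-8"]))  # 5 7

# Decoding each file with its own encoding yields identical text.
latin1_text = read_unicode(paths["latin-1"], "latin-1")
utf8_text = read_unicode(paths["utf-8"], "utf-8")
print(latin1_text == utf8_text == text)  # True
```

The point is that the file holds bytes; only the explicit encoding argument turns them back into the same characters.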
Hmm, but how come reading a text file with Python and displaying it
in a Tkinter text widget (with a Unicode font) shows the text just
fine, regardless of the encoding used to save the file (Latin-1 or
UTF-8) and without specifying the encoding when opening it? Does Python
guess the encoding itself? As I said in my earlier posting, I just don't
get how it works...
> You sort plain (byte) strings according to locale with
> locale.strcoll. In theory, this function ought to work for Unicode
> strings, too; it is a bug that it currently doesn't.
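A minimal sketch of locale-aware sorting with locale.strcoll, assuming current Python 3, where strcoll accepts ordinary (Unicode) str objects, so the limitation described in the quote no longer applies there. The helper sort_by_locale is hypothetical; strcoll is a three-way comparator, so functools.cmp_to_key adapts it for sorted(). locale.strxfrm can be passed directly as key= instead.

```python
import functools
import locale

def sort_by_locale(words, loc=""):
    """Hypothetical helper: sort strings using the collation rules of `loc`."""
    # Calling setlocale with only a category queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # "" selects the user's default locale from the environment;
        # an unavailable locale name raises locale.Error.
        locale.setlocale(locale.LC_COLLATE, loc)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # setlocale changes process-wide state, so restore it afterwards.
        locale.setlocale(locale.LC_COLLATE, previous)

# The "C" locale collates by code point, so uppercase sorts first.
print(sort_by_locale(["pear", "apple", "Zebra"], "C"))  # ['Zebra', 'apple', 'pear']
```

With a locale such as "de_DE.UTF-8" (an assumption: it must be installed on the system), the same call should order words by German dictionary rules instead of raw code points.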
Okay, this explains the trouble I ran into when trying to use
locale.strcoll with Unicode strings...