Python and UTF-8

Thu Jan 3 13:46:06 EST 2002

Martin von Loewis <loewis at informatik.hu-berlin.de> wrote in 
news:j4itajb9jx.fsf at informatik.hu-berlin.de:

> There is no such thing as a Unicode file. Files are byte-oriented on
> all systems I know. So when opening a file, you need to specify the
> encoding. You can use codecs.open to read from a file and get Unicode
> strings out of it.
> 

Hmm, but then why does reading a text file with Python and displaying it 
in a Tkinter text widget (with a Unicode font) show the text just fine, 
regardless of the encoding used to save the file (Latin-1 or UTF-8) and 
without specifying the encoding when opening it? Does Python guess the 
encoding itself? As I said in my earlier posting, I just don't get how it works...
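
For reference, here is a minimal sketch of the codecs.open approach Martin
describes, written for a current Python. It says nothing about what Tkinter
does internally. The sample string, the helper name roundtrip and the
temporary file are assumptions made for illustration only.

```python
import codecs
import os
import tempfile

# Sample text with non-ASCII characters ("Grüße"); chosen for illustration.
text = u"Gr\u00fc\u00dfe"


def roundtrip(encoding):
    """Write `text` in `encoding`, then return (raw bytes, decoded text)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Encode on the way out: the file ends up holding plain bytes.
        with codecs.open(path, "w", encoding=encoding) as f:
            f.write(text)
        # Look at the bytes that are actually stored on disk.
        with open(path, "rb") as f:
            raw = f.read()
        # Decode on the way in: the encoding has to be named explicitly.
        with codecs.open(path, "r", encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


latin1_bytes, latin1_text = roundtrip("latin-1")
utf8_bytes, utf8_text = roundtrip("utf-8")
```

The two files hold different bytes, yet decoding each with the encoding it
was written in gives back the same Unicode string. Decoding with the wrong
encoding would either produce different text or raise an error.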

> You sort plain (byte) strings according to locale with
> locale.strcoll. In theory, this function ought to work for Unicode
> strings, too; it is a bug that it currently doesn't.

Okay, this explains the trouble I got into when trying to use 
locale.strcoll with Unicode strings...
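
A minimal sketch of locale-aware sorting with locale.strcoll, assuming a
current Python in which text strings are Unicode and strcoll accepts them.
The helper name sort_by_locale is hypothetical, and which locale names are
available depends on the platform; only "C" can be assumed to exist
everywhere.

```python
import functools
import locale


def sort_by_locale(words, locale_name=""):
    """Return `words` sorted by the collation rules of `locale_name`.

    An empty name selects the user's default locale. The previous
    LC_COLLATE setting is restored afterwards.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        # Raises locale.Error if the platform does not know the locale.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strcoll is a comparison function, so wrap it as a sort key.
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# The "C" locale is always available.
result = sort_by_locale(["b", "a", "c"], "C")
```

With a German locale installed (the exact name varies by system), the same
call would order words according to that locale's collation rules.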

Thanks, Martin!

Matthias