[Python-Dev] PEP 277 (unicode filenames): please review

01 Sep 2002 23:22:29 +0200

Matthias Urlichs <smurf@noris.de> writes:

> Linux and MacOSX use UTF-8 and should probably be treated as such,
> i.e. I want to open("äöü"), not open("äöü".encode("utf-8")).

What would be "äöü" in this context? Your message was encoded as
Latin-1 - was that deliberate?

You could expect that open(u"äöü") works well; for the way you write
it, somebody needs to know what encoding the string has.

Linux does *not* "use" UTF-8. On the file system API, it treats
arbitrary byte sequences as-is, i.e. when you pass "äöü" as Latin-1,
it will put those bytes on disk - if you later use "äöü" in UTF-8,
Linux won't find the file.
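
To illustrate the mismatch, here is a minimal sketch. It assumes a
present-day Python 3 interpreter (where str is Unicode and bytes is raw
data), which is not what this thread was written against; the variable
names are made up for the example.

```python
# Assumption: Python 3, where str holds Unicode text and bytes holds raw data.
name = "\u00e4\u00f6\u00fc"  # the three characters ä, ö, ü

# The same text produces different byte sequences under different encodings.
latin1_bytes = name.encode("latin-1")  # 3 bytes: e4 f6 fc
utf8_bytes = name.encode("utf-8")      # 6 bytes: c3 a4 c3 b6 c3 bc

# A file system that stores and compares names as raw bytes treats these
# as two unrelated file names.
names_match = latin1_bytes == utf8_bytes

print(latin1_bytes.hex(), utf8_bytes.hex(), names_match)
```

A file created under the first byte sequence cannot be found by looking
up the second one.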

Instead, the convention seems to be that file names are in the
locale's encoding - which might be UTF-8, if you use a UTF-8 locale.
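
A rough sketch of that convention, again assuming Python 3. The helper
filename_bytes is hypothetical and only illustrates the idea; it is not
part of PEP 277 or of any library.

```python
import locale


def filename_bytes(name, encoding=None):
    """Encode a Unicode file name using the locale's encoding.

    Hypothetical helper for illustration only. If no encoding is given,
    fall back to whatever the current locale reports.
    """
    if encoding is None:
        # False: report the locale's encoding without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    # Raises UnicodeEncodeError if the locale cannot represent the name.
    return name.encode(encoding)


# The same name under two different locale encodings (passed explicitly here).
print(filename_bytes("\u00e4", "iso-8859-1"))  # one byte
print(filename_bytes("\u00e4", "utf-8"))       # two bytes
```

Two users with different locales would therefore write different bytes
to disk for the same visible name.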

> Byte strings are perfectly OK if they have a common encoding (meaning
> UTF-8, in some accepted normal form).

Unfortunately, that precondition is false. There is no common encoding
on Linux.

Regards,
Martin