Re: [Python-Dev] PEP 277 (unicode filenames): please review
Martin:
Indeed, that would be consistent. I deliberately want to leave this out of PEP 277. On Unix, things are not that clear - as Jack points out, readlink() and getcwd() also need consideration.
Linux and MacOSX use UTF-8 and should probably be treated as such, i.e. I want to open("äöü"), not open("äöü".encode("utf-8")). One interesting tidbit is that MacOSX requires Unicode filenames to be in NFD. I don't know whether anybody agreed on a standard normal form for Linux.
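To illustrate the normal-form remark: the same visible name "äöü" has different code point sequences, and so different UTF-8 bytes, in NFC and NFD. The following is a minimal sketch, assuming Python 3 and its standard unicodedata module, neither of which is part of the original discussion; the variable names are illustrative only.

```python
import unicodedata

# "äöü" written as three precomposed code points (this is the NFC form).
name = "\u00e4\u00f6\u00fc"

nfc = unicodedata.normalize("NFC", name)  # precomposed: 3 code points
nfd = unicodedata.normalize("NFD", name)  # base letter + combining diaeresis: 6 code points

# The UTF-8 encodings differ, so a byte-oriented file system API
# would see two different file names.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(len(nfc), len(nfd))
print(nfc_bytes != nfd_bytes)
```

Normalizing both sides to one agreed form before comparing names would avoid the mismatch, but which form to agree on is exactly the open question above.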
In this terrain, Windows has the cleaner API (they consider file names as character strings, not as byte strings), so doing the right thing is easier.
Byte strings are perfectly OK if they have a common encoding (meaning UTF-8, in some accepted normal form). Character strings are bad if their interpretation, or indeed their usability, changes with the presence of some random environment variable / registry entry / whatever. Under these constraints, calling it a character string vs. a byte string, and/or using it as such, is a matter of programmers' convenience. -- Matthias Urlichs
Matthias Urlichs
Linux and MacOSX use UTF-8 and should probably be treated as such, i.e. I want to open("äöü"), not open("äöü".encode("utf-8")).
What would be "äöü" in this context? Your message was encoded as Latin-1 - was that deliberate? You could expect that open(u"äöü") works well; for the way you write it, somebody needs to know what encoding the string has. Linux does *not* "use" UTF-8. On the file system API, it treats arbitrary byte sequences as-is, i.e. when you pass "äöü" as Latin-1, it will put those bytes on disk - if you later use "äöü" in UTF-8, Linux won't find the file. Instead, the convention seems to be that file names are in the locale's encoding - which might be UTF-8, if you use a UTF-8 locale.
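The mismatch described here can be made concrete by comparing the byte sequences directly. This is only a sketch, assuming Python 3 string semantics (where text and bytes are distinct types); it does not touch the file system, it just shows the bytes a byte-oriented API would be handed under each encoding.

```python
# "äöü" as a character string, spelled with escapes to avoid
# depending on the encoding of this source text.
name = "\u00e4\u00f6\u00fc"

latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # two bytes per character here

# A file created under the Latin-1 bytes cannot be found
# by looking up the UTF-8 bytes: the sequences differ.
print(latin1_bytes)
print(utf8_bytes)
print(latin1_bytes == utf8_bytes)

# The Latin-1 bytes are not even valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```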
Byte strings are perfectly OK if they have a common encoding (meaning UTF-8, in some accepted normal form).
Unfortunately, that precondition is false. There is no common encoding on Linux. Regards, Martin
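Since there is no common encoding, a program can only ask which encoding the current environment implies. The sketch below assumes Python 3 on a POSIX system and uses sys.getfilesystemencoding(), os.fsdecode() and os.fsencode(), which postdate this thread; it shows that arbitrary file name bytes can be carried through a character string and recovered unchanged, whatever the locale's encoding happens to be.

```python
import os
import sys

# The encoding Python derives from the environment for file names.
# Its value depends on the locale, so it is printed, not assumed.
fs_encoding = sys.getfilesystemencoding()
print(fs_encoding)

# "äöü" as Latin-1 bytes: a legal file name on Linux in any locale.
raw_name = b"\xe4\xf6\xfc"

# On POSIX, fsdecode maps undecodable bytes to lone surrogates
# instead of failing, and fsencode reverses that mapping.
as_text = os.fsdecode(raw_name)
round_tripped = os.fsencode(as_text)

print(round_tripped == raw_name)
```

How as_text displays still depends on the locale; only the byte-level round trip is preserved.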
participants (2): martin@v.loewis.de, Matthias Urlichs