[Python-Dev] PEP 277 (unicode filenames): please review
Martin v. Loewis
martin@v.loewis.de
01 Sep 2002 23:22:29 +0200
Matthias Urlichs <smurf@noris.de> writes:
> Linux and MacOSX use UTF-8 and should probably be treated as such,
> i.e. I want to open("äöü"), not open("äöü".encode("utf-8")).
What would be "äöü" in this context? Your message was encoded as
Latin-1 - was that deliberate?
You could expect that open(u"äöü") works well; for the way you write
it, somebody needs to know what encoding the string has.
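
To illustrate why the encoding has to be known, here is a minimal sketch
in modern Python 3 syntax (an assumption; this thread predates it), where
byte strings and text are separate types. The variable names are made up
for the example.

```python
# The three bytes of the name as they arrived in a Latin-1 message.
raw = b"\xe4\xf6\xfc"

# Read as Latin-1, they are the three characters a-umlaut, o-umlaut, u-umlaut.
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, the same bytes are not even a valid sequence.
try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None

# The reverse mistake: UTF-8 bytes read as Latin-1 decode without error,
# but yield six unrelated characters instead of three.
utf8_raw = "\u00e4\u00f6\u00fc".encode("utf-8")
misread = utf8_raw.decode("latin-1")
```

Without knowing the encoding, nothing in the bytes themselves says which
reading was intended.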
Linux does *not* "use" UTF-8. On the file system API, it treats
arbitrary byte sequences as-is, i.e. when you pass "äöü" as Latin-1,
it will put those bytes on disk - if you later use "äöü" in UTF-8,
Linux won't find the file.
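
A rough sketch of that failure, again in Python 3 syntax (an assumption)
and using byte-string paths so that Python does no encoding of its own.
The helper name is hypothetical, and the except clause is there because
some file systems (not the typical Linux ones) reject byte names that
are not valid UTF-8.

```python
import os
import tempfile

name = "\u00e4\u00f6\u00fc"            # a-umlaut, o-umlaut, u-umlaut
latin1_bytes = name.encode("latin-1")  # 3 bytes
utf8_bytes = name.encode("utf-8")      # 6 bytes


def lookup_under_other_encoding():
    """Create the file under its Latin-1 byte name, then look it up
    under the UTF-8 byte name.

    Returns None if the file system refuses the Latin-1 byte name.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        try:
            with open(os.path.join(tmp_bytes, latin1_bytes), "wb"):
                pass
        except (OSError, UnicodeError):
            return None
        # Same characters, different bytes: a byte-oriented file system
        # treats this as a different name.
        return os.path.exists(os.path.join(tmp_bytes, utf8_bytes))
```

On a file system that stores names as plain bytes, the lookup is expected
to come back False even though both names display as the same text.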
Instead, the convention seems to be that file names are in the
locale's encoding - which might be UTF-8, if you use a UTF-8 locale.
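
Following that convention could look roughly like this. It is a sketch
only: to_fs_bytes is a hypothetical helper, not part of Python or of the
PEP, and it assumes Python 3 and that the locale's preferred encoding is
the one to use for file names.

```python
import locale


def to_fs_bytes(name, encoding=None):
    """Encode a text file name into bytes for the file system API.

    If no encoding is given, fall back to the locale's preferred
    encoding, which is only a convention and not enforced by the kernel.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return name.encode(encoding)
```

Called with "utf-8" and with "latin-1", the helper returns different
bytes for the same name, so two users with different locales would end
up with different on-disk names.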
> Byte strings are perfectly OK if they have a common encoding (meaning
> UTF-8, in some accepted normal form).
Unfortunately, that precondition is false. There is no common encoding
on Linux.
Regards,
Martin