[Python-Dev] PEP 277 (unicode filenames): please review

Matthias Urlichs smurf@noris.de
Mon, 19 Aug 2002 15:44:08 +0200


Martin:
>  Indeed, that would be consistent. I deliberately want to leave this
>  out of PEP 277. On Unix, things are not that clear - as Jack points
>  out, readlink() and getcwd() also need consideration.
>
Linux and MacOSX use UTF-8 and should probably be treated as such,
i.e. I want to open("äöü"), not open("äöü".encode("utf-8")).
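A minimal sketch of what this looks like in present-day Python 3 (not the 2002 API under discussion): `os.fsencode()` applies the interpreter's file system encoding, so when that encoding is UTF-8, the str name and its UTF-8 bytes are interchangeable as arguments to `open()`. The variable names are illustrative only.

```python
import codecs
import os
import sys

name = "äöü"  # a non-ASCII file name held as a character string (str)

# os.fsencode() turns a str path into the bytes handed to the OS,
# using the interpreter's file system encoding.
fs_encoding = codecs.lookup(sys.getfilesystemencoding()).name
as_bytes = os.fsencode(name)

if fs_encoding == "utf-8":
    # With a UTF-8 file system encoding, open(name) and open(as_bytes)
    # refer to the same file; the str form is just the more convenient spelling.
    assert as_bytes == name.encode("utf-8")

print(fs_encoding, as_bytes)
```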

One interesting tidbit is that MacOSX requires Unicode filenames to be
in NFD.
I don't know whether anybody agreed on a standard normal form for Linux.
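For illustration, a small Python sketch of what NFD means for a name such as "äöü", using the standard `unicodedata` module: each precomposed letter is split into a base letter plus a combining diaeresis, so the NFC and NFD spellings differ as code point and byte sequences even though they look identical. This shows only the normalization itself; it does not model how any particular file system applies it.

```python
import unicodedata

name = "äöü"  # precomposed (NFC): U+00E4 U+00F6 U+00FC
nfd = unicodedata.normalize("NFD", name)

# NFD decomposes each letter into base letter + U+0308 COMBINING DIAERESIS.
print(len(name), len(nfd))   # 3 6
print(nfd.encode("utf-8"))   # b'a\xcc\x88o\xcc\x88u\xcc\x88'

# The two spellings render the same but are different strings and byte
# sequences, so a naive comparison of file names can miss a match.
assert name != nfd
assert unicodedata.normalize("NFC", nfd) == name
```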

>  In this terrain, Windows has the cleaner API (they consider file names
>  as character strings, not as byte strings), so doing the right thing
>  is easier.
>
Byte strings are perfectly OK if they have a common encoding (meaning
UTF-8, in some accepted normal form). Character strings are bad if
their interpretation, or indeed their usability, changes with the
presence of some random environment variable / registry entry /
whatever. Under these constraints, calling it a character string vs.
a byte string, and/or using it as such, is a matter of programmers'
convenience.
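As a rough illustration that the two views carry the same information once an encoding is fixed, here is a Python 3 sketch (the temporary directory and file name are made up for the demo): listing a directory with a str path yields str names, listing it with a bytes path yields bytes names, and decoding the latter with `os.fsdecode()` gives back the former.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Hypothetical demo file with a non-ASCII name.
    with open(os.path.join(d, "äöü"), "w"):
        pass

    as_str = os.listdir(d)                  # str path -> list of str
    as_bytes = os.listdir(os.fsencode(d))   # bytes path -> list of bytes

# With one agreed encoding (the one os.fsdecode() uses), the byte-string
# view decodes to exactly the character-string view.
decoded = [os.fsdecode(b) for b in as_bytes]
assert sorted(decoded) == sorted(as_str)
print(as_str, as_bytes)
```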

-- 
Matthias Urlichs