[Python-Dev] unicode imports
"Martin v. Löwis"
martin at v.loewis.de
Mon Jun 19 22:31:45 CEST 2006
Kristján V. Jónsson wrote:
> I don't have specific information on the machines. We didn't try
> very hard to get things to work with 2.3 since we simply assumed it
> would work automatically when we upgraded to a more mature 2.4. I
> could try to get more info, but it would be 2.3 specific. Have there
> been any changes since then?
Not in that respect, no.
> Note that it may not go into program files at all. Someone may want
> to install his modules in a folder named in the honour of his mother.
It's certainly possible to set this up in a way that it won't work,
on any localized version: just use a path name that isn't supported
in the ANSI code page. However, that should rarely happen: the
name of his mother should still be expressible in the ANSI code
page, if the system is set up correctly.
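As a rough illustration of the point above, the following sketch checks whether a directory name can be expressed in a given code page. The helper name encodable_in and the sample names are hypothetical; cp1252 and cp936 merely stand in for a Western European and a Simplified Chinese ANSI code page.

```python
def encodable_in(name, codepage):
    """Return True if every character of `name` can be encoded in `codepage`."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True

# A Latin name with an accent fits the Western European code page.
print(encodable_in("Kristj\u00e1n", "cp1252"))   # True
# A Chinese name does not fit cp1252 ...
print(encodable_in("\u5988\u5988", "cp1252"))    # False
# ... but does fit the Simplified Chinese code page.
print(encodable_in("\u5988\u5988", "cp936"))     # True
```

A folder name only causes trouble when the system's ANSI code page does not match the script the name is written in.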
> Also, I really would like to see a general solution that doesn't
> assume that the path name can somehow be transmuted to an ascii name.
(Please don't say ASCII here. Windows *A APIs are named that way
because Microsoft Windows has the notion of an "ANSI code page",
which, in turn, is just a code page indirection to some selected
code page meant to support the characters of the user's locale.)
> Users are unpredictable. When you have a wide distribution, you
> come up against all kinds of problems (Currently we have around
> 500,000 users in China.) Also, relying on some locale settings is not
Sure, but stating that doesn't really help. Code contributions
would help, but that part of Python was left out of the move to
the *W API, because it is particularly messy to fix.
> Funny that no other platforms could benefit from a unicode import
> path. Does that mean that Windows will reign supreme?
That is the case, more or less. Or, more precisely:
- On Linux, Solaris, and most other Unices, file names are bytes
on the system API, and are expected to be encoded in the user's
locale. So if your locale does not support a character, you
can't name a file that way, on Unix. There is a trend towards
using UTF-8 locales, so that the locale contains all Unicode characters.
- On Mac OS X, all file names are UTF-8, always (unless the
user managed to mess it up), so you can have arbitrary
Unicode file names.
That means that the approach of converting a Unicode sys.path
element to the file system encoding will always do the right
thing on Linux and OS X: the file system encoding will be
the locale's encoding on Linux, and will be UTF-8 on OS X.
It's only Windows which has valid file names that cannot
be represented in the current locale.
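The conversion described above can be sketched roughly as follows. This is an assumption-laden illustration, not the import machinery's actual code: the helper to_fs_bytes and the sample path are hypothetical, and the encodings passed explicitly only simulate a UTF-8 locale and a Western European ANSI code page.

```python
import sys

def to_fs_bytes(path_entry, fs_encoding=None):
    """Strictly encode a text sys.path entry to bytes.

    Returns None when the entry cannot be represented in the
    target encoding (hypothetical helper for illustration only).
    """
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    try:
        return path_entry.encode(fs_encoding)
    except UnicodeEncodeError:
        return None

entry = "/home/user/\u6a21\u5757"      # hypothetical directory with a Chinese name

# UTF-8 (OS X, or a UTF-8 locale on Linux) can represent any Unicode name.
print(to_fs_bytes(entry, "utf-8"))

# A Western European code page cannot, so the conversion fails.
print(to_fs_bytes(entry, "cp1252"))    # None
```

On Linux and OS X a failed conversion means the file could not have been created under that name in the first place; on Windows the file may well exist even though the conversion fails.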