[Python-Dev] unicode imports
"Martin v. Löwis"
martin at v.loewis.de
Mon Jun 19 22:31:45 CEST 2006
Kristján V. Jónsson wrote:
> I don't have specific information on the machines. We didn't try
> very hard to get things to work with 2.3 since we simply assumed it
> would work automatically when we upgraded to a more mature 2.4. I
> could try to get more info, but it would be 2.3 specific. Have there
> been any changes since then?
Not in that respect, no.
> Note that it may not go into program files at all. Someone may want
> to install his modules in a folder named in the honour of his mother.
It's certainly possible to set this up in a way that it won't work,
on any localized version: just use a path name that isn't supported
in the ANSI code page. However, that should rarely happen: the
name of his mother should still be expressible in the ANSI code
page, if the system is set up correctly.
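To make the "isn't supported in the ANSI code page" case concrete, here is a minimal sketch in present-day Python 3 (not the 2.x discussed in this thread). The helper name representable_in and the sample paths are hypothetical, and cp1252 is used only as a stand-in for the ANSI code page of a Western European Windows installation; the real code page depends on the system locale.

```python
def representable_in(path, codepage):
    """Return True if every character of path can be encoded in codepage.

    Hypothetical helper for illustration; codepage is any Python codec
    name, e.g. "cp1252" as a stand-in for a Western European ANSI code page.
    """
    try:
        path.encode(codepage)  # strict mode raises on unmappable characters
    except UnicodeEncodeError:
        return False
    return True


# An Icelandic name fits in cp1252; a Chinese one does not.
print(representable_in("C:\\Modules\\Kristj\u00e1n", "cp1252"))
print(representable_in("C:\\Modules\\\u5988\u5988", "cp1252"))
```

A path that fails this check for the system's ANSI code page is one that the *A APIs cannot address.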
> Also, I really would like to see a general solution that doesn't
> assume that the path name can somehow be transmuted to an ascii name.
(Please don't say ASCII here. Windows *A APIs are named that way
because Microsoft Windows has the notion of an "ANSI code page",
which, in turn, is just a code page indirection to some selected
code page meant to support the characters of the user's locale.)
> Users are unpredictable. When you have a wide distribution, you
> come up against all kinds of problems (Currently we have around
> 500.000 users in China.) Also, relying on some locale settings is not
> acceptable.
Sure, but stating that doesn't really help. Code contributions
would help, but that part of Python has not been converted to
the *W API, because it is particularly messy to fix.
> Funny that no other platforms could benefit from a unicode import
> path. Does that mean that windows will reign supreme?
That is the case, more or less. Or, more precisely:
- On Linux, Solaris, and most other Unices, file names are bytes
on the system API, and are expected to be encoded in the user's
locale. So if your locale does not support a character, you
can't name a file that way, on Unix. There is a trend towards
using UTF-8 locales, so that the locale contains all Unicode
characters.
- On Mac OS X, all file names are UTF-8, always (unless the
user managed to mess it up), so you can have arbitrary
Unicode file names.
That means that the approach of converting a Unicode sys.path
element to the file system encoding will always do the right
thing on Linux and OS X: the file system encoding will be
the locale's encoding on Linux, and will be UTF-8 on OS X.
It's only Windows which has valid file names that cannot
be represented in the current locale.
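The conversion described above can be sketched roughly as follows, in present-day Python 3 rather than the 2.x interpreter under discussion. sys.getfilesystemencoding() is a real standard-library call; the helper name encode_path_entry and the sample paths are hypothetical, and this is only an illustration of the idea, not the import machinery's actual code.

```python
import sys


def encode_path_entry(entry, encoding=None):
    """Encode a Unicode path entry to bytes for a byte-oriented file API.

    Hypothetical helper: with encoding=None it uses the interpreter's
    file system encoding (the locale's encoding on Linux, UTF-8 on OS X).
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return entry.encode(encoding)  # raises UnicodeEncodeError if unsupported


# With a UTF-8 file system encoding, any Unicode name can be encoded.
print(encode_path_entry("/opt/m\u00f3dulos", "utf-8"))

# With a narrower locale encoding, some names simply cannot be expressed.
try:
    encode_path_entry("/opt/\u5988\u5988", "latin-1")
except UnicodeEncodeError as exc:
    print("not representable:", exc.reason)
```

On Unix a name that fails to encode could not have been created under that locale in the first place, which is why the approach is sufficient there and not on Windows.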
Regards,
Martin