[Python-Dev] unicode imports

"Martin v. Löwis" martin at v.loewis.de
Mon Jun 19 22:31:45 CEST 2006

Kristján V. Jónsson wrote:
> I don't have specific information on the machines.  We didn´t try
> very hard to get things to work with 2.3 since we simply assumed it
> would work automatically when we upgraded to a more mature 2.4. I
> could try to get more info, but it would be 2.3 specific.  Have there
> been any changes since then?

Not in that respect, no.

> Note that it may not go into program files at all.  Someone may want
> to install his modules in a folder named in the honour of his mother.

It's certainly possible to set this up in a way that it won't work,
on any localized version: just use a path name that isn't supported
in the ANSI code page. However, that should rarely happen: the
name of his mother should still be expressable in the ANSI code
page, if the system is setup correctly.

> Also, I really would like to see a general solution that doesn´t
> assume that the path name can somhow be transmuted to an ascii name.

(Please don't say ASCII here. Windows *A APIs are named that way
 because Microsoft Windows has the notion of an "ANSI code page",
 which, in turn, is just a code page indirection so some selected
 code page meant to support the characters of the user's locale)

> Users are unpredictable.  When you have a wide distribution  , you
> come up against all kinds of problems (Currently we have around
> 500.000 users in china.) Also, relying on some locale settings is not
> acceptable.

Sure, but stating that doesn't really help. Code contributions
would help, but that part of Python has been left out of using
the *W API, because it is particularly messy to fix.

> Funny that no other platforms could benefit from a unicode import
> path.  Does that mean that windows will reign supreme?

That is the case, more or less. Or, more precisely:
- On Linux, Solaris, and most other Unices, file names are bytes
  on the system API, and are expected to be encoded in the user's
  locale. So if your locale does not support a character, you
  can't name a file that way, on Unix. There is a trend towards
  using UTF-8 locales, so that the locale contains all Unicode
- On Mac OS X, all file names are UTF-8, always (unless the
  user managed to mess it up), so you can have arbitrary
  Unicode file names

That means that the approach of converting a Unicode sys.path
element to the file system encoding will always do the right
thing on Linux and OS X: the file system encoding will be
the locale's encoding on Linux, and will be UTF-8 on OS X.

It's only Windows which has valid file names that cannot
be represented in the current locale.


More information about the Python-Dev mailing list