[issue4352] imp.find_module() fails with a UnicodeDecodeError when called with non-ASCII search paths

Mon Mar 30 16:21:44 CEST 2009

Guido van Rossum <guido at python.org> added the comment:

At the sprint, Andrew Svetlov, Martin von Loewis and I looked into this
a bit, and discovered that Andrew's Vista copy uses a Russian locale for
the filesystem encoding (despite using English as the language).  In
this locale, a-umlaut cannot be represented in the ANSI code page (which
has only 256 values), because the Russian locale uses those byte values
to represent Cyrillic.
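
A minimal sketch of the code-page limitation described above. It assumes cp1251 is a fair stand-in for the Russian ANSI code page and cp1252 for a Western one; neither codec name appears in the report itself, and the helper can_encode is purely illustrative.

```python
# Assumption: cp1251 stands in for the Russian ANSI code page and
# cp1252 for a Western European one. The report names neither.
RUSSIAN_ANSI = "cp1251"
WESTERN_ANSI = "cp1252"


def can_encode(text, encoding):
    """Return True if every character of text fits in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


a_umlaut = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# The character fits in the Western code page as a single byte...
western_byte = a_umlaut.encode(WESTERN_ANSI)

# ...but the Russian code page assigns that same byte value to a
# Cyrillic letter, so a-umlaut has no slot left among the 256 values.
as_russian = western_byte.decode(RUSSIAN_ANSI)

print(can_encode(a_umlaut, WESTERN_ANSI))
print(can_encode(a_umlaut, RUSSIAN_ANSI))
print(ascii(as_russian))
```

Any path component containing such a character therefore has no byte representation in that locale, whatever the import code does with the bytes afterwards.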

As long as the import code (written in C) uses bytes in the filesystem
encoding to represent paths, this problem will remain.

Two possible solutions would be to switch to Brett's importlib, or to
change the import code to use wide characters everywhere (like
posixmodule.c).  Both are extremely risky and a lot of work, and I don't
expect we'll get to this for 3.1.

(In 2.x the same problem exists, but is perhaps less real because module
names are limited to ASCII.)

We also discovered another problem, which I'll report separately: the
*module* name is decoded as UTF-8, while the *path* name uses the
filesystem encoding...
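
A rough illustration of why mixing the two encodings matters. This is not the import code itself: the module name is hypothetical, and cp1251 is again only an assumed stand-in for the filesystem encoding.

```python
# Hypothetical non-ASCII module name (Russian for "module").
name = "\u043c\u043e\u0434\u0443\u043b\u044c"

# Assumption: cp1251 plays the role of the filesystem encoding.
FS_ENCODING = "cp1251"

as_utf8 = name.encode("utf-8")    # two bytes per Cyrillic letter
as_fs = name.encode(FS_ENCODING)  # one byte per Cyrillic letter


def decodes_as(data, encoding):
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The same name yields different byte strings under each encoding, and
# the filesystem-encoded bytes are not valid UTF-8 at all.
print(len(as_utf8), len(as_fs))
print(decodes_as(as_fs, "utf-8"))
```

Bytes produced under one convention and interpreted under the other either fail to decode or come out as a different name.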

----------
nosy: +gvanrossum

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4352>
_______________________________________