[Python-Dev] Import and unicode: part two

Victor Stinner victor.stinner at haypocalc.com
Wed Jan 26 10:40:34 CET 2011

On Monday, January 24, 2011 at 19:26 -0800, Toshio Kuratomi wrote:
> Why not locale:
> * Relying on locale is simply not portable. (...)
> * Mixing of modules from different locales won't work. (...)

I don't understand what you are talking about.

When you import a module, the module name becomes a filename. On
Windows, you can reuse the Unicode name directly as a filename. On the
other OSes, you have to encode the name to the filesystem encoding. During
Python 3.2 development, we tried to support a filesystem encoding
different from the locale encoding (PYTHONFSENCODING environment
variable), but it doesn't work, simply because Python is not alone in the
OS. Apart from Python, all programs speak the same "language": the locale
encoding. Here is an example: if you create a module with a name encoded
to UTF-8 while your locale encoding is not UTF-8, your file browser will
display mojibake.
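
To make the mojibake case concrete, here is a minimal sketch. The module
name and the ISO-8859-1 locale are assumptions chosen for illustration;
they are not taken from any real setup.

```python
import sys

# Hypothetical module name containing a non-ASCII character ("café").
module_name = "caf\u00e9"

# Bytes that end up on disk if the name is encoded to UTF-8.
utf8_bytes = module_name.encode("utf-8")

# What another program shows if it decodes those bytes with an
# ISO-8859-1 locale encoding (assumed here for the example).
displayed = utf8_bytes.decode("iso-8859-1")

# The encoding Python itself uses for filenames on this machine.
print(sys.getfilesystemencoding())

# ascii() keeps the output printable whatever the terminal encoding is.
print(ascii(module_name), "->", ascii(displayed))
```

The displayed name has one character more than the original: the two
UTF-8 bytes of the accented letter are shown as two separate characters.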

I don't understand the relation between the local filesystem encoding
and portability. I suppose that you are talking about the
distribution of a module to other computers. Here the question is how
the filenames are stored during the transfer. The user is free to use
any tool, and to try to find one that handles Unicode correctly :-) But
that is no longer Python's problem.

Each computer may use a different locale encoding. You have to use it to
cooperate with other programs and avoid mojibake. But I don't understand
why you write that "Mixing of modules from different locales won't
work". If you use a tool that stores filenames in your locale encoding
(e.g. the TAR file format... and sometimes the ZIP format), the problem
comes from your tool and you should use another tool.

I created http://bugs.python.org/issue10972 to work around ZIP tools
that assume ZIP files use the locale encoding instead of cp437: this
issue adds an option to force the use of the Unicode flag (and so
store filenames in UTF-8). Initially, though, I created the issue to
work around a bootstrap issue (#10955).
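
As a rough illustration of the Unicode flag, the sketch below inspects
what the standard zipfile module writes today. It does not use the option
proposed in the issue; the archive member names are hypothetical, and the
constant name UTF8_FLAG is mine (bit 11 of the ZIP general purpose flags).

```python
import io
import zipfile

# Bit 11 of the general purpose flags: the filename is stored as UTF-8.
UTF8_FLAG = 0x800

# Build a small archive in memory with one ASCII and one non-ASCII name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ascii_name.py", "")
    archive.writestr("caf\u00e9.py", "")

# Read it back and record, per member, whether the Unicode flag is set.
with zipfile.ZipFile(buffer) as archive:
    flags = {
        info.filename: bool(info.flag_bits & UTF8_FLAG)
        for info in archive.infolist()
    }

for name, has_flag in flags.items():
    print(ascii(name), has_flag)
```

Here zipfile sets the flag only for the name that cannot be encoded to
ASCII, so a tool that ignores the flag falls back to its own guess for
the encoding.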

