[Python-Dev] Import and unicode: part two

Stephen J. Turnbull stephen at xemacs.org
Wed Jan 26 03:24:54 CET 2011

Toshio Kuratomi writes:

 > On Linux there's no defined encoding that will work; file names are just
 > bytes to the Linux kernel so based on people's argument that the convention
 > is and should be that filenames are utf-8 and anything else is
 > a misconfigured system -- python should mandate that its module filenames on
 > Linux are utf-8 rather than using the user's locale settings.

This isn't going to work where I live (Tsukuba).  At the national
university alone there are hundreds of pre-existing *nix systems whose
filesystems were often configured a decade or more ago.  Even if the
hardware and OS have been upgraded, the filesystems are usually
migrated as-is, with OS configuration tweaks to accommodate them.  Many
of them use EUC-JP (and servers often use Shift JIS).  That means that
if Python mandates UTF-8, you won't be able to read module names with
ls, and that will make Python unacceptable for this purpose.  I imagine
that in Russia the same is
true for the various Cyrillic encodings.
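
To make the ls problem concrete, here is a minimal sketch of what happens
when a file name written as UTF-8 is displayed by tools that assume EUC-JP.
The module name is a hypothetical example, and decoding with
errors="replace" only approximates what a terminal would show; it is not a
model of any particular ls or terminal.

```python
# Hypothetical non-ASCII module name, chosen only for illustration.
module_name = "日本語"

# Bytes a UTF-8 mandate would put on disk.
utf8_bytes = module_name.encode("utf-8")
# Bytes the existing EUC-JP tools and file systems expect.
eucjp_bytes = module_name.encode("euc_jp")

# Approximate what an EUC-JP terminal shows for the UTF-8 file name:
# the same bytes, interpreted in the wrong encoding.
shown_by_ls = utf8_bytes.decode("euc_jp", errors="replace")

print(utf8_bytes == eucjp_bytes)   # the two byte sequences differ
print(shown_by_ls == module_name)  # the displayed name is not the real one
print(repr(shown_by_ls))
```

Both comparisons print False: the kernel stores whichever bytes it is given,
so the name only reads correctly when the writer and the reader agree on the
encoding.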

I really don't think there is anything that can be done here except to
warn people that "Kids, these stunts are performed by highly-trained
professionals.  Don't try this at home!"  Of course they will anyway,
but at least they will have been warned in sufficiently strong terms
that they might pay attention and be able to recover when they run
into bizarre import exceptions.

Oh, yeah, don't forget to apply Victor's patch, which allows Python to
keep the promises it can make about consistency.<wink>
