[Python-Dev] Import and unicode: part two
Stephen J. Turnbull
stephen at xemacs.org
Wed Jan 26 09:58:36 CET 2011
Toshio Kuratomi writes:
> Sure ... but with these systems, neither read-modules-as-locale or
> read-modules-as-utf-8 are a good solution to work, correct?
Good solution, no, but I believe that read-modules-as-locale *should*
work to a great extent. AFAIK Python 3 reads Python programs as str
(i.e., converting to Unicode -- if it doesn't, it *should*<wink>).
> Especially if the OS does get upgraded but the filesystems with
> user data (and user created modules) are migrated as-is, you'll run
> into situations where system installed modules are in utf-8 and
> user created modules are shift-jis and so something will always be
I don't know what you mean by "system-installed modules". If you're
talking about Python itself, it's not a problem. Python doesn't have
any Japanese-named modules in any encoding.
On the other hand, *everything* that involves scripting (shell
scripts, make, etc) related to those filesystems will be broken
*unless* the system, after upgrade but before going live, is converted
to have an appropriate locale encoding. So I don't really see a
The problem is portability across systems, and that is a problem that
only the third-party transports can really deal with. tar and unzip
need to be taught how to change file names to the locale, etc.
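
As a rough illustration of what "changing file names to the locale" would
involve for such a transport, here is a minimal sketch. It assumes the tool
somehow knows the encoding the archive's names were written in; the helper
name transcode_filename and the sample name are hypothetical, not part of
tar, unzip, or any Python API.

```python
def transcode_filename(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode a raw file name from the archive's encoding to the local one.

    Hypothetical helper for illustration only.
    """
    # Decoding raises UnicodeDecodeError if source_encoding is the wrong guess.
    text = raw.decode(source_encoding)
    # Encoding raises UnicodeEncodeError if the target locale cannot
    # represent some character in the name.
    return text.encode(target_encoding)


# "\u65e5\u672c\u8a9e" is a three-character Japanese name, written with
# escapes so this snippet itself stays ASCII-only.
sample_name = "\u65e5\u672c\u8a9e.py"
name_sjis = sample_name.encode("shift_jis")
name_utf8 = transcode_filename(name_sjis, "shift_jis", "utf-8")
```

The hard part in practice is not the conversion but knowing source_encoding
in the first place, which is why this is left to the transport.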
> The only way to make sure that modules work is to restrict them to ASCII-only
> on the filesystem. But because unicode module names are seen as
> a necessary feature, the question is which way forward is going to lead to
> the least brokenness. Which could be locale... but from the python2
> locale-related bugs that I get to look at, I doubt.
AFAICS this is going to be site-specific. End of story. Or, if you
IMHO, Python 2 locale bugs are unlikely to be a good guide to Python 3
locale bugs because in Python 2 most people just ignore locale and use
"native" strings (~= bytes in Python 3), and that typically "just
works". In Python 3 that just *doesn't* work any more because you get
a UnicodeError on import, etc, etc.
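
A small sketch of the difference, under the assumption that a name was
written in Shift-JIS and is later interpreted as UTF-8. The helper
try_decode is hypothetical; the point is only that bytes pass through
untouched, while decoding with the wrong codec raises a UnicodeError
subclass.

```python
# A Japanese name as Shift-JIS bytes (escapes keep this snippet ASCII-only).
raw_name = "\u65e5\u672c\u8a9e".encode("shift_jis")


def try_decode(raw, encoding):
    """Return the decoded text, or the exception class name on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeError as exc:
        return type(exc).__name__


# Treating the bytes as opaque (the Python 2 "native string" habit) never fails.
passthrough = bytes(raw_name)

# Decoding with the wrong codec fails; decoding with the right one works.
result_utf8 = try_decode(raw_name, "utf-8")
result_sjis = try_decode(raw_name, "shift_jis")
```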
IMHO, YMMV, and all that. I know *of* such systems (there remain
quite a few here used by student and research labs), but the ones I
maintain were easy to convert to UTF-8 because I don't export file
systems (except my private files for my own use); everything is
mediated by Apache and Zope, and browsers are happy to cope if I
change from EUC-JP to UTF-8 and then flip the Apache switch to change