[Python-Dev] Import and unicode: part two
a.badger at gmail.com
Wed Jan 26 06:33:56 CET 2011
On Wed, Jan 26, 2011 at 11:24:54AM +0900, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
> > On Linux there's no defined encoding that will work; file names are just
> > bytes to the Linux kernel so based on people's argument that the convention
> > is and should be that filenames are utf-8 and anything else is
> > a misconfigured system -- python should mandate that its module filenames on
> > Linux are utf-8 rather than using the user's locale settings.
> This isn't going to work where I live (Tsukuba). At the national
> university alone there are hundreds of pre-existing *nix systems whose
> filesystems were often configured a decade or more ago. Even if the
> hardware and OS have been upgraded, the filesystems are usually
> migrated as-is, with OS configuration tweaks to accommodate them. Many
> of them use EUC-JP (and servers often Shift JIS). That means that you
> won't be able to read module names with ls, and that will make Python
> unacceptable for this purpose. I imagine that in Russia the same is
> true for the various Cyrillic encodings.
Sure ... but with these systems, neither read-modules-as-locale nor
read-modules-as-utf-8 is a solution that will work, correct? Especially if
the OS does get upgraded but the filesystems with user data (and user
created modules) are migrated as-is, you'll run into situations where system
installed modules are in utf-8 and user created modules are shift-jis and so
something will always be broken.
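
To make the mixed-encoding problem concrete, here is a minimal sketch
(Python 3; the module name and the try_decode helper are made up for
illustration, not anything in the import machinery). The same name ends up
as different bytes on disk under each encoding, and bytes written under
Shift JIS or EUC-JP do not even decode as utf-8:

```python
# Hypothetical module name; any non-ASCII name shows the same effect.
name = "日本語"

# What ends up on disk depends on the encoding in use when the file was made.
utf8_bytes = name.encode("utf-8")
sjis_bytes = name.encode("shift_jis")
eucjp_bytes = name.encode("euc_jp")

# The kernel just stores bytes, and the three byte strings all differ.
assert len({utf8_bytes, sjis_bytes, eucjp_bytes}) == 3


def try_decode(raw, encoding):
    """Return raw decoded with encoding, or None if it is not valid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# User modules migrated as-is (Shift JIS, EUC-JP) read on a utf-8 system:
sjis_read_as_utf8 = try_decode(sjis_bytes, "utf-8")
eucjp_read_as_utf8 = try_decode(eucjp_bytes, "utf-8")

# The same bytes read with the encoding they were written in:
sjis_read_as_sjis = try_decode(sjis_bytes, "shift_jis")
```

So whichever single encoding the interpreter picks, one of the two sets of
modules on such a system is unreadable by name.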
The only way to make sure that modules work is to restrict them to ASCII-only
on the filesystem. But because unicode module names are seen as
a necessary feature, the question is which way forward is going to lead to
the least brokenness. That could be locale... but from the python2
locale-related bugs that I get to look at, I doubt it.
> I really don't think there is anything that can be done here except to
> warn people that "Kids, these stunts are performed by highly-trained
> professionals. Don't try this at home!" Of course they will anyway,
> but at least they will have been warned in sufficiently strong terms
> that they might pay attention and be able to recover when they run
> into bizarre import exceptions.
So on the subject of warnings... I think one reason it's better to pick an
encoding for the platform/filesystem rather than to use the locale is that
people will get an error or a warning at the appropriate time -- the first
time they attempt to create and import a module with a filename that's not
encoded in the correct encoding for the platform.
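
As a rough sketch of what "complain at the appropriate time" could look
like (this is not how the import machinery is implemented; the function
name, the constant, and the choice of utf-8 as the mandated encoding are
all assumptions for illustration):

```python
import warnings

# Assumption: the platform mandates utf-8 for module filenames.
PLATFORM_MODULE_ENCODING = "utf-8"


def check_module_filename(raw_name):
    """Decode a module filename (bytes) with the fixed platform encoding.

    Raises ImportError if the bytes are not valid in that encoding, and
    warns if the decoded name contains non-ASCII characters.
    """
    try:
        name = raw_name.decode(PLATFORM_MODULE_ENCODING)
    except UnicodeDecodeError as exc:
        raise ImportError(
            "module filename %r is not valid %s"
            % (raw_name, PLATFORM_MODULE_ENCODING)
        ) from exc
    if any(ord(ch) > 127 for ch in name):
        # Valid for this platform, but may not survive distribution.
        warnings.warn(
            "non-ASCII module filename %r may not be portable" % name,
            UnicodeWarning,
        )
    return name
```

With a fixed encoding the result does not depend on who runs the code: a
Shift JIS filename fails for every user on the first import, instead of
working for the author and breaking for anyone with a different locale.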
It's all very well to say: "We wrote in the documentation on
http://docs.python.org/distutils/introduction.html#Choosing-a-name that only
ASCII names should be used when distributing python modules" but if the
interpreter doesn't complain when people use a non-ASCII filename we all
know that they aren't going to look in the documentation; they'll try it and
if it works they'll learn that habit.