[Python-Dev] Import and unicode: part two
Toshio Kuratomi
a.badger at gmail.com
Tue Jan 25 04:26:09 CET 2011
On Thu, Jan 20, 2011 at 03:27:08PM -0500, Glyph Lefkowitz wrote:
>
> On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote:
> Same here. *Most* code will never be shared, or will only be shared
> between users in the same community. When it goes wrong it's also a
> learning opportunity. :-)
>
>
> Despite my usual proclivity for being contrarian, I find myself in agreement
> here. Linux users with locales that don't specify UTF-8 frankly _should_ have
> to deal with all kinds of nastiness until they can transcode their filesystems.
> MacOS and Windows both have a "right" answer here and your third-party tools
> shouldn't create mojibake in your filenames.
>
However, if this is the consensus, it makes a lot more sense to pick utf-8
as *the* encoding for python module filenames on Linux.
Why UTF-8:
* UTF-8 can cover the whole range of unicode whereas most (all?) other
locale friendly encodings cannot.
* UTF-8 is becoming a standard for Linux distributions whether or not Linux
users are adopting it.
* Third party tools are gaining support for UTF-8 even when they aren't
gaining support for generic encodings (If I read the spec on zip
correctly, this is actually what's happening there).
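
To make the first point concrete, here is a minimal sketch (the module names and the choice of legacy encodings are made up for illustration, not taken from this thread) that checks which encodings can represent a handful of non-ASCII module names at all:

```python
# Hypothetical module names, chosen only for illustration.
names = [
    "caf\u00e9",                             # Latin letter with an accent
    "\u65e5\u672c\u8a9e",                    # Japanese
    "\u043c\u043e\u0434\u0443\u043b\u044c",  # Cyrillic
]

# UTF-8 plus two single-byte encodings that locales have commonly used.
encodings = ["utf-8", "iso-8859-1", "koi8-r"]


def encodable(name, encoding):
    """Return True if `name` can be written out as bytes in `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Map each encoding to one True/False flag per name, in the order above.
support = {enc: [encodable(n, enc) for n in names] for enc in encodings}

for enc in encodings:
    print(enc, support[enc])
```

Only the utf-8 row comes out all True; each of the single-byte encodings can represent at most the names from its own script.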
Why not locale:
* Relying on locale is simply not portable. If nothing prevents people from
distributing a unicode filename then they will go ahead and do so. If
the result works (say, because it's utf-8 and 80% of the Linux userbase is
using utf-8) then it will get packaged and distributed and people won't
know that it's a problem until someone with a non-utf-8 locale decides to
use it.
* Mixing of modules from different locales won't work. Suppose that the
system python installs the previous module. The local site has other
modules that it has installed using a different filename encoding.
The users at the site will find that either one or the other of the two
modules won't work.
* Because of the portability problems you have no choice but to tell people
not to distribute python modules with non-ASCII names. This makes the use
of unicode names second class indefinitely (until the kernel devs decide
that they're wrong to not enforce a filesystem encoding or Linux becomes
irrelevant as a platform).
* If you can pick a set of encodings that are valid (utf-8 for Linux and
MacOS, wide unicode for Windows [I get the feeling from other parts of the
conversation that Windows won't be so lucky, though]), tools to convert
python names become easier to write. If you restrict it far enough, you
could even write tools/importers that automatically do the detection.
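
As a rough sketch of what such detection might look like (the function name and the `allowed` parameter are hypothetical, and this is not how the import machinery currently works), a tool could take the raw bytes of a filename and accept them only if they decode strictly in one of the permitted encodings:

```python
def decode_module_filename(raw, allowed=("utf-8",)):
    """Decode a raw bytes filename using the first allowed encoding that fits.

    Returns (name, encoding) on success and (None, None) when the bytes are
    not valid in any of the allowed encodings.
    """
    for enc in allowed:
        try:
            # bytes.decode() is strict by default, so invalid byte
            # sequences raise instead of being silently replaced.
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return None, None


# The same hypothetical module name written to disk under two encodings.
utf8_bytes = "caf\u00e9.py".encode("utf-8")
latin1_bytes = "caf\u00e9.py".encode("iso-8859-1")

ok = decode_module_filename(utf8_bytes)
bad = decode_module_filename(latin1_bytes)
print(ok)
print(bad)
```

With utf-8 as the only allowed encoding, the file written under a latin-1 locale is rejected outright rather than turned into mojibake, which is the point at which a conversion tool could step in.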
PS: Sorry for not replying immediately, the team I'm on is dealing with an
issue at my work and I'm also preparing for a conference later this week.
-Toshio