[Python-Dev] Import and unicode: part two

Glyph Lefkowitz glyph at twistedmatrix.com
Thu Jan 20 21:27:08 CET 2011


On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote:

> On Thu, Jan 20, 2011 at 5:16 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Thu, Jan 20, 2011 at 10:08 PM, Simon Cross
>> <hodgestar+pythondev at gmail.com> wrote:
>>> I'm changing my vote on this to a +1 for two reasons:
>>> 
>>> * Initially I thought this wasn't supported by Python at all but I see
>>> that currently it is supported but that support is broken (or at least
>>> limited to UTF-8 filesystem encodings). Since support is there, might
>>> as well make it better (especially if it tidies up the code base at
>>> the same time).
>>> 
>>> * I still don't think it's a good idea to give modules non-ASCII names
>>> but the "consenting adults" approach suggests we should let people
>>> shoot themselves in the foot if they believe they have good reason to
>>> do so.
>> 
>> I'm also +1 on this for the reasons Simon gives.
> 
> Same here. *Most* code will never be shared, or will only be shared
> between users in the same community. When it goes wrong it's also a
> learning opportunity. :-)

Despite my usual proclivity for being contrarian, I find myself in agreement here.  Linux users with locales that don't specify UTF-8 frankly _should_ have to deal with all kinds of nastiness until they can transcode their filesystems.  MacOS and Windows both have a "right" answer here and your third-party tools shouldn't create mojibake in your filenames.

However, I feel that we should not necessarily be making non-ASCII programmers second-class citizens, if they are to be supported at all.  The obvious outcome of the current regime is, if you want your code to work in the wider world, you have to make everything ASCII, so non-ASCII programmers have to do a huge amount of extra work to prepare their stuff for distribution.  As an English speaker I'd be happy about that, but as a person with a lot of Chinese in-laws, it gives me pause.

There is a difference between sharing code for inspection and editing (where a little codec pain is good for the soul: set your locale to UTF-8 and forget it already!) and sharing code so that a (non-programming) user can just run it.  If I can write software in English and distribute it to Chinese people, fair's fair, they should be able to write it in Chinese and have it work on my computer.

To support the latter, could we just make sure that zipimport interprets filename encodings consistently, independent of both locale and operating system?  That way a distributed egg would be importable from a zipfile regardless of how screwed up the distribution target machine's filesystem is.  (And this is yet more motivation for distributors to set zip_safe=True.)
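
A minimal sketch of what that could look like in practice, assuming a Python 3 interpreter whose zipfile module marks non-ASCII member names as UTF-8 (general-purpose flag bit 11) and whose zipimport honors that flag.  The module name, archive name, and helper functions below are made up for illustration; this is not a statement of how any particular release behaves.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical non-ASCII module name ("cafe" with an acute accent on the e).
MODULE_NAME = "caf\u00e9"
UTF8_NAME_FLAG = 0x800  # bit 11 of the zip general-purpose flags


def build_archive(directory):
    """Write a zip containing one module whose name is not ASCII."""
    path = os.path.join(directory, "bundle.zip")  # archive path itself is ASCII
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr(MODULE_NAME + ".py", "GREETING = 'hello'\n")
    return path


def import_from_archive(path):
    """Import MODULE_NAME from the archive without touching the filesystem copy."""
    sys.path.insert(0, path)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    archive_path = build_archive(tmp)

    # Re-read the archive to see how the member name was recorded.
    with zipfile.ZipFile(archive_path) as archive:
        info = archive.getinfo(MODULE_NAME + ".py")
        utf8_flag_set = bool(info.flag_bits & UTF8_NAME_FLAG)

    module = import_from_archive(archive_path)
    greeting = module.GREETING
```

The point of the sketch is that the member name's encoding is recorded inside the archive, so the lookup does not have to consult the locale or the filesystem encoding of the machine doing the import.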