[Distutils] Changing the separator from - to ~ and allow all Unicode alphanumerics in package names...

Daniel Holth dholth at gmail.com
Mon Nov 12 20:34:14 CET 2012

> I consider the limitation of package names to ASCII to be a blessing in
> disguise.  In python3, unicode module names are possible but not portable
> between systems.  This is because the non-ascii module names inside of a
> python file are abstract text, but the representation on the filesystem is
> whatever the user's locale is.  The consensus on python-dev when this was
> brought up seemed to be that using non-ascii in your local locale was
> important for learning to use python, but that distributing non-ascii
> modules to other people was a bad idea.  (If you have the attention span for
> long threads, see
> <http://mail.python.org/pipermail/python-dev/2011-January/107467.html>.
> Note that the threading was broken several times but the subject line
> stayed the same.)
> Description of the non-ascii module problem for people who want a summary:
> I have a python3 program that has::
>   #!/usr/bin/python3 -tt
>   # -*- coding: utf-8 -*-
>   import café
>   café.do_something()
> python3 reads this file in and represents café as an abstract text type
> because I wrote it using utf-8 encoding and it can therefore decode the
> file's contents to its internal representation.  However, it then has to
> find the café module on disk.  In my environment, I have LC_ALL=en_US.utf8,
> so python3 finds the file café.py and uses that to satisfy the import.
> However, I have a colleague who works with me.  He has access to my
> program over a shared filesystem (or it is distributed to him via a git
> checkout or copied via an sdist, etc.).  His locale uses latin-1 (ISO8859-1)
> as its encoding (for instance, LC_ALL=en_US.ISO8859-1).  When he runs my
> program, python3 is still able to read the application file itself (thanks
> to the line in the file that declares it is encoded in utf-8), but when it
> searches the disk for a file to satisfy café, it runs into problems because
> the café.py filename is not encoded using latin-1.
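
To make the colleague's problem concrete, here is a minimal simulation. It is only a sketch: it does not touch the filesystem or the real import machinery, it just compares the bytes a UTF-8 locale would write for the name "café.py" with what a Latin-1 locale would make of those bytes.

```python
# The file name as written to disk by a user whose locale is UTF-8.
name = "café.py"
utf8_bytes = name.encode("utf-8")          # b'caf\xc3\xa9.py'

# A colleague whose locale is Latin-1 decodes the same on-disk bytes.
seen_by_latin1_user = utf8_bytes.decode("latin-1")
print(seen_by_latin1_user)                 # cafÃ©.py, not café.py

# The bytes his locale would expect for "café.py" are different,
# so a lookup for the module named "café" does not find the file.
expected_bytes = name.encode("latin-1")    # b'caf\xe9.py'
print(utf8_bytes == expected_bytes)        # False
```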

Horrifying. All codecs that are not utf-8 should be banned, except on
Windows. Or at least warn("Your Unicode is broken"); in fact, just put that
in site.py unconditionally.

However, remember that a distribution with the non-ASCII PyPI name ☃ could
still be imported as just "import snowman". Only the .dist-info directory,
☃-1.0.0.dist-info, would necessarily contain the non-ASCII characters.
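
A small sketch of that separation, assuming a hypothetical distribution named ☃ whose importable package is "snowman". Only the .dist-info directory name is built from the distribution name; the import name is a separate, ASCII identifier.

```python
# Hypothetical distribution: PyPI name is a snowman, import name is ASCII.
dist_name = "\N{SNOWMAN}"   # "☃"
version = "1.0.0"
import_name = "snowman"

# Only the metadata directory carries the distribution name.
dist_info_dir = f"{dist_name}-{version}.dist-info"
print(dist_info_dir)                # ☃-1.0.0.dist-info

# The import name must be a valid identifier; the distribution name need not be.
print(import_name.isidentifier())   # True
print(dist_name.isidentifier())     # False
```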

I will keep the - and document the - to _ folding convention: - turns into
_ when going into a filename, and _ turns back into - when parsed out of a
filename.
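
A minimal sketch of the folding convention described above. The helper names are hypothetical and the filename layout (name-version.dist-info) is assumed for illustration; the point is only that folding keeps "-" free to act as the separator.

```python
def to_filename_component(name: str) -> str:
    """Fold '-' to '_' so the name never contains the '-' separator."""
    return name.replace("-", "_")


def from_filename_component(component: str) -> str:
    """Unfold '_' back to '-' when parsing a name out of a filename."""
    return component.replace("_", "-")


fname = f"{to_filename_component('my-package')}-1.0.0.dist-info"
print(fname)                                   # my_package-1.0.0.dist-info

# Because the name has no '-', splitting on '-' recovers name and version.
name, version = fname[: -len(".dist-info")].split("-")
print(from_filename_component(name), version)  # my-package 1.0.0
```

In this sketch the folding is deliberately lossy: a name that already contains "_" comes back with "-", so "my_package" and "my-package" are treated as the same name.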

The alternative to putting the metadata in the filename (which, by the way,
isn't that big of a problem) is to have indexed metadata. IIUC apt-get and
yum work this way, and the filename does not matter at all. The tradeoff is,
of course, that you have to generate the index. The simple index is a
significant convenience of easy_install-derived systems.
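
A rough sketch of the indexed alternative, assuming a hypothetical archive store where each filename is an opaque key and the metadata is read from inside the archive. This is not how apt-get or yum actually lay out their indexes; it only shows the extra step of generating an index, after which nothing about the name has to survive a trip through a filename.

```python
from collections import defaultdict

# Hypothetical store: opaque filename -> metadata read from inside the archive.
ARCHIVES = {
    "0001.archive": {"name": "\N{SNOWMAN}", "version": "1.0.0"},
    "0002.archive": {"name": "my-package", "version": "1.0.0"},
    "0003.archive": {"name": "my-package", "version": "1.1.0"},
}


def build_index(archives):
    """The extra step: map each name to a list of (version, filename) pairs."""
    index = defaultdict(list)
    for filename, meta in archives.items():
        index[meta["name"]].append((meta["version"], filename))
    return dict(index)


INDEX = build_index(ARCHIVES)
print(INDEX["my-package"])   # [('1.0.0', '0002.archive'), ('1.1.0', '0003.archive')]
print(INDEX["\N{SNOWMAN}"])  # [('1.0.0', '0001.archive')]
```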

Daniel Holth