
I consider the limitation of package names to ascii to be a blessing in disguise. In python3, unicode module names are possible but not portable between systems. This is because the non-ascii module names inside of a python file are abstract text, but the representation on the filesystem is whatever the user's locale is. The consensus on python-dev when this was brought up seemed to be that using non-ascii in your local locale was important for learning to use python, but that distributing non-ascii modules to other people was a bad idea. (If you have the attention span for long threads, see <http://mail.python.org/pipermail/python-dev/2011-January/107467.html>. Note that the threading was broken several times but the subject line stayed the same.)
Description of the non-ascii module problem for people who want a summary:
I have a python3 program that has::

    #!/usr/bin/python3 -tt
    # -*- coding: utf-8 -*-
    import café
    café.do_something()
python3 reads this file in and represents café as an abstract text type because I wrote it using utf-8 encoding and it can therefore decode the file's contents to its internal representation. However, it then has to find the café module on disk. In my environment, I have LC_ALL=en_US.utf8, so python3 finds the file café.py and uses that to satisfy the import.
However, I have a colleague who works with me. He has access to my program over a shared filesystem (or it is distributed to him via a git checkout, copied via an sdist, etc). His locale uses latin-1 (ISO8859-1) as its encoding (for instance, LC_ALL=en_US.ISO8859-1). When he runs my program, python3 is still able to read the application file itself (due to the piece of the file that specifies it's encoded in utf-8), but when it searches for a file to satisfy café on the disk it runs into problems because the café.py filename is not encoded using latin-1.
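The mismatch described above can be reproduced without touching a real filesystem. The following is only a sketch under stated assumptions: it assumes the file name was written to disk as UTF-8 bytes, and it compares those bytes with what a latin-1 locale would produce for the same module name. The variable names are made up for the illustration, and the real import machinery does more than this.

```python
# Module name as python3 sees it after decoding the source file.
# "\u00e9" is the precomposed "e with acute accent" character.
module_name = "caf\u00e9"

# Bytes stored on disk when the file was created under a UTF-8 locale
# (assumption for this illustration).
on_disk = (module_name + ".py").encode("utf-8")

# Bytes a latin-1 locale would produce for the same file name.
wanted_latin1 = (module_name + ".py").encode("latin-1")

# How the UTF-8 bytes on disk read when decoded as latin-1.
seen_as = on_disk.decode("latin-1")

print(on_disk)                   # the UTF-8 byte sequence
print(wanted_latin1)             # the latin-1 byte sequence
print(on_disk == wanted_latin1)  # the two byte sequences differ
print(seen_as)                   # the name a latin-1 user would see
```

The same abstract name maps to two different byte sequences, so a lookup done in one encoding does not match a file created in the other.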
Horrifying. All codecs that are not utf-8 should be banned, except on Windows. Or at least warn("Your Unicode is broken"); in fact, just put that in site.py unconditionally.
However, remember that a distribution with the non-ASCII PyPI name ☃ could still be imported as just "import snowman". Only the .dist-info directory ☃-1.0.0.dist-info would necessarily contain the higher Unicode characters.
I will keep the - and document the - to _ folding convention. - turns into _ when going into a filename, and _ turns back into - when parsed out of a filename.
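As a rough illustration of that folding convention, here is a minimal sketch. The helper names name_to_filename and filename_to_name are hypothetical and are not taken from any existing tool; the sketch only covers the - and _ substitution described above and ignores any other escaping a real implementation might need.

```python
def name_to_filename(name):
    """Fold '-' to '_' so that '-' stays free as the field separator."""
    return name.replace("-", "_")


def filename_to_name(component):
    """Turn '_' back into '-' when parsing a name out of a filename."""
    return component.replace("_", "-")


# Hypothetical distribution name, used only for the example.
folded = name_to_filename("my-package")
restored = filename_to_name(folded)
print(folded)
print(restored)

# The folding is lossy: a name that already contains '_' does not
# survive the round trip unchanged.
print(filename_to_name(name_to_filename("my_package")))
```

Because the round trip is lossy, this sketch treats - and _ as equivalent in names, which is the practical consequence of the convention.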
The alternative to putting the metadata in the filename, which btw isn't that big of a problem, is to have indexed metadata. IIUC apt-get and yum work this way, and the filename does not matter at all. The tradeoff is of course that you have to generate the index. The simple index is a significant convenience of easy_install-derived systems.
Daniel Holth