On Mon, Nov 12, 2012 at 3:20 PM, Toshio Kuratomi <a.badger@gmail.com> wrote:
> Or at least warn("Your Unicode is broken"); in fact, just put that in site.py
> unconditionally.
>
If python itself adds that to site.py, that would be great.  But individual
sites adding things to site.py only makes python code written at one site
non-portable.

It is a joke. Python would just print "Your Unicode is broken" on startup, just to let you know, regardless of your platform or LOCALE.
 
> However remember that a non-ASCII pypi name ☃ could still be just "import
> snowman". Only the .dist-info directory ☃-1.0.0.dist-info would necessarily
> contain the higher Unicode characters.
>
<nod>  I wasn't thinking about that.  If you specify that the metadata
directories (if they contain the unicode characters) must be encoded in
utf-8 (or at least, must be in a specific encoding on a specific platform),
then that would work.  Be sure to specify the encoding and use it
explicitly when decoding filenames, rather than relying on the implicit
decoding, which depends on the locale.  (I advise having unit tests where
the locale is set to something non-utf-8 (the C locale works well) to test
this, or someone who doesn't remember this conversation will make a mistake
someday.)  If you rely on the implicit conversion with the locale, you'll
eventually end up back in the mess of having bytes that you don't know what
to do with.
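
To make the "decode explicitly" point concrete, here is a rough sketch. It
assumes the rule discussed above (metadata directory names are stored as
UTF-8); the function name list_dist_info is made up for illustration and is
not part of any existing packaging API.

```python
import os


def list_dist_info(site_dir):
    """Return the .dist-info directory names in site_dir as text.

    Hypothetical helper: assumes metadata directory names are UTF-8.
    """
    # Passing bytes to os.listdir() makes it return raw bytes names, so no
    # locale-dependent decoding happens implicitly.
    raw_names = os.listdir(os.fsencode(site_dir))
    names = []
    for raw in raw_names:
        if raw.endswith(b".dist-info"):
            # Explicit codec. A UnicodeDecodeError here means the name breaks
            # the assumed "metadata directories are UTF-8" rule.
            names.append(raw.decode("utf-8"))
    return sorted(names)
```

Running the same check under LANG=C is the kind of unit test I mean: the
result should not change with the locale.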

> I will keep the - and document the - to _ folding convention. - turns into _
> when going into a filename, and _ turns back into - when parsed out of a
> filename.
>
Cool.  Thanks.
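
For the record, the folding convention as I understand it amounts to
something like the following. The two function names are invented for this
sketch; only the - to _ and _ to - mapping comes from the description above.

```python
def name_to_filename(name):
    """Fold '-' to '_' when a project name goes into a filename."""
    return name.replace("-", "_")


def filename_to_name(component):
    """Turn '_' back into '-' when a name is parsed out of a filename."""
    return component.replace("_", "-")
```

Note that the round trip only restores the original name if it contained no
underscores to begin with: "my_pkg" comes back as "my-pkg".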

> The alternative to putting the metadata in the filename which btw isn't that
> big of a problem, is to have indexed metadata. IIUC apt-get and yum work this
> way and the filename does not matter at all. The tradeoff is of course that you
> have to generate the index. The simple index is a significant convenience of
> easy_install derived systems.
>
<nod>.  I've liked the idea of putting metadata about all installed modules
into a separate index.  It makes it possible to write a new import mechanism
that uses the index to load modules more efficiently on systems with large
sys.path's, and makes multiple versions of a module on a system easier to
implement.

However, there are some things to consider:

I was actually thinking of the server (pypi) side.

It would also be worthwhile to define an install-side hook, minimally "packaging.reindex()", or "reindex(list of changed packages)". By default it would do nothing because the default implementation would look at all the .dist-info directories every time, but you could plug in a more complicated implementation. It would be, by design, less flexible than the current "anything that has an info directory on the path is installed automatically" system.
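
A minimal sketch of what such a hook could look like, purely as an
illustration. Nothing here exists in any packaging library: reindex,
set_reindex_hook and the module-level variable are all assumed names.

```python
# Hypothetical plug-in point; None means "use the default behaviour".
_reindex_hook = None


def set_reindex_hook(hook):
    """Register a callable that takes a list of changed package names."""
    global _reindex_hook
    _reindex_hook = hook


def reindex(changed=None):
    """Called by an installer after packages are added or removed.

    By default this does nothing, because the default lookup scans all the
    .dist-info directories every time and so has no index to refresh.
    """
    if _reindex_hook is not None:
        _reindex_hook(changed)
```

A site that maintains a real index would call set_reindex_hook() once with
its own updater, and installers would call reindex(["somepackage"]) after
each change.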