
On Mon, Nov 12, 2012 at 3:20 PM, Toshio Kuratomi a.badger@gmail.com wrote:
Or at least warn("Your Unicode is broken"); in fact, just put that in site.py unconditionally.
If Python itself adds that to site.py, that would be great. But individual sites adding things to site.py only makes Python code written at one site non-portable.
It is a joke. Python would just print "Your Unicode is broken" on startup, just to let you know, regardless of your platform or LOCALE.
However remember that a non-ASCII pypi name ☃ could still be just "import snowman". Only the .dist-info directory ☃-1.0.0.dist-info would necessarily contain the higher Unicode characters.
<nod> I wasn't thinking about that. If you specify that the metadata directories (if they contain Unicode characters) must be encoded in UTF-8 (or at least in a specific encoding on a specific platform), then that would work. Be sure to specify the encoding and use it explicitly when decoding filenames, though, rather than relying on the implicit decoding that depends on the locale. I advise having unit tests that run with the locale set to something non-UTF-8 (the C locale works well) to check this; otherwise someone who doesn't remember this conversation will make a mistake someday. If you rely on the implicit conversion with the locale, you'll eventually end up back in the mess of having bytes that you don't know what to do with.
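To make the "decode explicitly" advice concrete, here is a minimal sketch. It assumes the spec ends up mandating UTF-8 for .dist-info directory names; the function name and the constant are made up for illustration and are not part of any existing API.

```python
import os

# Assumption: the spec mandates UTF-8 for metadata directory names.
METADATA_ENCODING = "utf-8"

def list_dist_info_dirs(path):
    """Return the .dist-info names under path, decoded explicitly as UTF-8."""
    names = []
    # Passing bytes to os.listdir() returns raw bytes names, so the
    # locale-dependent implicit decoding never gets involved.
    for raw in os.listdir(os.fsencode(path)):
        if not raw.endswith(b".dist-info"):
            continue
        try:
            names.append(raw.decode(METADATA_ENCODING))
        except UnicodeDecodeError:
            # Not valid UTF-8, so it does not follow the (assumed) spec; skip it.
            continue
    return sorted(names)
```

Running the tests for something like this under LC_ALL=C is what would catch a later regression to the implicit, locale-based decoding.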
I will keep the - and document the - to _ folding convention. - turns into _ when going into a filename, and _ turns back into - when parsed out of a filename.
Cool. Thanks.
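For reference, the folding convention as described could be sketched like this. The two function names are hypothetical, chosen only for this illustration.

```python
def name_to_filename(name):
    """Fold - into _ when a project name goes into a filename."""
    return name.replace("-", "_")

def filename_to_name(component):
    """Turn _ back into - when a name is parsed out of a filename."""
    return component.replace("_", "-")
```

One consequence of the rule as stated: a name that already contained an underscore comes back with a hyphen (my_pkg goes in as my_pkg and parses out as my-pkg), so the two spellings have to be treated as the same name.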
The alternative to putting the metadata in the filename which btw isn't that big of a problem, is to have indexed metadata. IIUC apt-get and yum work this way and the filename does not matter at all. The tradeoff is of course that you have to generate the index. The simple index is a significant convenience of easy_install derived systems.
<nod>. I've liked the idea of putting metadata about all installed modules into a separate index. It makes it possible to write a new import mechanism that uses the index to load modules more efficiently on systems with large sys.path's, and it makes multiple versions of a module on one system easier to implement.
However, there are some things to consider:
I was actually thinking of the server (pypi) side.
It would also be worthwhile to define an install-side hook, minimally "packaging.reindex()", or "reindex(list of changed packages)". By default it would do nothing because the default implementation would look at all the .dist-info directories every time, but you could plug in a more complicated implementation. It would be, by design, less flexible than the current "anything that has an info directory on the path is installed automatically" system.
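A rough sketch of what such a hook might look like, purely as an illustration: "packaging.reindex()" does not exist yet, and the registry dict, set_reindex_hook() and the signatures below are all assumptions.

```python
def _default_reindex(changed=None):
    """Default implementation: do nothing.

    The default lookup scans every .dist-info directory each time,
    so there is no index to refresh.
    """
    return None

# Hypothetical registry holding the currently installed hook.
_hooks = {"reindex": _default_reindex}

def set_reindex_hook(hook):
    """Plug in a more complicated implementation (e.g. one that keeps an index)."""
    _hooks["reindex"] = hook

def reindex(changed=None):
    """Installers call this after changing packages.

    changed is an optional list of changed package names; None means
    "everything may have changed".
    """
    return _hooks["reindex"](changed)
```

An installer would call reindex(["snowman"]) after installing or removing snowman; with the default hook that call is a no-op.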