At 11:40 PM 9/23/2005 -0400, Fred Drake wrote:
On 9/23/05, Richard Jones <richardjones@optushome.com.au> wrote:
Distutils metadata capture is implemented in the Python core. We would want to implement any name restrictions there, surely? Otherwise people only get an error when attempting to use setuptools or register with PyPI, which would be just annoying.
The use of distutils should not imply the use of PyPI. Perhaps we'd want distutils to issue a warning when building a distribution if the naming conventions weren't acceptable, but that's the most we'd want. That should be something that could easily be turned off for a site or an individual.
Not only that, but I'm not suggesting we ban those characters from names. I'm suggesting merely that we strip them in a uniform way. The error message would be "somebody already has a project with a name that's too similar to yours", not "you have unacceptable characters in your project name". :)
I'm suggesting, in other words, that "Foo*Bar" and "Foo!Bar" simply not be considered unique project names, not that whichever project registers the name first can't use it with funky punctuation in PKG-INFO and display it on their PyPI page that way. (I would also suggest that we clarify the rules for determining project name uniqueness and recommend people follow them for simplicity's sake, of course.)
I'm also suggesting that if somebody goes to the URL "/pypi/foo--bar", it would still pull up the "Foo*Bar" project if that's the one that's registered, because canonicalizing 'foo--bar' should yield the same unique key as canonicalizing 'Foo*Bar'. (This is particularly nice for EasyInstall users, since it wouldn't need to fall back to pulling down the entire index to do a case-insensitive search when they don't match someone's CreativeCAPS in a project name.)
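To make the canonicalization idea concrete, here is a rough sketch. The exact rule is an assumption on my part (lowercase the name, then collapse each run of characters other than letters, digits, and '.' into a single '-'), and `canonical_key` is a hypothetical helper name, not anything PyPI provides:

```python
import re


def canonical_key(name):
    """Return a lookup key for a project name.

    Assumed rule, for illustration only: lowercase the name, then
    replace every run of characters other than ASCII letters, digits,
    and '.' with a single '-'.
    """
    return re.sub(r"[^a-z0-9.]+", "-", name.lower())


if __name__ == "__main__":
    # Under this rule, all of these spellings map to one and the same key.
    for raw in ("Foo*Bar", "Foo!Bar", "foo--bar", "FOO-BAR"):
        print(raw, "->", canonical_key(raw))
```

Any rule with the same property (different punctuation and capitalization collapsing to one key) would do; the point is only that it is applied uniformly to registered names and to user input alike.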
In other words, all user inputs (URL or otherwise) should be normalized for key storage and lookup, distinct from the human-readable name of the package. (Setuptools implements this for eggs by having distinct "project_name" and "key" attributes.)
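A minimal sketch of the "two names" arrangement might look like the following. `ProjectIndex`, its methods, and the normalization rule in `_key` are all hypothetical, chosen for illustration; this is not how the PyPI code base is structured:

```python
import re


def _key(name):
    # Assumed normalization: lowercase, and collapse runs of characters
    # other than letters, digits, and '.' into a single '-'.
    return re.sub(r"[^a-z0-9.]+", "-", name.lower())


class ProjectIndex:
    """Toy index that stores projects by key but remembers display names."""

    def __init__(self):
        self._by_key = {}  # normalized key -> human-readable project name

    def register(self, display_name):
        key = _key(display_name)
        existing = self._by_key.get(key)
        if existing is not None and existing != display_name:
            # The name is rejected for being too similar to an existing
            # one, not for containing unacceptable characters.
            raise ValueError(
                "somebody already has a project with a name that's "
                "too similar to yours: %r" % existing
            )
        self._by_key[key] = display_name
        return key

    def lookup(self, user_input):
        # Normalize the user's input the same way before looking it up.
        return self._by_key.get(_key(user_input))


if __name__ == "__main__":
    index = ProjectIndex()
    index.register("Foo*Bar")
    print(index.lookup("foo--bar"))  # finds the project registered as Foo*Bar
    try:
        index.register("Foo!Bar")
    except ValueError as exc:
        print("rejected:", exc)
```

The first registrant keeps the punctuation it chose for display, while every lookup goes through the key.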
This approach has a few important features:
1. It can be implemented without renaming existing packages, unless there are actual conflicts in PyPI today.
2. It can be implemented without any need for co-operation from package authors, because it's strictly a PyPI-side change.
3. It allows authors to fully express their creativity in naming.
4. It allows end-users to ignore the authors' creativity :)
The principal downside, of course, is that it's probably not a minor change to the PyPI code base, with respect to the "two names" issue, or with respect to how lookups are done. Which is why I haven't been hounding Richard to do it. Well, maybe just a little. ;)