[Distutils] [Catalog-sig] setuptools or PyPI problem...?

Phillip J. Eby pje at telecommunity.com
Sat Sep 24 07:45:01 CEST 2005


At 11:40 PM 9/23/2005 -0400, Fred Drake wrote:
>On 9/23/05, Richard Jones <richardjones at optushome.com.au> wrote:
> > Distutils metadata capture is implemented in the Python core. We would
> > want to implement any name restrictions there, surely? Otherwise people
> > only get an error when attempting to use setuptools or register with
> > PyPI, which would be just annoying.
>
>The use of distutils should not imply the use of PyPI.  Perhaps we'd
>want distutils to issue a warning when building a distribution if the
>naming conventions weren't acceptable, but that's the most we'd want.
>That should be something that could easily be turned off for a site or
>an individual.

Not only that, but I'm not suggesting we ban those characters from 
names.  I'm suggesting merely that we strip them in a uniform way.  The 
error message would be "somebody already has a project with a name that's 
too similar to yours", not "you have unacceptable characters in your 
project name".  :)

I'm suggesting, in other words, that "Foo*Bar" and "Foo!Bar" simply not be 
considered unique project names, not that whichever project registers the 
name first can't use it with funky punctuation in PKG-INFO and display it 
on their PyPI page that way.  (I would also suggest that we clarify the 
rules for determining project name uniqueness and recommend people follow 
them for simplicity's sake, of course.)

I'm also suggesting that if somebody goes to the URL "/pypi/foo--bar", it 
would still pull up the "Foo*Bar" project if that's the one that's 
registered, because canonicalizing 'foo--bar' should yield the same unique 
key as canonicalizing 'Foo*Bar'.  (This is particularly nice for 
EasyInstall users, since it wouldn't need to fall back to pulling down the 
entire index to do a case-insensitive search when they don't match 
someone's CreativeCAPS in a project name.)
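
As a rough illustration of the canonicalization described above, here is a minimal sketch. The function name canonical_key and the exact rule (collapse each run of characters other than ASCII letters, digits, and "." into a single "-", then lowercase) are assumptions made for this example; the message does not spell out the rule PyPI would use.

```python
import re

# Assumed rule for this sketch: every run of characters other than ASCII
# letters, digits, and "." becomes a single "-", and the result is lowercased.
_NON_KEY_CHARS = re.compile(r"[^A-Za-z0-9.]+")


def canonical_key(name):
    """Return a lookup key for a project name (hypothetical helper)."""
    return _NON_KEY_CHARS.sub("-", name).lower()


# Under the assumed rule, all of these spellings share the key "foo-bar".
for raw in ("Foo*Bar", "Foo!Bar", "foo--bar", "FOO BAR"):
    print(raw, "->", canonical_key(raw))
```

With a rule like this, a request for "/pypi/foo--bar" and a registration of "Foo*Bar" would resolve to the same key, so no case-insensitive scan of the whole index is needed.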

In other words, all user inputs (URL or otherwise) should be normalized for 
key storage and lookup, distinct from the human-readable name of the 
package.  (Setuptools implements this for eggs by having distinct 
"project_name" and "key" attributes.)
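
The "two names" idea can be sketched as a small in-memory index that stores the human-readable name under a normalized key. Everything here is hypothetical and for illustration only: ProjectIndex, NameConflict, canonical_key, and the normalization rule itself are assumptions, not PyPI's or setuptools' actual code.

```python
import re


class NameConflict(Exception):
    """Raised when a new name normalizes to an already registered key."""


def canonical_key(name):
    # Assumed rule: collapse runs of characters other than ASCII letters,
    # digits, and "." into one "-", then lowercase.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name).lower()


class ProjectIndex:
    """Hypothetical index that maps each key to one display name."""

    def __init__(self):
        self._display_names = {}  # key -> human-readable project name

    def register(self, project_name):
        key = canonical_key(project_name)
        existing = self._display_names.get(key)
        if existing is not None and existing != project_name:
            # The name is not rejected for its characters, only for being
            # too close to a name that is already taken.
            raise NameConflict(
                "somebody already has a project with a name that's "
                "too similar to yours: %r" % existing
            )
        self._display_names[key] = project_name
        return key

    def lookup(self, user_input):
        # Normalize whatever the user typed (URL segment or otherwise)
        # before looking it up; returns None if nothing is registered.
        return self._display_names.get(canonical_key(user_input))


index = ProjectIndex()
index.register("Foo*Bar")
print(index.lookup("foo--bar"))  # the display name keeps its punctuation
```

The first registrant keeps the funky spelling for display, while a later attempt to register "Foo!Bar" would raise NameConflict in this sketch.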

This approach has a few important features:

1. It can be implemented without renaming existing packages, unless there 
are actual conflicts in PyPI today.

2. It can be implemented without any need for co-operation from package 
authors, because it's strictly a PyPI-side change.

3. It allows authors to fully express their creativity in naming.

4. It allows end-users to ignore the authors' creativity  :)

The principal downside, of course, is that it's probably not a minor change 
to the PyPI code base, both in storing two names per project and in how 
lookups are done.  That's why I haven't been hounding Richard to do it.  
Well, maybe just a little.  ;)


