[Distutils] Python people want CPAN and how the latter came about

Sat Dec 26 21:59:40 CET 2009

Hi Martin,

Martin v. Löwis <martin <at> v.loewis.de> writes:
> Maybe it's a terminology matter. "I" have to "authorize": who is "I"?
> In PyPI, no person ever authorizes access. Yet, you still cannot upload
> newer versions of popular packages.
> 
> Package names are registered on a first-come first-served basis. You
> could register a popular package if you were the first one to do so.
> However, all the popular packages are already registered, and then only
> the person originally performing the registration may upload newer
> versions (strictly speaking, that person can also delegate that
> permission to others, but that is beside the point that the archive
> operators will never need to authorize anything).

You're right. It's terminology. This is exactly how PAUSE works.
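
As a minimal sketch of that rule (a toy model, not actual PyPI or PAUSE code;
the Registry class, its method names, and the user and package names are all
made up for illustration):

```python
class Registry:
    """Toy first-come, first-served name registry (hypothetical)."""

    def __init__(self):
        self._owners = {}  # name -> set of users allowed to upload

    def may_upload(self, user, name):
        # An unclaimed name is registered to whoever uploads it first.
        owners = self._owners.setdefault(name, {user})
        return user in owners

    def delegate(self, owner, name, other):
        # Only an existing owner can extend upload rights; no operator involved.
        if owner not in self._owners.get(name, set()):
            raise PermissionError(f"{owner} does not own {name}")
        self._owners[name].add(other)


registry = Registry()
first = registry.may_upload("alice", "somepackage")   # alice registers the name
second = registry.may_upload("bob", "somepackage")    # bob is refused
registry.delegate("alice", "somepackage", "bob")
third = registry.may_upload("bob", "somepackage")     # bob is now allowed
```

Nobody running the archive ever has to "authorize" anything here; ownership
falls out of who came first.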

I had a longer reply to Lennart's post written up when Firefox bombed out. I
should have subscribed to the list instead :/

> > Now, on CPAN, I *can* upload anything even if not authorized to do so. It
> > just won't be part of the official indexes if I upload a new version of the
> > database interface DBI.
> 
> In PyPI, you can upload the popular package under a different name. It
> will be indexed and all, but people still may not use it because of the
> different name.

Hmm. If you replace "different name" here with "different package name(s)", then
it's the same for CPAN. Simply renaming the distribution, however, doesn't work
there.
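
To illustrate the difference, here is a rough sketch of an index keyed on
package names rather than on the distribution name. It is a toy model under my
own assumptions, not the real PAUSE indexer: index_upload, the permissions
dict, and the user and file names are hypothetical.

```python
def index_upload(permissions, user, dist_name, package_names):
    """Return the package names from this upload that would enter the index.

    dist_name is deliberately ignored: in this model only the package
    names inside the tarball decide what gets indexed.
    """
    indexed = []
    for pkg in package_names:
        # Unclaimed package names go to the first uploader.
        owners = permissions.setdefault(pkg, {user})
        if user in owners:
            indexed.append(pkg)
    return indexed


permissions = {"DBI": {"alice"}}

# Renaming only the tarball does not help: the package name is still taken.
renamed = index_upload(permissions, "bob", "My-DBI-9.99.tar.gz", ["DBI"])

# Using a different package name works, but then it is a different library.
fresh = index_upload(permissions, "bob", "My-DBI-9.99.tar.gz", ["My::DBI"])
```

In this model the upload itself always succeeds; it just stays out of the
index for names the uploader does not own.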

> > That we do our metadata stuff on package/namespace/class names as opposed to
> > distribution names has the huge benefit of interoperability between
> > distributions.
> 
> I think you misunderstood the Python term "distribution" here (or I
> misunderstand the point you make). A "distribution" is a tar file (or
> such); it's what library authors distribute. There can't be
> "interoperability" between them, at least not in the way I understand
> "interoperability".

Maybe I wasn't clear. But in the end, I think the misunderstanding comes down
to a difference between Perl and Python: Perl mixes class names and namespaces
(=> class hierarchies instead of namespaces as a language feature) whereas
Python has them separate.

By distribution, I also meant tarballs. By interoperability, I meant using
libraries A, B, and C all in the same project (be it library D or an
application). If you do that, you need to make sure the fully resolved class
names (including their namespaces in the Python case) are unique across those
libraries. Otherwise there'll be a clash.
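
A small sketch of the kind of clash I mean, assuming each tarball is simply
described by the fully qualified names it installs (the tarball and module
names below are invented, and find_clashes is a hypothetical helper):

```python
from collections import defaultdict


def find_clashes(distributions):
    """Return the names that more than one distribution provides."""
    providers = defaultdict(set)
    for dist, names in distributions.items():
        for name in names:
            providers[name].add(dist)
    return {
        name: sorted(dists)
        for name, dists in providers.items()
        if len(dists) > 1
    }


# Three independently released tarballs used together in one project.
distributions = {
    "liba-1.0.tar.gz": ["config.parser", "liba.core"],
    "libb-2.3.tar.gz": ["config.parser", "libb.util"],
    "libc-0.1.tar.gz": ["libc_tools.io"],
}

clashes = find_clashes(distributions)
```

Here liba and libb cannot be installed side by side without one shadowing the
other's config.parser, even though the tarball names differ.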

> I think the point here was that "we" see the advantage that CPAN imposes
> with the namespace registrations primarily as theoretical. Yes, it does
> prevent two people putting code into Config::Parser, and yes, in theory,
> it may be that they do the same in Python with PyPI. In practice, that
> is never a problem - there are so many names to choose from, and if you
> do happen to conflict with somebody else's naming choice, AND there is
> indeed interest in using both packages simultaneously, your users will
> ask you to rename your package. However, that happens so rarely if at
> all that it hasn't been a real concern.

Nit: Renaming packages is really only possible if they have no users. Also, all
authors think they wrote the ONE TRUE config parser. So they must have
Config::Parser.

I humbly think that on this point, IF Python namespaces don't do the
disambiguation for you anyway, PyPI gets it wrong.

On the other hand, if you think in terms of large, monolithic libraries as
opposed to the small, highly specialized and reusable components that make up
the bulk of CPAN, this may not be immediately obvious.

You say "namespace registration". I'm not sure what you're referring to, but 99%
of CPAN is just the regular first-come auto-registration. The manual
registration that is still possible is a mere relic.

> If you have malicious users, and unsuspecting users download and run
> their code, no namespacing mechanism can stop them, neither in CPAN nor
> in PyPI. Malicious users would be able to bypass any checks that are
> performed, and experienced users, code review etc. needs to discover
> that.

That's true. But lowering the barrier by making it easy to upload a new package
that has the same name as a popular one but a higher version number is silly.
But that is academic. Neither CPAN nor PyPI makes this mistake.

> New or stupid users may accidentally create colliding names. However,
> in Python, packages aren't called "db", but, say, "psycopg2", "Twisted",
> "django", or "zope". Chances are fairly low that new users accidentally
> come up with such a name.

DB was a pathological example. It's the class name of the Perl-internal debugger
but lends itself well to different interpretations.

Best regards,
Steffen