[Distutils] Python people want CPAN and how the latter came about

Sun Dec 20 16:21:04 CET 2009

It'll be no secret to anyone reading this that there has been a lot of
discussion recently about a CPAN equivalent for Python, sparked by
Guido's "People want CPAN" post to the python-distutils-devel list. I
read a good chunk of the thread back in November and have been meaning
to add my blurb since. Take it with a grain of salt, though. My
knowledge of the language is weak, my knowledge of the community is
weaker. I'm from the Perl crowd[1] and would simply like to share my
latecomer's view of how the CPAN came about.

My thesis is that the huge success of the CPAN has been facilitated by
two factors[2]. The first is simplicity. When Jarkko Hietaniemi
originally came up with it, the CPAN was (and mostly still is) just an
FTP archive with a by-author directory structure that is mirrored many
times. Everything else is built on this flexible foundation and has
grown over time. The CPAN specifically does NOT have an official web
service or any kind of development platform. Apart from the directory
structure of the CPAN, the only other key ingredient was the "Perl
Authors Upload SErver" (PAUSE) that handles credentials of the authors,
permissions for namespaces, and serves as a single entry-point for
uploads to the CPAN. PAUSE scans incoming distributions for meta
information and generates an index of modules (namespaces/classes) and
distributions that is itself distributed via the simple CPAN mechanism.
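To make the "just a directory tree plus an index" point concrete for Python readers, here is a minimal sketch of how a client could use those two pieces. The post does not spell out the layout or the index format, so the `authors/id/J/JD/JDOE` convention, the three-column record format, and the sample data (author `JDOE`, module `Foo::Bar`) are assumptions modelled on what public CPAN mirrors serve; the function names are hypothetical.

```python
# Illustrative sketch only: the directory layout and index line format are
# assumptions modelled on public CPAN mirrors, not details given in the post.


def author_dir(pause_id: str) -> str:
    """Map an author ID to its by-author directory, e.g. J/JD/JDOE."""
    pid = pause_id.upper()
    return f"authors/id/{pid[0]}/{pid[:2]}/{pid}"


def parse_index(text: str) -> dict:
    """Parse a module index: header lines, a blank line, then one
    'Module::Name  version  path/to/distribution' record per line."""
    _header, _, body = text.partition("\n\n")
    index = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a well-formed record
        module, version, dist_path = fields
        index[module] = (version, dist_path)
    return index


# Hypothetical sample data, not taken from the real index.
SAMPLE_INDEX = """\
File: 02packages.details.txt
Description: hypothetical sample, not real index data

Foo::Bar        1.02   J/JD/JDOE/Foo-Bar-1.02.tar.gz
Foo::Bar::Util  0.01   J/JD/JDOE/Foo-Bar-1.02.tar.gz
"""

index = parse_index(SAMPLE_INDEX)
version, dist_path = index["Foo::Bar::Util"]
print(author_dir("jdoe"))
print(dist_path)
```

The point of the sketch is how little a third-party tool needs: one flat text file maps a module name to a distribution file, and the file's location follows from the author ID. Anything a mirror can serve as static files, a client can consume.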

Let me repeat: Everything else is just sugar on top. Specifically,
everything else is sugar provided by *third parties*. Andreas König
wrote and still maintains PAUSE and the CPAN.pm client[3]. Later, Jos
Boumans set out to write the CPANPLUS client. Graham Barr wrote and
still maintains the search.cpan.org website. Randy Kobes wrote the
similar but equally non-official kobesearch.cpan.org. Many other
significant pieces of the infrastructure[4] have been written and are
maintained by other people who did cooperate with each other but never
had to. By virtue of the simple design and the (somewhat limited)
published meta-data in the form of the simple module index, everyone had
the opportunity to write tools that interface with it. There was no
need to "get it all right" from the get-go. Things evolved and we now
have a best-of-breed set of tools. The various services by various
people are loosely intertwined[5].

There is also very little regulation on what is uploaded to CPAN, but
curiously little abuse. I think that is because the majority of people
who are willing to share their work with others free of charge aren't
the type who'd want to crap on other people's front yard. 

If I were to set up something like CPAN today, I'd put in somewhat more
design, simply because we have learned from the past fifteen years of
operation. I'd specifically put in some more work on namespace and
distribution-level meta data[6]. But I'd make very sure to keep it all
relatively simple. By no means would I want to run any sort of
elaborate web service beyond the authentication and authorization that
the PAUSE provides. This eliminates some of the reliance on a very,
very strong single party to provide initial implementation, hosting and
maintenance for what is a very significant piece of software. 

My firm belief is that the second most important factor to the success
of the CPAN is people. There are some individuals who have managed
herculean amounts of work and have shown incredible dedication over
years. But it's the combination of a lot of people's work that is more
than the sum of its parts. I'd say the core of the CPAN toolchain gang
is not that large. Depending on where you draw the line, there are
maybe 10-50 of them. Not all, but many of these people have known each
other personally for years from attending the many YAPCs and Perl
workshops.
Reading the "People want CPAN" thread, it seemed to me that folks are
fighting each other quite a bit and not always only on technical
grounds. On the other hand, it's hard to imagine fighting with somebody
as friendly and welcoming as, for example, Andreas König. Meeting
in person helps this tremendously. Discover the common goals and
agree on the means over a beer. I'm certain that is happening in the
Python world. But looking at the prices of, say, attending PyCon,
I wonder whether such an event can have the same spirit and community
building effect as a YAPC or even the smaller workshops[7]. 

So if anyone were to accept a single concrete suggestion from me, it'd
be: "Enable the right heads to get together over a beer." It is the
bonds between individuals that can make it all work well. The success
of the CPAN is due to cultural aspects at least as much as due to its
design and technical merits. 

Please keep in mind that this is only my personal opinion. Thanks for
reading. I hope this can add some perspective.

Best regards,
Steffen Mueller 

PS: Please keep me in CC of relevant discussion. I'm not subscribed to this list.

[1] So why do I think I have anything to contribute to this discussion?
I'm a regular contributor to the CPAN and a very modest contributor to
the perl core distribution. I maintain over a hundred Perl modules, am
one of the bunch of PAUSE admins who try to keep the chaos sane, and
have been involved in Perl & CPAN toolchain maintenance to some degree.

[2] You could argue that having a CTAN to take inspiration from helped,
too, but I'm too young to know the exact stage of development of the
CTAN in 1995. 

[3] Recently, the tireless David Golden has been doing a lot of
maintenance, too.

[4] As another example, I'd like to point out the CPAN testers as one
of the most important bits of the CPAN infrastructure. The setup is as
simple as that of CPAN itself, but to some degree, it hasn't aged as
well. At its core, the CPAN testers is just a mailing list to which
anybody can send test reports. This is done by volunteers who spend
their and their computers' time on testing CPAN distributions, but
anybody can easily set up their CPAN client to send test reports while
installing modules. The wealth of reports (I think we hit 6 million
recently) has become a strain on the software that runs the mailing
list archive. People are working on modernization of the setup. The
example demonstrates how decentralization works well. There is no
*official* cluster of computers that automatically tests new
submissions. It's all volunteers who test on their hardware (usually
automatically). 

[5] Does that ring a bell? Sounds a little "web 2.0"
with less polish to me. From fifteen years ago. 

[6] There is a working group for better meta information in CPAN
distributions, initiated and carried forward by David Golden. Most of
the discussion is done. There is a consensus on a large fraction of the
proposals. A draft specification is forthcoming. Implementation in the
CPAN toolchain will likely follow the draft shortly and will be part of
perl 5 release 14. (Release 12 is already feature-frozen.) 

[7] It's not that long ago that I was a student. $225 student rate for
attending a conference? Wouldn't have been able to afford it. The
standard individual rate is also ~3.5 times as high for PyCon as it is
for YAPC. I think that's a harmful barrier to entry.