[Catalog-sig] Re: [chiPy] Meeting followup- CPAN for Python?

Fri Dec 10 19:51:01 CET 2004

I'm copying the Catalog-SIG mailing list, since there are other interested 
people there, and if we want to do something it would be good to coordinate.

Ed Summers wrote:
> I also found the discussion of a python software repository to be one of
> the highlights of the evening. I'm sure having brian d foy there, who has
> been instrumental in building the cpan, added to this. I for one am
> interested in volunteering some time on a project to develop a software 
> archive for python, but I'd like to find out why other efforts failed or 
> fizzled first.
>
> Is there resistance to a centralized archive of python software? I know
> that one advantage that python (ruby|java) has over perl is that many
> common things such as XML parsing, HTTP, SMTP, come as part of the core,
> whereas in Perl they are added on with the cpan utility. Perhaps this is
> why having a central repository hasn't been that important?

I don't think so.  Really, I don't see much resistance to this; I'd 
chalk it up entirely to inertia.

People should look at what exists now:

   http://www.python.org/pypi
   http://sourceforge.net/projects/pypi/

Note that registration is automatic if you use distutils (just run 
"python setup.py register"), though distutils is not required to submit 
to PyPI.  If you want to discuss PyPI and related topics, there's a 
mailing list:

   http://www.python.org/sigs/catalog-sig/

Richard Jones is the author, and he's very willing to include other 
people in PyPI's development.  There are several PEPs noted on that page.
There's also a distutils-sig, which is related:

   http://www.python.org/sigs/distutils-sig/

Distutils has much, but not all, of the metadata needed.  setuptools is 
one extension meant to support more granular packages and dependencies:

   http://cvs.eby-sarna.com/PEAK/setuptools/

Fred Drake has also been doing some work with this, to distribute Zope 
3 in a series of smaller packages; I'm not sure where that work lives. 
Maybe Twisted people have also worked on this, as they wanted to split 
the Twisted installation up, but I haven't heard anything about that for 
a while.  Anyway, this might be a distraction, in that it's a much more 
complex problem than a CPAN-ish thing.

Here's a recent thread on the topic of CPAN/Python:
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/85681185e372c9e1/a61ffe06820e3d88

I think there's an incremental way to get there:

* Track package names in PyPI.  For now these will serve as 
identifiers for installation and dependencies.  Conflicting package 
names already cause a lot of problems anyway, so it would be nice if 
they'd be unique.
* Add a field for package dependencies, based on those names.  Forget 
about version requirements for now.  Though in theory, since distutils 
setup.py files are just Python scripts, this should be extensible.
* Add a field for source download; maybe make it a dictionary, so you 
can give several links for different sources (e.g., source tarball/zip, 
rpm, windows installer, debian package, etc).
* Create a script that can query PyPI, get the link(s), then download 
the package.  PyPI already has an XML-RPC interface, I believe.  For 
packages hosted on SourceForge, the downloader has to be a little smart 
about the load-balancing page, but that's relatively easy.
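
As a rough sketch of how the first three steps might fit together, here is 
a small Python example.  Everything in it is an assumption for 
illustration: the package names, the example.org URLs, and the field names 
"requires" and "downloads" are hypothetical, not actual PyPI fields, and an 
in-memory dictionary stands in for whatever a real query to PyPI would 
return.  Version requirements are ignored, as suggested above.

```python
# Hypothetical catalog records keyed by unique package name.
# "requires" and "downloads" are illustrative field names only.
CATALOG = {
    "webapp": {
        "requires": ["templating", "dbwrapper"],
        "downloads": {"source": "http://example.org/webapp.tar.gz"},
    },
    "templating": {
        "requires": ["dbwrapper"],
        "downloads": {"source": "http://example.org/templating.tar.gz"},
    },
    "dbwrapper": {
        "requires": [],
        "downloads": {
            "source": "http://example.org/dbwrapper.tar.gz",
            "windows": "http://example.org/dbwrapper.exe",
        },
    },
}


def install_order(name, catalog):
    """Return package names with every dependency before its dependents."""
    order = []
    done = set()
    visiting = set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in visiting:
            raise ValueError("dependency cycle involving %r" % pkg)
        if pkg not in catalog:
            raise KeyError("unknown package %r" % pkg)
        visiting.add(pkg)
        for dep in catalog[pkg]["requires"]:
            visit(dep)
        visiting.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(name)
    return order


def pick_download(name, catalog, preferred=("source",)):
    """Return the first available link, trying kinds in order of preference."""
    links = catalog[name]["downloads"]
    for kind in preferred:
        if kind in links:
            return links[kind]
    raise LookupError("no download of kinds %r for %r" % (preferred, name))


if __name__ == "__main__":
    for pkg in install_order("webapp", CATALOG):
        print(pkg, pick_download(pkg, CATALOG))
```

A downloader script would walk the list returned by install_order() and 
fetch each link in turn; because the names are unique, they work as both 
the dependency identifier and the lookup key.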

These all seem pretty doable to me, and not very controversial.  Except 
the package name and dependency part, which isn't "controversial" per 
se, but has some subtle problems.  Oh, if only we used unambiguous names 
like com.colorstudy.sqlobject!  j/k

From there, we can implement a package cacher that downloads packages 
based on that metadata.  We don't have to copy CPAN, in that it's okay 
if the cached files aren't the canonical location for the package, just 
a backup.  Perhaps we can add a field to the metadata that indicates if 
the author prefers for the cached version to be canonical (i.e., they 
don't have a good host).  Or something to "setup.py register" that can 
upload to a known server.  Or something to PyPI where mirrors can 
register that they have a file, and it can do URL rotation.  We can even 
do all of those, and see which one works best.  The point being: once we 
have the data we can start using it, but getting the data is the place 
to start.
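
To make the mirror-registration option a bit more concrete, here is a 
minimal sketch of URL rotation in Python.  The MirrorRegistry class and its 
method names are hypothetical, invented for this example; nothing like it 
is claimed to exist in PyPI.  It assumes the canonical URL is simply 
registered first and that rotation is plain round-robin.

```python
class MirrorRegistry:
    """Hypothetical registry mapping a file name to the URLs that host it."""

    def __init__(self):
        self._urls = {}   # filename -> list of URLs, in registration order
        self._next = {}   # filename -> position of the URL to hand out next

    def register(self, filename, url):
        """Record that `url` serves `filename`; duplicates are ignored."""
        urls = self._urls.setdefault(filename, [])
        if url not in urls:
            urls.append(url)

    def next_url(self, filename):
        """Return the next URL for `filename`, rotating round-robin."""
        urls = self._urls.get(filename)
        if not urls:
            raise LookupError("no known location for %r" % filename)
        index = self._next.get(filename, 0) % len(urls)
        self._next[filename] = index + 1
        return urls[index]


if __name__ == "__main__":
    registry = MirrorRegistry()
    # The author's own host, registered first, then one mirror.
    registry.register("pkg-1.0.tar.gz", "http://example.org/pkg-1.0.tar.gz")
    registry.register("pkg-1.0.tar.gz", "http://mirror.example.net/pkg-1.0.tar.gz")
    for _ in range(3):
        print(registry.next_url("pkg-1.0.tar.gz"))
```

If an author prefers the cached copy to be canonical, the same structure 
still works: the cache's URL is just the first (or only) one registered.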

Well, these are some of my ideas.  If people are interested in this in 
general, we could try to organize a little mini-sprint; a number of us 
would come together and try to bang it out.  We could try to schedule 
and coordinate this with Richard or other interested people over IRC.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org