[Catalog-sig] Why so many zc.buildout versions?

Jim Fulton jim at zope.com
Tue Jul 10 16:32:10 CEST 2007


You raise a really good point, which is especially relevant in light
of PyPI performance issues and discussions.

I'm copying the distutils and catalog sigs to get some wider  
discussion. I apologize for the cross posting.

I'm beginning to wonder about the strategy that setuptools uses, or  
maybe about the way we are using the index.

It's important to note that there is nothing specific about the  
buildout package here.

It is very important to make multiple versions available to support  
requirements for specific package versions.  It makes builds/installs
repeatable, whether talking about buildout or other systems built on  
setuptools.  When someone has tested and wants to release an  
application built from a collection of distributions, they will want  
to specify those *specific* versions for future builds or installs.   
This means that we need to retain any versions published indefinitely  
in a way that can be found by setuptools.
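
To illustrate why every published version has to stay discoverable,
here is a minimal sketch in Python. The function name find_pinned and
the sample data are hypothetical, and this is not how setuptools
resolves requirements; it only shows that an exact pin stops working
as soon as the pinned release is no longer listed.

```python
def find_pinned(requirement, available):
    """Return the pinned version if it is still listed, else None.

    `requirement` is an exact pin such as "zc.buildout==1.0.0b19";
    `available` maps project names to the versions an index lists.
    (Hypothetical helper, for illustration only.)
    """
    name, sep, version = requirement.partition("==")
    if not sep:
        raise ValueError("not an exact pin: %r" % requirement)
    listed = available.get(name.strip(), [])
    version = version.strip()
    return version if version in listed else None


# Hypothetical index contents: every release kept visible...
all_kept = {"zc.buildout": ["1.0.0b19", "1.0.0b27", "1.0.0b28"]}
# ...versus only the newest release left visible.
latest_only = {"zc.buildout": ["1.0.0b28"]}

pin = "zc.buildout==1.0.0b19"
found_when_kept = find_pinned(pin, all_kept)       # the pinned version
found_when_hidden = find_pinned(pin, latest_only)  # None: build not repeatable
```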

Currently, the only way to support multiple versions with the  
cheeseshop is to unhide past releases.  This has a fairly severe  
effect on performance.  As the example below shows, setuptools will  
fetch the package page and then fetch the pages for each release.   
That's a lot of requests.  What makes it worse is that the individual  
package pages can be fairly long.  I've gotten in the habit of  
including full documentation on every release page.  For example,  
recent release pages for zc.buildout are around 200K. This is a  
fairly significant amount of data to transfer.  This will certainly  
make the scanning process take a long time for clients. (Obviously,  
if we keep doing things the way we are, I'll need to stop doing that.)

All of this aggravates any performance problems we might have.
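
As a rough back-of-the-envelope check, the quoted easy_install run
below reads the package page, one svn.zope.org page, and 13 release
pages. The sketch below adds that up. The 200K figure only applies to
recent release pages, so treating every release page as that size is
an assumption and the result is an upper-bound estimate; scan_cost is
a hypothetical helper name.

```python
def scan_cost(release_pages, other_pages=1, kb_per_release_page=200):
    """Estimate requests and data for one index scan.

    Counts one request for the package page, `other_pages` for extra
    URLs that get scanned, and one per release page. The data estimate
    covers release pages only and assumes each is about
    `kb_per_release_page` KB (an assumption, see the text above).
    """
    requests = 1 + other_pages + release_pages
    approx_kb = release_pages * kb_per_release_page
    return requests, approx_kb


# 13 release pages appear in the quoted easy_install output.
requests, approx_kb = scan_cost(release_pages=13)
```

Under those assumptions a single install makes 15 requests and may
transfer on the order of 2600K of release-page text just to pick a
version.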

Up to now, setuptools has tried hard to use existing systems without  
change. This means that it reuses systems designed primarily for  
people, not software. I think that setuptools rightly took the  
approach it has up to now so that progress could be made without  
making people change other systems.  This was appropriate when  
setuptools was evolving and people were figuring out ways to use it.   
I think it is time to take a step back and think a lot harder about  
how we'd want to structure an index to support setuptools.

IMO, a setuptools-aware index would have a single page for each package:

- The single page would be published in a case-insensitive way. It  
would be nice to find a way to avoid this, or maybe we should use a  
windows-based web server. :)  It would also be served very cheaply,  
for example statically.

- The single page would list links for all available distributions,  
which should include all distributions published.  It would also list  
any other URLs that should be scanned for releases, when releases  
aren't all uploaded to PyPI.

- The single page would contain very little additional information.  
It would be for use by software, not humans.

In addition, the root page with a trailing / would be empty and very  
cheap.
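
The per-package page described above could be baked by something as
small as the following sketch. Everything here is an assumption made
for illustration: the function names, the page layout, and the
example.invalid URLs are hypothetical, and lowercasing the name is
just one possible way to get case-insensitive lookups from a static
file tree.

```python
from html import escape


def page_key(project_name):
    """Normalize a project name so lookups are case-insensitive."""
    return project_name.lower()


def render_package_page(distribution_urls, extra_urls=()):
    """Build a minimal, machine-oriented page: links and nothing else.

    `distribution_urls` are all published distributions; `extra_urls`
    are other pages that should be scanned for releases.
    """
    lines = ["<html><body>"]
    for url in list(distribution_urls) + list(extra_urls):
        href = escape(url, quote=True)
        lines.append('<a href="%s">%s</a><br/>' % (href, href))
    lines.append("</body></html>")
    return "\n".join(lines)


def bake_index(projects):
    """Return {normalized name: page text}, ready to write as static files."""
    return {
        page_key(name): render_package_page(dists, extras)
        for name, (dists, extras) in projects.items()
    }


# Hypothetical input data; the download URLs are placeholders.
pages = bake_index({
    "zc.buildout": (
        ["http://example.invalid/zc.buildout-1.0.0b27.tar.gz",
         "http://example.invalid/zc.buildout-1.0.0b28.tar.gz"],
        ["http://svn.zope.org/zc.buildout"],
    ),
})
```

With pages like this, a client needs one cheap request per package
instead of one request per release.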

There are a lot of ways we could achieve this pretty cheaply while  
keeping the existing system pretty much as it is.

For example, the current effort to bake static pages could bake these  
pages instead.  We could make the new index available at a different  
URL for people to play with while we worked the kinks out of the  
process.

Of course, those of us who use the cheeseshop and setuptools
extensively can also achieve much of this by changing the way we work.

Thoughts?

Jim

On Jul 10, 2007, at 8:44 AM, Philipp von Weitershausen wrote:

> When easy_installing zc.buildout I realized that the CheeseShop  
> still lists a gazillion old versions of zc.buildout. That makes it  
> take quite some time to install zc.buildout (see below), and I  
> reckon the same sort of check has to happen each time it looks for  
> a new version of that egg...
>
> Is there any reason for having so many old versions around?
>
>
> $ easy_install zc.buildout
> Searching for zc.buildout
> Reading http://cheeseshop.python.org/pypi/zc.buildout/
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
> Reading http://svn.zope.org/zc.buildout
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
> Best match: zc.buildout 1.0.0b28
> ...

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




