[Catalog-sig] Prototype setuptools-specific PyPI index.

Jodok Batlogg jodok at lovelysystems.com
Fri Jul 20 10:50:40 CEST 2007


thanks jim.

you saved our day. we'll send some austrian cheese over :)

jodok

On 19.07.2007, at 13:06, Jim Fulton wrote:

> Over the past few months, we've struggled quite a bit with Python
> Package Index (PyPI) performance and stability.  Thanks to the heroic
> efforts of Martin v. Löwis and others, performance and especially
> stability have improved quite a bit. Martin has demonstrated that, at
> least when running well, PyPI seems to answer most requests on the
> order of 7 milliseconds (around 150 requests per second) internally.
> That's not bad.  Unfortunately for users, actual times can be quite a
> bit longer.  For me at work, requests take around 300 milliseconds.
> For Martin, they seem to take somewhat longer.  300 milliseconds
> isn't so bad for a request or two; however, easy_install can easily
> make tens or even hundreds of requests to satisfy a user request for a
> package.  zc.buildout, when verifying that a large system with many
> tens of packages has the most up-to-date versions of each package, can
> easily make thousands of requests.
>
> Why do setuptools and buildout make so many requests?  If a package
> exposes more than one release, then setuptools checks the package's
> main PyPI page and the pages for each release.  We need to be able to
> easily use older releases, so we can't hide old releases.  Typical
> projects of ours have many old releases exposed.  Even if setuptools
> were more clever in the way it searched PyPI, it would still have to
> make a minimum of 2 requests per package for packages with multiple
> versions exposed.
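
A rough sketch of the request arithmetic described above. It assumes one request for the main page plus one per exposed release (as stated), a single request for packages with at most one release (an assumption), and the 300 millisecond figure quoted earlier. The function names are hypothetical, and off-site home page or download links are not counted.

```python
def index_requests(release_counts, single_page=False):
    """Count index page requests for a set of packages.

    release_counts: number of exposed releases for each package.
    single_page: True models an index with one page per package.
    """
    total = 0
    for releases in release_counts:
        if single_page or releases <= 1:
            total += 1             # only the package page is read
        else:
            total += 1 + releases  # main page plus one page per release
    return total


def index_time_ms(release_counts, latency_ms=300, single_page=False):
    """Estimate milliseconds spent reading index pages, one request at a time."""
    return index_requests(release_counts, single_page) * latency_ms


# One package with 13 exposed releases:
print(index_requests([13]))                    # 14
print(index_time_ms([13]))                     # 4200
print(index_time_ms([13], single_page=True))   # 300
```

The model only counts sequential page reads; it says nothing about page size or download time.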
>
> Another potential issue is that PyPI pages can be large.  I've found
> it convenient to use PyPI package pages as the home page for many of
> my projects.  I like to include package documentation in my project
> pages.  Perhaps this is an abuse of PyPI, but it is very convenient
> for me and no one has complained. :)  The zc.buildout pages are
> around 200K.  That's a fair bit of data for setuptools to download
> and scan for download URLs.
>
> In the course of this discussion, I've realized that it doesn't make
> sense for setuptools to use the same interface that humans use.
> setuptools doesn't need to see all of the data that is useful to
> humans. Similarly, humans generally don't need to see all of the
> historical releases for a project.  I suggested a simple page format
> designed just for setuptools.  An alternative would be an xmlrpc
> API.  I prefer pages because I think that, over time, the number of
> requests from automated tools like easy_install and zc.buildout will
> increase substantially and ultimately, will overwhelm dynamic
> servers, even ones like PyPI that are reasonably fast.  I also think
> that a simple static collection of pages will be easier to mirror and
> I think some number of geographic mirrors is likely to help some
> people.  I promised to prototype the format I suggested.
>
> I've created an experimental prototype setuptools-specific package
> index at
>
>    http://download.zope.org/ppix
>
> Going to that page gives brief instructions for using it with
> easy_install and zc.buildout.  To see an individual package page, add
> the package name to the URL, as in:
>
>    http://download.zope.org/ppix/setuptools/
>
> A few things to note about this:
>
> - I don't expose a long package list at
> http://download.zope.org/ppix/.  The long package list would be
> expensive to download and supports a use case that I consider to be
> of negative value, which is installing packages with case-insensitive
> package names.  I think it is important for humans to be able to
> search for packages using case-insensitive search terms, but I think
> that, after identifying a
> package, precise package names should be used.  I think it is
> especially important that precise package names be used in package
> requirements.
>
> - There is a single page per package.  This can greatly reduce the
> number of requests.  Packages that store all of their distributions
> in PyPI and that don't have off-site home pages or download URLs can
> be scanned with a single request.  Note that I excluded home page and
> download URLs that pointed back to the package's PyPI page, as that
> wouldn't provide any new information to setuptools.
>
> - Download URLs for *hidden* packages are included.  Humans don't
> need to see old revisions, but setuptools-based tools do.  If we used
> an index like this for setuptools, we could stop unhiding old
> releases when we created new releases in PyPI.  This would make PyPI
> more useful to humans and less of a pain for developers.
>
> - Download URLs are the same as they are in PyPI.  Using this new
> index, distributions are still downloaded from PyPI, so the index
> doesn't affect PyPI download statistics.
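
To illustrate why a single page per package is cheap for a tool to consume, here is a minimal sketch of scanning one such page for distribution links. The page markup is an assumption (plain `<a href>` links; the actual ppix page format is not shown in this message), and `LinkCollector` and `distribution_links` are hypothetical names. Only Python standard library calls are used.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags on one index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def distribution_links(page_html, base_url):
    """Return (url, md5 or None) pairs for links that look like distributions."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    results = []
    for link in collector.links:
        url, fragment = urldefrag(link)
        # Assumed set of distribution suffixes; other links are ignored.
        if not url.endswith((".egg", ".tar.gz", ".zip")):
            continue
        md5 = fragment[len("md5="):] if fragment.startswith("md5=") else None
        results.append((url, md5))
    return results
```

With a page like this, all download URLs for a package, hidden releases included, come back from one request and one parse.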
>
> To see the impact of this, it's interesting to look at installing
> zc.buildout using easy_install from PyPI and from the experimental
> index.  Installing using PyPI looks like this:
>
>    (env)jim@ds9:~/tmp$ time easy_install zc.buildout
>    Searching for zc.buildout
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
>    Reading http://svn.zope.org/zc.buildout
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
>    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
>    Best match: zc.buildout 1.0.0b28
>    Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
>    Processing zc.buildout-1.0.0b28-py2.5.egg
>    creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>    Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
>    Adding zc.buildout 1.0.0b28 to easy-install.pth file
>    Installing buildout script to /home/jim/tmp/env/bin/
>
>    Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>    Processing dependencies for zc.buildout
>    Searching for setuptools==0.6c6
>    Best match: setuptools 0.6c6
>    Processing setuptools-0.6c6-py2.5.egg
>    Adding setuptools 0.6c6 to easy-install.pth file
>    Installing easy_install script to /home/jim/tmp/env/bin/
>    Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
>    Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
>    Processing dependencies for setuptools==0.6c6
>    Finished processing dependencies for setuptools==0.6c6
>    Finished installing setuptools==0.6c6
>    Finished processing dependencies for zc.buildout
>    Finished installing zc.buildout
>
>    real	0m31.360s
>    user	0m1.136s
>    sys	0m0.060s
>
> Note the large number of pages read.  Here I was installing a single
> package with one dependency, setuptools, that was already installed.
> Let's look at this again using the experimental index:
>
>    (env)jim@ds9:~/tmp$ time easy_install -i http://download.zope.org/ppix zc.buildout
>    Searching for zc.buildout
>    Reading http://download.zope.org/ppix/zc.buildout/
>    Best match: zc.buildout 1.0.0b28
>    Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
>    Processing zc.buildout-1.0.0b28-py2.5.egg
>    creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>    Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
>    Adding zc.buildout 1.0.0b28 to easy-install.pth file
>    Installing buildout script to /home/jim/tmp/env/bin/
>
>    Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>    Processing dependencies for zc.buildout
>    Searching for setuptools==0.6c6
>    Best match: setuptools 0.6c6
>    Processing setuptools-0.6c6-py2.5.egg
>    Adding setuptools 0.6c6 to easy-install.pth file
>    Installing easy_install script to /home/jim/tmp/env/bin/
>    Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
>    Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
>    Processing dependencies for setuptools==0.6c6
>    Finished processing dependencies for setuptools==0.6c6
>    Finished installing setuptools==0.6c6
>    Finished processing dependencies for zc.buildout
>    Finished installing zc.buildout
>
>    real	0m7.006s
>    user	0m0.244s
>    sys	0m0.040s
>
> Note:
>
> - We made far fewer requests with the new index: one page read
> instead of fifteen.
>
> - Most of the time in the second example was spent actually
> downloading the buildout distribution.  Most of the time in the first
> example was spent reading the index.
>
> - I used workingenv to create clean environments for each of the
> examples above.
>
> WRT zc.buildout, refreshing a buildout with just ZODB installed in it
> takes about 45 seconds for me using PyPI and about 5 seconds using
> the experimental index.
>
> Some of the speed improvement is due to the fact that the
> experimental index is much closer to me (on the net) than PyPI.  ATM,
> requests to PyPI take *me* around 500 milliseconds, while requests to
> the experimental index are taking between 100 and 300 milliseconds.
> (I'm at home and this seems to be somewhat variable.)  Most of the
> speed improvements are from reducing the number of requests.
>
> I'm polling PyPI once a minute to get and apply updates. Thanks to
> the new XML-RPC method that Martin added, this is very efficient to  
> do.
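
A minimal sketch of the polling step described above. The message does not name the XML-RPC method; this assumes it is PyPI's `changelog(since)` call and that each returned row starts with the package name. `connect`, `changed_packages`, `poll_once` and `rebuild_page` are hypothetical names for illustration, not the code actually running behind the experimental index.

```python
import xmlrpc.client


def connect(index_url):
    """Create an XML-RPC proxy for the index (no request is sent yet)."""
    return xmlrpc.client.ServerProxy(index_url)


def changed_packages(server, since):
    """Return sorted names of packages with changelog entries after `since`.

    `server` is any object with changelog(since) returning rows whose
    first field is the package name (assumed row shape).
    """
    return sorted({row[0] for row in server.changelog(since)})


def poll_once(server, last_checked, now, rebuild_page):
    """Rebuild the page of every changed package; return the new checkpoint."""
    for name in changed_packages(server, last_checked):
        rebuild_page(name)
    return now
```

A driver loop would call `poll_once` once a minute, passing the previous return value as `last_checked`, so only packages that actually changed are re-fetched and rewritten.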
>
> I encourage people to check this out and even try using it with
> easy_install and especially buildout. AFAIK, aside from being much
> faster and showing download files for hidden releases, it is
> completely equivalent to PyPI for setuptools use.  My intention is to
> keep this experimental index going and up to date for the foreseeable
> future, and I plan to use it for all my work.
>
> My primary goal is to prototype the new index format.  If this seems
> useful, then I think that www.python.org should expose an index in
> this format to setuptools, either at a different URL or by satisfying
> setuptools requests from the index based on client information.  I'd
> love to see this index populated via a baking mechanism that updates
> package pages when they change, rather than through polling as I'm
> doing.
>
> There would be some benefit to having geographic mirrors.  I suspect
> that having such mirrors available would improve performance further,
> at least for some folks.  It might also be useful to have some
> mirrors for redundancy purposes.  Note though that what I'm doing is
> mirroring only the index data. I'm not mirroring distributions.  Of
> course, I'd be happy to make my software available. (It already is
> via our subversion repository.)
>
> I hope this effort spurs useful discussion and progress.
>
> Jim
>
> --
> Jim Fulton			mailto:jim at zope.com		Python Powered!
> CTO 				(540) 361-1714			http://www.python.org
> Zope Corporation	http://www.zope.com		http://www.zope.org
>
>
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

--
"Although never is often better than *right* now."
   -- The Zen of Python, by Tim Peters

Jodok Batlogg, Lovely Systems
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
phone: +43 5572 908060, fax: +43 5572 908060-77

