[Distutils] [Catalog-sig] Prototype setuptools-specific PyPI index.
Jodok Batlogg
jodok at lovelysystems.com
Fri Jul 20 10:50:40 CEST 2007
thanks jim.
you saved our day. we'll send some austrian cheese over :)
jodok
On 19.07.2007, at 13:06, Jim Fulton wrote:
> Over the past few months, we've struggled quite a bit with Python
> Package Index (PyPI) performance and stability. Thanks to the heroic
> efforts of Martin v. Löwis and others, performance and especially
> stability have improved quite a bit. Martin has demonstrated that, at
> least when running well, PyPI seems to answer most requests on the
> order of 7 milliseconds (around 150 requests per second) internally.
> That's not bad. Unfortunately for users, actual times can be quite a
> bit longer. For me at work, requests take around 300 milliseconds.
> For Martin, they seem to take somewhat longer. 300 milliseconds
> isn't so bad for a request or two. However, easy_install can easily
> make tens or even hundreds of requests to satisfy a user request for a
> package. zc.buildout, when verifying that a large system with many
> tens of packages has the most up-to-date versions of each package, can
> easily make thousands of requests.
>
> Why do setuptools and buildout make so many requests? If a package
> exposes more than one release, then setuptools checks the package's
> main PyPI page and the pages for each release. We need to be able to
> easily use older releases, so we can't hide old releases. Typical
> projects of ours have many old releases exposed. Even if setuptools were
> more clever in the way it searched PyPI, it would still have to
> make a minimum of 2 requests per package for packages with multiple
> versions exposed.
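
The request arithmetic in the paragraph above can be sketched roughly as follows. This is only a model of the behaviour described here, not setuptools' actual code; the function names are made up for illustration, and the 300 ms default is simply the latency figure quoted earlier in this message.

```python
def pypi_requests(num_releases):
    """Index pages read for one package under the model described above.

    One request for the main package page, plus one request per exposed
    release when more than one release is exposed. (Illustrative model,
    not setuptools' real search logic.)
    """
    if num_releases <= 1:
        return 1
    return 1 + num_releases


def estimated_seconds(num_requests, latency_ms=300.0):
    """Time spent on index requests alone, assuming sequential requests."""
    return num_requests * latency_ms / 1000.0


if __name__ == "__main__":
    for releases in (1, 5, 13):
        n = pypi_requests(releases)
        print(releases, "releases:", n, "requests,",
              estimated_seconds(n), "seconds")
```

With one page per package, the request count stays at 1 regardless of how many releases are exposed, which is the point of the format proposed below.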
>
> Another potential issue is that PyPI pages can be large. I've found
> it convenient to use PyPI package pages as the home page for many of
> my projects. I like to include package documentation in my project
> pages. Perhaps this is an abuse of PyPI, but it is very convenient
> for me and no one has complained. :) The zc.buildout pages are
> around 200K. That's a fair bit of data for setuptools to download
> and scan for download URLs.
>
> In the course of this discussion, I've realized that it doesn't make
> sense for setuptools to use the same interface that humans use.
> setuptools doesn't need to see all of the data that is useful to
> humans. Similarly, humans generally don't need to see all of the
> historical releases for a project. I suggested a simple page format
> designed just for setuptools. An alternative would be an xmlrpc
> API. I prefer pages because I think that, over time, the number of
> requests from automated tools like easy_install and zc.buildout will
> increase substantially and ultimately, will overwhelm dynamic
> servers, even ones like PyPI that are reasonably fast. I also think
> that a simple static collection of pages will be easier to mirror and
> I think some number of geographic mirrors is likely to help some
> people. I promised to prototype the format I suggested.
>
> I've created an experimental prototype setuptools-specific package
> index at
>
> http://download.zope.org/ppix
>
> Going to that page gives brief instructions for using it with
> easy_install and zc.buildout. To see an individual package page, add
> the package name to the URL, as in:
>
> http://download.zope.org/ppix/setuptools/
>
> A few things to note about this:
>
> - I don't expose a long package list at http://download.zope.org/ppix/.
> The long package list would be expensive to download and supports a
> use case that I consider to be of negative value, which is installing
> packages with case-insensitive package names. I think it is important
> for humans to be able to search for packages using case-insensitive
> search terms, but I think that, after identifying a package, precise
> package names should be used. I think it is especially important that
> precise package names be used in package requirements.
>
> - There is a single page per package. This can greatly reduce the
> number of requests. Packages that store all of their distributions
> in PyPI and that don't have off-site home pages or download URLs can
> be scanned with a single request. Note that I excluded home page and
> download URLs that pointed back to the package's PyPI page, as that
> wouldn't provide any new information to setuptools.
>
> - Download URLs for *hidden* packages are included. Humans don't
> need to see old revisions, but setuptools-based tools do. If we used
> an index like this for setuptools, we could stop unhiding old
> releases when we created new releases in PyPI. This would make PyPI
> more useful to humans and less of a pain for developers.
>
> - Download URLs are the same as they are in PyPI. Using this new
> index, distributions are still downloaded from PyPI, so the index
> doesn't affect PyPI download statistics.
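
To illustrate what scanning a single per-package page for download URLs might look like, here is a minimal sketch. It assumes the page is plain HTML with ordinary <a href="..."> links and that checksums ride along as a "#md5=..." URL fragment, as in the download URL shown in the easy_install output in this message. The class and function names and the sample page are hypothetical; the real ppix markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def download_links(page_html):
    """Return (url, md5-or-None) pairs for each link on the page."""
    parser = LinkCollector()
    parser.feed(page_html)
    results = []
    for href in parser.links:
        url, fragment = urldefrag(href)
        # Only treat the fragment as a checksum if it has the md5= form.
        md5 = fragment[len("md5="):] if fragment.startswith("md5=") else None
        results.append((url, md5))
    return results


# Hypothetical page content, built from URLs that appear in this message.
SAMPLE_PAGE = """
<html><body>
<a href="http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d">zc.buildout-1.0.0b28-py2.5.egg</a>
<a href="http://svn.zope.org/zc.buildout">home page</a>
</body></html>
"""

if __name__ == "__main__":
    for url, md5 in download_links(SAMPLE_PAGE):
        print(url, md5)
```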
>
> To see the impact of this, it's interesting to look at installing
> zc.buildout using easy_install from PyPI and from the experimental
> index:
>
> Installing using PyPI looks like this:
>
> (env)jim@ds9:~/tmp$ time easy_install zc.buildout
> Searching for zc.buildout
> Reading http://cheeseshop.python.org/pypi/zc.buildout/
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
> Reading http://svn.zope.org/zc.buildout
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
> Best match: zc.buildout 1.0.0b28
> Downloading http://cheeseshop.python.org/packages/2.5/z/
> zc.buildout/zc.buildout-1.0.0b28-
> py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
> Processing zc.buildout-1.0.0b28-py2.5.egg
> creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-
> py2.5.egg
> Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/
> python2.5
> Adding zc.buildout 1.0.0b28 to easy-install.pth file
> Installing buildout script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-
> py2.5.egg
> Processing dependencies for zc.buildout
> Searching for setuptools==0.6c6
> Best match: setuptools 0.6c6
> Processing setuptools-0.6c6-py2.5.egg
> Adding setuptools 0.6c6 to easy-install.pth file
> Installing easy_install script to /home/jim/tmp/env/bin/
> Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-
> py2.5.egg
> Processing dependencies for setuptools==0.6c6
> Finished processing dependencies for setuptools==0.6c6
> Finished installing setuptools==0.6c6
> Finished processing dependencies for zc.buildout
> Finished installing zc.buildout
>
> real 0m31.360s
> user 0m1.136s
> sys 0m0.060s
>
> Note the large number of pages read. Here I was installing a single
> package with one dependency, setuptools, that was already installed.
> Let's look at this again using the experimental index:
>
> (env)jim@ds9:~/tmp$ time easy_install -i http://download.zope.org/ppix zc.buildout
> Searching for zc.buildout
> Reading http://download.zope.org/ppix/zc.buildout/
> Best match: zc.buildout 1.0.0b28
> Downloading http://cheeseshop.python.org/packages/2.5/z/
> zc.buildout/zc.buildout-1.0.0b28-
> py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
> Processing zc.buildout-1.0.0b28-py2.5.egg
> creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-
> py2.5.egg
> Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/
> python2.5
> Adding zc.buildout 1.0.0b28 to easy-install.pth file
> Installing buildout script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-
> py2.5.egg
> Processing dependencies for zc.buildout
> Searching for setuptools==0.6c6
> Best match: setuptools 0.6c6
> Processing setuptools-0.6c6-py2.5.egg
> Adding setuptools 0.6c6 to easy-install.pth file
> Installing easy_install script to /home/jim/tmp/env/bin/
> Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-
> py2.5.egg
> Processing dependencies for setuptools==0.6c6
> Finished processing dependencies for setuptools==0.6c6
> Finished installing setuptools==0.6c6
> Finished processing dependencies for zc.buildout
> Finished installing zc.buildout
>
> real 0m7.006s
> user 0m0.244s
> sys 0m0.040s
>
> Note:
>
> - We made far fewer requests with the new index
>
> - Most of the time in the second example was spent actually
> downloading the buildout distribution. Most of the time in the first
> example was spent reading the index.
>
> - I used workingenv to create clean environments for each of the
> examples above.
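
For anyone who wants to repeat the comparison, a small sketch for summarising such runs: it counts the "Reading <url>" lines in captured easy_install output and converts the "real" field printed by `time` into seconds. The helper names are invented for illustration; the two timing strings are the ones shown in the runs above.

```python
def count_index_reads(log_text):
    """Count the 'Reading <url>' lines that easy_install prints.

    Leading '> ' quote markers are tolerated so that output pasted
    from a quoted mail also works.
    """
    return sum(
        1 for line in log_text.splitlines()
        if line.lstrip("> ").startswith("Reading ")
    )


def real_seconds(time_field):
    """Convert a `time` field such as '0m31.360s' to seconds."""
    minutes, seconds = time_field.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


PYPI_REAL = "0m31.360s"  # "real" time of the PyPI run
PPIX_REAL = "0m7.006s"   # "real" time of the experimental index run

if __name__ == "__main__":
    ratio = real_seconds(PYPI_REAL) / real_seconds(PPIX_REAL)
    print("wall-clock ratio: %.1fx" % ratio)
```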
>
> WRT zc.buildout, refreshing a buildout with just ZODB installed in it
> takes about 45 seconds for me using PyPI and about 5 seconds using
> the experimental index.
>
> Some of the speed improvement is due to the fact that the
> experimental index is much closer to me (on the net) than PyPI. ATM,
> requests to PyPI take *me* around 500 milliseconds, while requests to
> the experimental index are taking between 100 and 300 milliseconds.
> (I'm at home and this seems to be somewhat variable.) Most of the
> speed improvements are from reducing the number of requests.
>
> I'm polling PyPI once a minute to get and apply updates. Thanks to
> the new XML-RPC method that Martin added, this is very efficient to
> do.
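
A polling loop of the kind described above could be sketched like this. The message does not name the XML-RPC method; this sketch assumes it is PyPI's changelog(since) call and that each returned row starts with the package name, so treat both as assumptions. The helper names and the rebuild_page callback are hypothetical.

```python
import time


def changed_packages(client, since):
    """Names of packages with changelog entries since `since`.

    Assumes client.changelog(since) returns rows whose first item is
    the package name (an assumption about the XML-RPC method).
    """
    return sorted({row[0] for row in client.changelog(since)})


def poll_once(client, last_poll, rebuild_page, now=None):
    """Rebuild the page of every package changed since last_poll.

    Returns the timestamp to pass as last_poll on the next call. The
    timestamp is taken before querying, so a change that lands during
    the poll is picked up again next time rather than missed.
    """
    now = int(time.time()) if now is None else now
    for name in changed_packages(client, last_poll):
        rebuild_page(name)
    return now


# Usage sketch (not executed here): `client` could be an
# xmlrpc.client.ServerProxy pointed at the index's XML-RPC endpoint,
# with poll_once called once a minute from a simple loop.
```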
>
> I encourage people to check this out and even try using it with
> easy_install and especially buildout. AFAIK, aside from being much
> faster and showing download files for hidden releases, it is
> completely equivalent to PyPI for setuptools use. My intention is to
> keep this experimental index going and up to date for the foreseeable
> future, and I plan to use it for all my work.
>
> My primary goal is to prototype the new index format. If this seems
> useful, then I think that www.python.org should expose an index in
> this format to setuptools, either at a different URL or by satisfying
> setuptools requests from the index based on client information. I'd
> love to see this index populated via a baking mechanism that updates
> package pages when they change, rather than through polling as I'm
> doing.
>
> There would be some benefit to having geographic mirrors. I suspect
> that having such mirrors available would improve performance further,
> at least for some folks. It might also be useful to have some
> mirrors for redundancy purposes. Note though that what I'm doing is
> mirroring only the index data. I'm not mirroring distributions. Of
> course, I'd be happy to make my software available. (It already is
> via our subversion repository.)
>
> I hope this effort spurs useful discussion and progress.
>
> Jim
>
> --
> Jim Fulton mailto:jim at zope.com Python Powered!
> CTO (540) 361-1714 http://www.python.org
> Zope Corporation http://www.zope.com http://www.zope.org
--
"Although never is often better than *right* now."
-- The Zen of Python, by Tim Peters
Jodok Batlogg, Lovely Systems
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
phone: +43 5572 908060, fax: +43 5572 908060-77