Time for a static cheeseshop mirror for easy_install?
For what easy_install does, there really isn't any dynamic API usage, so a static mirror for easy_install could take a good bit of load off the cheeseshop.
I don't know whether this will actually solve any problems the cheeseshop itself is having; it may be that ill-behaved web spiders are at fault, or something else altogether. However, since the downtime mostly creates issues for people using easy_install, creating a solution for those people certainly seems worthwhile.
Since easy_install was designed to be able to use simple directory indexes and HTML pages as a package index, it should be possible to create a simple directory tree of HTML pages, using PyPI's public XML-RPC API. The mirror could use PyPI's RSS feed to know when a package's information is out of date, although I'm not sure that the RSS includes all modifications, such as when packages are deleted, releases are hidden, files uploaded, etc.
However, assuming that there's a scalable way to receive change notifications, it should be straightforward to implement a mirror script for easy_install, and have it run on one or more volunteered hosts, perhaps with round-robin DNS (maybe easy-install.python.org?)
I'll be happy to assist anyone who wants to work on this, including updates to easy_install itself, of course. I'd actually be hacking on this now, if the cheeseshop weren't down (i.e., I can't download any XML-RPC data to do prototyping at the moment!)
Phillip J. Eby wrote:
For what easy_install does, there really isn't any dynamic API usage, so a static mirror for easy_install could take a good bit of load off the cheeseshop.
I don't know whether this will actually solve any problems the cheeseshop itself is having; it may be that ill-behaved web spiders are at fault, or something else altogether. However, since the downtime mostly creates issues for people using easy_install, creating a solution for those people certainly seems worthwhile.
Since easy_install was designed to be able to use simple directory indexes and HTML pages as a package index, it should be possible to create a simple directory tree of HTML pages, using PyPI's public XML-RPC API. The mirror could use PyPI's RSS feed to know when a package's information is out of date, although I'm not sure that the RSS includes all modifications, such as when packages are deleted, releases are hidden, files uploaded, etc.
However, assuming that there's a scalable way to receive change notifications, it should be straightforward to implement a mirror script for easy_install, and have it run on one or more volunteered hosts, perhaps with round-robin DNS (maybe easy-install.python.org?)
The German Python community is willing to contribute a host for PyPI mirroring. Just contact me as soon as some solution is found.
Georg
On Sat, Apr 07, 2007 at 12:44:56PM -0400, Phillip J. Eby wrote:
I don't know whether this will actually solve any problems the cheeseshop itself is having; it may be that ill-behaved web spiders are at fault, or something else altogether.
In this recent case, two different spiders were crawling the wiki very quickly, the machine's load average was in the 70s, and the out-of-memory killer was killing off PostgreSQL processes.
I don't think the load caused from people running easy_install is especially high -- it's certainly not a source of problems -- but making static pages would still be good to make mirroring the package archive more useful. Right now people could mirror http://cheeseshop.python.org/packages/, but there's nothing there for easy_install or for human readers; it's just a tree of package directories.
--amk
On Apr 9, 2007, at 10:46 AM, A.M. Kuchling wrote: ...
Right now people could mirror http://cheeseshop.python.org/packages/, but there's nothing there for easy_install
Is there any reason why easy_install shouldn't look there?
Jim
--
Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
Jim Fulton wrote:
On Apr 9, 2007, at 10:46 AM, A.M. Kuchling wrote: ...
Right now people could mirror http://cheeseshop.python.org/packages/, but there's nothing there for easy_install
Is there any reason why easy_install shouldn't look there?
It's only files, no descriptions.
Also, some authors don't have their files on /packages, but at their own servers - i.e. they use the Cheeseshop just as a Python package index, not as a comprehensive package archive network.
Regards,
Martin
On Apr 9, 2007, at 11:32 AM, Martin v. Löwis wrote:
Jim Fulton wrote:
On Apr 9, 2007, at 10:46 AM, A.M. Kuchling wrote: ...
Right now people could mirror http://cheeseshop.python.org/packages/, but there's nothing there for easy_install
Is there any reason why easy_install shouldn't look there?
It's only files, no descriptions.
Also, some authors don't have their files on /packages, but at their own servers - i.e. they use the Cheeseshop just as a Python package index, not as a comprehensive package archive network.
Sorry, I wasn't clear. I wasn't suggesting that easy_install should look here instead of where it already looks, but this seems like a good and cheap place to look especially when a specific version number is given.
Jim
--
Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
At 11:35 AM 4/9/2007 -0400, Jim Fulton wrote:
On Apr 9, 2007, at 11:32 AM, Martin v. Löwis wrote:
Jim Fulton wrote:
On Apr 9, 2007, at 10:46 AM, A.M. Kuchling wrote: ...
Right now people could mirror http://cheeseshop.python.org/packages/, but there's nothing there for easy_install
Is there any reason why easy_install shouldn't look there?
It's only files, no descriptions.
Also, some authors don't have their files on /packages, but at their own servers - i.e. they use the Cheeseshop just as a Python package index, not as a comprehensive package archive network.
Sorry, I wasn't clear. I wasn't suggesting that easy_install should look here instead of where it already looks, but this seems like a good and cheap place to look especially when a specific version number is given.
Actually, version number doesn't help here; it's Python version, and source vs. binary that are the relevant distinctions. The download directories include files for all versions of a given project, so we would want to hit these three locations:
http://cheeseshop.python.org/packages/*pyver*/p/projectname/
http://cheeseshop.python.org/packages/any/p/projectname/
http://cheeseshop.python.org/packages/source/p/projectname/
And we would then have all the downloadable-from-cheeseshop links. (I thought we could skip the "any", but some win32 exe's are classed as "any" Python version, and easy_install supports win32 exe's.)
This is more web hits than is currently required to obtain similar information, but on the plus side they are efficient hits for the server. They also allow access to *all* versions of a package that are downloadable, whether they are "hidden" or not.
On the minus side, there is no way to find externally-hosted files or SVN links, which means we would still have to always hit the dynamic page to know if we had found everything relevant.
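[Editorial sketch] The three lookups described above could be computed roughly as follows. This is only an illustration under stated assumptions: the function name candidate_dirs is hypothetical, the single-letter path component is assumed to be the first character of the project name (as in the /source/s/simple_json/ example elsewhere in this thread), and nothing here is taken from easy_install's actual code.

```python
from urllib.parse import quote

# Base URL taken from the thread; everything else below is illustrative.
BASE = "http://cheeseshop.python.org/packages"


def candidate_dirs(project, pyver, base=BASE):
    """Return the three static directories that might hold a project's files.

    Assumption: the one-letter directory is simply the first character of
    the project name, unchanged.
    """
    first = project[0]
    name = quote(project)  # percent-encode anything unsafe in a URL path
    # Order follows the message: Python-version-specific, "any", then source.
    return [f"{base}/{kind}/{first}/{name}/" for kind in (pyver, "any", "source")]


if __name__ == "__main__":
    for url in candidate_dirs("simple_json", "2.5"):
        print(url)
```

As the message notes, these listings cover only files hosted on the Cheeseshop itself, so a client relying on them would still miss externally hosted downloads and SVN links.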
At 10:46 AM 4/9/2007 -0400, A.M. Kuchling wrote:
On Sat, Apr 07, 2007 at 12:44:56PM -0400, Phillip J. Eby wrote:
I don't know whether this will actually solve any problems the cheeseshop itself is having; it may be that ill-behaved web spiders are at fault, or something else altogether.
In this recent case, two different spiders were crawling the wiki very quickly, the machine's load average was in the 70s, and the out-of-memory killer was killing off PostgreSQL processes.
I don't think the load caused from people running easy_install is especially high -- it's certainly not a source of problems -- but making static pages would still be good to make mirroring the package archive more useful. Right now people could mirror http://cheeseshop.python.org/packages/, but there's nothing there for easy_install or for human readers; it's just a tree of package directories.
Hm. Well, actually, if that directory structure were something I could code to, easy_install could sure as heck be *made* to use it. The only thing easy_install couldn't get from it was external links to downloads, SVN versions, etc.
Notice, for example, that if you use "easy_install -f http://cheeseshop.python.org/packages/source/s/simple_json/ simple_json", easy_install won't look at the main package index, but just download directly. So an automated form of that calculation could easily be added to easy_install.
What I had in mind for an easy_install mirror, however, was a script that would just create a /packagename/index.html file with links gathered from all versions of a package on the original Cheeseshop, and with packagename generated as a setuptools "safe name" in lower case. This pattern would allow easy_install to find every possible relevant link in just one (static) web hit.
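[Editorial sketch] The per-package static page described above might be generated along these lines. It is a minimal sketch, not the mirror script itself: write_index is a hypothetical helper, the link list is supplied by the caller rather than fetched from the Cheeseshop, the sample file name in the demo is made up, and safe_name is a local stand-in that assumes the setuptools rule is "replace each run of characters other than letters, digits and '.' with a single '-'".

```python
import html
import re
import tempfile
from pathlib import Path


def safe_name(name):
    """Local stand-in for the setuptools "safe name" rule (an assumption)."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def write_index(root, project, links):
    """Write <root>/<safe lower-case name>/index.html listing all links.

    `links` is an iterable of (label, url) pairs already gathered from
    every release of the project; gathering them is out of scope here.
    """
    directory = Path(root) / safe_name(project).lower()
    directory.mkdir(parents=True, exist_ok=True)
    rows = "\n".join(
        f'<a href="{html.escape(url, quote=True)}">{html.escape(label)}</a><br/>'
        for label, url in links
    )
    page = (
        f"<html><head><title>Links for {html.escape(project)}</title></head>"
        f"<body>\n{rows}\n</body></html>\n"
    )
    path = directory / "index.html"
    path.write_text(page, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name, used only to exercise the function.
        demo = [(
            "simple_json-1.0.tar.gz",
            "http://cheeseshop.python.org/packages/source/s/simple_json/"
            "simple_json-1.0.tar.gz",
        )]
        print(write_index(tmp, "simple_json", demo).read_text(encoding="utf-8"))
```

Because every link for a project lands in one file, a client needs a single static request per project, which is the property the message is after. Externally hosted download links could be included in the same list.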
On Sat, 2007-04-07 at 12:44 -0400, Phillip J. Eby wrote:
For what easy_install does, there really isn't any dynamic API usage, so a static mirror for easy_install could take a good bit of load off the cheeseshop.
Jon Rosebaugh (aka Chairos) actually started working on this a couple days ago during a Cheeseshop downtime. I mentioned this post to him, so he may join the list to follow up, but you can reach him on #python.web to check on this.
--
Matt Good
participants (6): Martin v. Löwis, A.M. Kuchling, Georg Brandl, Jim Fulton, Matt Good, Phillip J. Eby