Re: [Distutils] Why so many zc.buildout versions?
You raise a really good point, which is especially relevant in light of PyPI performance issues and discussions. I'm copying the distutils and catalog SIGs to get some wider discussion. I apologize for the cross-posting.

I'm beginning to wonder about the strategy that setuptools uses, or maybe about the way we are using the index. It's important to note that there is nothing specific about the buildout package here.

It is very important to make multiple versions available to support requirements for specific package versions. It makes builds/installs repeatable, whether talking about buildout or other systems built on setuptools. When someone has tested and wants to release an application built from a collection of distributions, they will want to specify those *specific* versions for future builds or installs. This means that we need to retain any versions published indefinitely in a way that can be found by setuptools.

Currently, the only way to support multiple versions with the cheeseshop is to unhide past releases. This has a fairly severe effect on performance. As the example below shows, setuptools will fetch the package page and then fetch the pages for each release. That's a lot of requests.

What makes it worse is that the individual package pages can be fairly long. I've gotten in the habit of including full documentation on every release page. For example, recent release pages for zc.buildout are around 200K. This is a fairly significant amount of data to transfer. This will certainly make the scanning process take a long time for clients. (Obviously, if we keep doing things the way we are, I'll need to stop doing that.) All of this aggravates any performance problems we might have.

Up to now, setuptools has tried hard to use existing systems without change. This means that it reuses systems designed primarily for people, not software.
I think that setuptools rightly took the approach it has up to now so that progress could be made without making people change other systems. This was appropriate when setuptools was evolving and people were figuring out ways to use it. I think it is time to take a step back and think a lot harder about how we'd want to structure an index to support setuptools.

IMO, a setuptools-aware index would have a single page for each package:

- The single page would be published in a case-insensitive way. It would be nice to find a way to avoid this, or maybe we should use a windows-based web server. :) It would also be served very cheaply, for example statically.

- The single page would list links for all available distributions, which should include all distributions published. It would also list any other URLs that should be scanned for releases, when releases aren't all uploaded to PyPI.

- The single page would contain very little additional information. It would be for use by software, not humans.

In addition, the root page with a trailing / would be empty and very cheap.

There are a lot of ways we could achieve this pretty cheaply while keeping the existing system pretty much as it is. For example, the current effort to bake static pages could bake these pages instead. We could make the new index available at a different URL for people to play with while we worked the kinks out of the process.

Of course, those of us who use the cheeseshop and setuptools extensively can also achieve much of this by changing the way we work.

Thoughts?

Jim

On Jul 10, 2007, at 8:44 AM, Philipp von Weitershausen wrote:
When easy_installing zc.buildout I realized that the CheeseShop still lists a gazillion old versions of zc.buildout. That makes it take quite some time to install zc.buildout (see below), and I reckon the same sort of check has to happen each time it looks for a new version of that egg...
Is there any reason for having so many old versions around?
$ easy_install zc.buildout
Searching for zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
Reading http://svn.zope.org/zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
Best match: zc.buildout 1.0.0b28
...
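For a rough sense of the cost, the pages read in the run above can be tallied with a short script. This is only a sketch: the `LOG` string is copied from the output above, and `count_requests` is a made-up helper name, not anything setuptools provides.

```python
# Tally the pages easy_install read in the run quoted above.
LOG = """\
Reading http://cheeseshop.python.org/pypi/zc.buildout/
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
Reading http://svn.zope.org/zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
"""


def count_requests(log):
    """Return (total pages read, pages read from the cheeseshop)."""
    urls = [line.split(None, 1)[1]
            for line in log.splitlines()
            if line.startswith("Reading ")]
    from_index = [u for u in urls
                  if u.startswith("http://cheeseshop.python.org/")]
    return len(urls), len(from_index)


total, from_index = count_requests(LOG)
print("pages read:", total, "- from the cheeseshop:", from_index)
```

By this count, one install of one package read 15 pages, 14 of them from the cheeseshop: the package page plus 13 release pages.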
-- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
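The single-page index proposed above could be baked by something along these lines. This is a minimal sketch under stated assumptions, not the cheeseshop's code: `render_package_page` is a hypothetical helper, the `example.org` download URLs are invented for illustration, and the page layout (bare links, nothing else) is just one reading of "very little additional information".

```python
from html import escape


def render_package_page(project, dist_urls, scan_urls=()):
    """Build one small static page holding every link a client needs.

    dist_urls -- direct links to all published distributions
    scan_urls -- other pages to scan when releases live outside the index
    """
    lines = ["<html><head><title>Links for %s</title></head><body>"
             % escape(project)]
    for url in list(dist_urls) + list(scan_urls):
        lines.append('<a href="%s">%s</a><br/>'
                     % (escape(url, quote=True), escape(url)))
    lines.append("</body></html>")
    return "\n".join(lines)


# Hypothetical distribution URLs; only the svn.zope.org URL is from the thread.
page = render_package_page(
    "zc.buildout",
    ["http://example.org/dist/zc.buildout-1.0.0b27.tar.gz",
     "http://example.org/dist/zc.buildout-1.0.0b28.tar.gz"],
    scan_urls=["http://svn.zope.org/zc.buildout"],
)
print(page)
```

A client would then need one request per package, however many releases are kept.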
At 10:32 AM 7/10/2007 -0400, Jim Fulton wrote:
Currently, the only way to support multiple versions with the cheeseshop is to unhide past releases. This has a fairly severe effect on performance. As the example below shows, setuptools will fetch the package page and then fetch the pages for each release. That's a lot of requests.
This could potentially be fixed in setuptools, so that it only looks at release pages that match its requirements, in highest-to-lowest version order, stopping as soon as a suitable match is found. That would eliminate the current issue -- but only for new versions of setuptools. So I do like your idea better, since it can be made to work for already-deployed clients as well.
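The client-side change described above might look roughly like this. It is a sketch, not setuptools code: `version_key` is a deliberately crude stand-in for setuptools' real version parsing (it only understands versions such as `1.0.0b19`), and `matches` and `has_suitable_dist` are hypothetical callbacks standing in for the requirement check and for fetching and inspecting a release page.

```python
import re


def version_key(version):
    """Crude sort key for versions like '1.0.0b19' (not setuptools' parser)."""
    m = re.match(r"^(\d+(?:\.\d+)*)(?:b(\d+))?$", version)
    if m is None:
        raise ValueError("unsupported version: %r" % version)
    release = tuple(int(part) for part in m.group(1).split("."))
    # A final release sorts after every beta of the same release number.
    beta = int(m.group(2)) if m.group(2) else float("inf")
    return release, beta


def find_release(versions, matches, has_suitable_dist):
    """Visit release pages newest-first and stop at the first usable one.

    Returns (chosen version or None, number of release pages fetched).
    """
    fetched = 0
    for version in sorted(versions, key=version_key, reverse=True):
        if not matches(version):
            continue  # requirement excludes it: no request needed
        fetched += 1  # stands in for one HTTP request for the release page
        if has_suitable_dist(version):
            return version, fetched
    return None, fetched


versions = ["1.0.0b%d" % n for n in range(16, 29)]
print(find_release(versions, lambda v: True, lambda v: True))
```

With the thirteen releases from the log and no version constraint, this fetches one release page instead of thirteen. The package page still has to be read first to learn which versions exist.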
I think it is time to take a step back and think a lot harder about how we'd want to structure an index to support setuptools.
+1, as long as somebody's willing to build and host the thing. Please see my earlier comments on the Catalog-Sig about this.
IMO, a setuptools-aware index would have a single page for each package:
- The single page would be published in a case-insensitive way. It would be nice to find a way to avoid this, or maybe we should use a windows-based web server. :) It would also be served very cheaply, for example statically.
Apache's CheckSpelling directive does case-insensitivity and approximate matching. Combine that with making the directories be based on "safe_name" values to begin with, and you should be all set.
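To make the "safe_name" idea concrete, here is a small sketch. The `safe_name` function below is modelled on the helper of the same name in setuptools' `pkg_resources`; the regular expression is an assumption about its behaviour and should be checked against the setuptools source. `find_dir` is a hypothetical stand-in for the case-insensitive matching the web server would do; it does not attempt approximate matching.

```python
import re


def safe_name(name):
    """Replace each run of characters other than letters, digits and '.' with '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def find_dir(requested, existing_dirs):
    """Case-insensitive lookup of a requested name among index directories."""
    wanted = safe_name(requested).lower()
    for directory in existing_dirs:
        if directory.lower() == wanted:
            return directory
    return None


print(safe_name("zc.buildout"))
print(find_dir("ZC.Buildout", ["setuptools", "zc.buildout"]))
```

If the directories are named this way when the pages are baked, a client asking for a differently-cased or differently-punctuated name can still land on the right directory.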
- The single page would list links for all available distributions, which should include all distributions published. It would also list any other URLs that should be scanned for releases, when releases aren't all uploaded to PyPI.
The piece you're missing here is direct links to other downloads, such as "#egg=project-dev" subversion links. However, if you extracted these from all of the relevant PyPI HTML pages, you could certainly do that.
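As an illustration of the kind of link meant here, the following sketch pulls the `#egg=` marker out of a URL. The repository URL is made up for the example, and `egg_fragment` is a hypothetical helper, not a setuptools function.

```python
from urllib.parse import urlparse


def egg_fragment(url):
    """Return the value of an '#egg=' fragment, or None if the URL has none."""
    fragment = urlparse(url).fragment
    if fragment.startswith("egg="):
        return fragment[len("egg="):]
    return None


# Hypothetical subversion link of the "#egg=project-dev" kind.
link = "http://svn.example.org/zc.buildout/trunk#egg=zc.buildout-dev"
print(egg_fragment(link))
```

A page generator could run a check like this over every link found on the existing PyPI pages and copy the ones that carry such a fragment onto the single package page.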
In addition, the root page with a trailing / would be empty and very cheap.
As long as the individual package directories are safe_name based, this would work.
There are a lot of ways we could achieve this pretty cheaply while keeping the existing system pretty much as it is.
Of course, there are still other reasons to want to improve the Cheeseshop's performance, such as search engines and other bots.
For example, the current effort to bake static pages could bake these pages instead. We could make the new index available at a different URL for people to play with while we worked the kinks out of the process.
...and then use a User-Agent rewrite rule to redirect setuptools clients to the static piece, as soon as we're satisfied that it works.
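The real mechanism would be a rewrite rule in the web server configuration; the Python below only models the decision such a rule would make. Two things are assumptions: that setuptools clients can be recognised by the text `setuptools/` in their User-Agent header, and the `example.org` address of the static index, which is invented for the sketch.

```python
# Hypothetical location of the static, setuptools-oriented index.
STATIC_INDEX = "http://example.org/simple"


def redirect_target(user_agent, package):
    """Return the static page URL for setuptools clients, else None."""
    # Assumption: setuptools identifies itself with 'setuptools/<version>'.
    if "setuptools/" in (user_agent or ""):
        return "%s/%s/" % (STATIC_INDEX, package)
    return None  # browsers and other clients keep the existing pages


print(redirect_target("Python-urllib/2.5 setuptools/0.6c6", "zc.buildout"))
print(redirect_target("Mozilla/5.0", "zc.buildout"))
```

Because the decision is made on the server, already-deployed clients would be moved over without any change on their side.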
On Jul 10, 2007, at 11:56 AM, Phillip J. Eby wrote:
At 10:32 AM 7/10/2007 -0400, Jim Fulton wrote:
Currently, the only way to support multiple versions with the cheeseshop is to unhide past releases. This has a fairly severe effect on performance. As the example below shows, setuptools will fetch the package page and then fetch the pages for each release. That's a lot of requests.
This could potentially be fixed in setuptools, so that it only looks at release pages that match its requirements, in highest-to-lowest version order, stopping as soon as a suitable match is found. That would eliminate the current issue
No, it will mitigate the current issue somewhat, but it will still involve multiple requests per package, while a simpler index structure would allow a single request per package.
-- but only for new versions of setuptools. So I do like your idea better, since it can be made to work for already-deployed clients as well.
Yup.

Jim
No, it will mitigate the current issue somewhat, but it will still involve multiple requests per package, while a simpler index structure would allow a single request per package.
I don't understand. If setuptools would always look at /pypi/package/version first, it would immediately find the right page if that version is indeed stored in the cheeseshop. Why would that require multiple requests per package?

Regards,
Martin
On Jul 10, 2007, at 5:39 PM, Martin v. Löwis wrote:
No, it will mitigate the current issue somewhat, but it will still involve multiple requests per package, while a simpler index structure would allow a single request per package.
I don't understand. If setuptools would always look at /pypi/package/version first, it would immediately find the right page if that version is indeed stored in the cheeseshop.
Why would that require multiple requests per package?
It usually doesn't have a single required version. It usually has just a package name, or a name and a range of versions. It has to scan the package page to find out what versions are available, and *then* it can load the release page for the highest version that satisfies the requirement. It can usually read just that one page. However, additional filtering may be needed that would cause it to search multiple releases. For example, it might be looking for a source distribution, or a platform-specific distribution that isn't available for the most recent release.

In any case, the best case is that it has to scan the package page to find the most recent release, and then scan that release page.

Jim
For example, the current effort to bake static pages could bake these pages instead.
Certainly not instead; in addition, if there are volunteers to implement that.
We could make the new index available at a different URL for people to play with while we worked the kinks out of the process.
I have been thinking about the same thing. I think it would be good to have, however, it will surely take some time until all setuptools implementations learn to use it.
Of course, those of us who use the cheeseshop and setuptools extensively can also achieve much of this by changing the way we work.
Hmm. How about those using them extensively start contributing to them also?

Regards,
Martin
On Jul 10, 2007, at 5:36 PM, Martin v. Löwis wrote:
For example, the current effort to bake static pages could bake these pages instead.
Certainly not instead; in addition, if there are volunteers to implement that.
Sure.
We could make the new index available at a different URL for people to play with while we worked the kinks out of the process.
I have been thinking about the same thing. I think it would be good to have, however, it will surely take some time until all setuptools implementations learn to use it.
No, not at all. You can tell setuptools to use a different index than the current one. For example, this is a command-line option for easy_install and a configuration option for buildout.
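For reference, the two settings mentioned here are easy_install's `--index-url` (short form `-i`) option and the `index` option in the `[buildout]` section of a buildout configuration. The sketch below just assembles both; the `example.org` URL is a placeholder for wherever an experimental index might live, and the helper names are made up.

```python
import configparser
import io

# Hypothetical address of an experimental index.
EXPERIMENTAL_INDEX = "http://example.org/simple"


def easy_install_command(requirement, index_url):
    """Argument list for running easy_install against another index."""
    return ["easy_install", "--index-url", index_url, requirement]


def buildout_config(index_url):
    """Text of a minimal buildout.cfg that points buildout at another index."""
    cfg = configparser.ConfigParser()
    cfg["buildout"] = {"index": index_url}
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


print(" ".join(easy_install_command("zc.buildout", EXPERIMENTAL_INDEX)))
print(buildout_config(EXPERIMENTAL_INDEX))
```

Either route lets a tester try the new index without any change to setuptools itself.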
Of course, those of us who use the cheeseshop and setuptools extensively can also achieve much of this by changing the way we work.
Hmm. How about those using them extensively start contributing to them also?
I like to think that I am by participating in this discussion. Actually changing the cheeseshop software has a very high learning curve. I don't think that I can make that kind of time any time soon. I'm very grateful that you and René are doing what you're doing. I also suspect that, given your and René's activity, it would be counterproductive for someone else to get involved at that level, but maybe I'm wrong about that.

Jim
I have been thinking about the same thing. I think it would be good to have, however, it will surely take some time until all setuptools implementations learn to use it.
No, not at all. You can tell setuptools to use a different index than the current one. For example, this is a command-line option for easy_install and a configuration option for buildout.
Yes. However, that will make the feature only available to those who know about it. I have very shallow knowledge of setuptools and easy_install only (I nearly never use them at all), and I surely would miss such an option, and miss why it's relevant. It's true that the Apache installation could also redirect existing installations to the new pages, but I doubt that they would be otherwise widely used until setuptools changes its hard-coded default.
Hmm. How about those using them extensively start contributing to them also?
I like to think that I am by participating in this discussion. Actually changing the cheeseshop software has a very high learning curve. I don't think that I can make that kind of time any time soon. I'm very grateful that you and René are doing what you're doing. I also suspect that, given your and René's activity, it would be counterproductive for someone else to get involved at that level, but maybe I'm wrong about that.
I strongly think you are. There are many things that could be improved, and I would not mind leaving the cheeseshop alone if some other maintainer came along - I also have other things to do.

Regards,
Martin
I have to say the cheeseshop code was pretty easy to get into. I think I was able to make most of my changes within the first reading of it. It quite clearly separates things like the templates, the database functionality and the 'webui'.

There are definitely a huge number of things that I would love to change with it over time, and I hope other people begin to develop it more - it can only help the Python community as a whole.

The number of people doing releases has increased quite a lot even in the last two months, so I think the releases will get more frequent. As it grows it will continue to need different changes - optimizations to the database/webserver, and also optimizations to the user interface.

On 7/11/07, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I have been thinking about the same thing. I think it would be good to have, however, it will surely take some time until all setuptools implementations learn to use it.
No, not at all. You can tell setuptools to use a different index than the current one. For example, this is a command-line option for easy_install and a configuration option for buildout.
Yes. However, that will make the feature only available to those who know about it. I have very shallow knowledge of setuptools and easy_install only (I nearly never use them at all), and I surely would miss such an option, and miss why it's relevant.
It's true that the Apache installation could also redirect existing installations to the new pages, but I doubt that they would be otherwise widely used until setuptools changes its hard-coded default.
Hmm. How about those using them extensively start contributing to them also?
I like to think that I am by participating in this discussion. Actually changing the cheeseshop software has a very high learning curve. I don't think that I can make that kind of time any time soon. I'm very grateful that you and René are doing what you're doing. I also suspect that, given your and René's activity, it would be counterproductive for someone else to get involved at that level, but maybe I'm wrong about that.
I strongly think you are. There are many things that could be improved, and I would not mind leaving the cheeseshop alone if some other maintainer came along - I also have other things to do.
Regards, Martin
On Jul 11, 2007, at 1:06 AM, Martin v. Löwis wrote:
I have been thinking about the same thing. I think it would be good to have, however, it will surely take some time until all setuptools implementations learn to use it.
No, not at all. You can tell setuptools to use a different index than the current one. For example, this is a command-line option for easy_install and a configuration option for buildout.
Yes. However, that will make the feature only available to those who know about it. I have very shallow knowledge of setuptools and easy_install only (I nearly never use them at all), and I surely would miss such an option, and miss why it's relevant.
That's fine. I don't care if most people can find it. While it is an *experimental* index, it is fine if only a few people play with it. If it is proven to work properly, then we could arrange that other people get it by default.
It's true that the Apache installation could also redirect existing installations to the new pages, but I doubt that they would be otherwise widely used until setuptools changes its hard-coded default.
Right, that's why, if the experiment works, we should then change the Apache config to redirect setuptools to it. Changing the Apache config is much easier than updating the setuptools installed base.

Jim
participants (4)
- "Martin v. Löwis"
- Jim Fulton
- Phillip J. Eby
- René Dudfield