Prototype setuptools-specific PyPI index.
Over the past few months, we've struggled quite a bit with Python Package Index (PyPI) performance and stability. Thanks to the heroic efforts of Martin v. Löwis and others, performance and especially stability have improved quite a bit. Martin has demonstrated that, at least when running well, PyPI seems to answer most requests in about 7 milliseconds (around 150 requests per second) internally. That's not bad. Unfortunately for users, actual times can be quite a bit longer. For me at work, requests take around 300 milliseconds. For Martin, they seem to take somewhat longer. 300 milliseconds isn't so bad for a request or two; however, easy_install can easily make tens or even hundreds of requests to satisfy a user request for a package. zc.buildout, when verifying that a large system with many tens of packages has the most up-to-date versions of each package, can easily make thousands of requests.

Why do setuptools and buildout make so many requests? If a package exposes more than one release, then setuptools checks the package's main PyPI page and the pages for each release. We need to be able to easily use older releases, so we can't hide old releases. Typical projects of ours have many old releases exposed. Even if setuptools were more clever in the way it searched PyPI, it would still have to make a minimum of 2 requests per package for packages with multiple versions exposed.

Another potential issue is that PyPI pages can be large. I've found it convenient to use PyPI package pages as the home page for many of my projects. I like to include package documentation in my project pages. Perhaps this is an abuse of PyPI, but it is very convenient for me and no one has complained. :) The zc.buildout pages are around 200K. That's a fair bit of data for setuptools to download and scan for download URLs.

In the course of this discussion, I've realized that it doesn't make sense for setuptools to use the same interface that humans use. setuptools doesn't need to see all of the data that is useful to humans. Similarly, humans generally don't need to see all of the historical releases for a project. I suggested a simple page format designed just for setuptools. An alternative would be an xmlrpc API. I prefer pages because I think that, over time, the number of requests from automated tools like easy_install and zc.buildout will increase substantially and will ultimately overwhelm dynamic servers, even ones like PyPI that are reasonably fast. I also think that a simple static collection of pages will be easier to mirror, and I think some number of geographic mirrors is likely to help some people. I promised to prototype the format I suggested.

I've created an experimental prototype setuptools-specific package index at

    http://download.zope.org/ppix

Going to that page gives brief instructions for using it with easy_install and zc.buildout. To see an individual package page, add the package name to the URL, as in:

    http://download.zope.org/ppix/setuptools/

A few things to note about this:

- I don't expose a long package list at http://download.zope.org/ppix/. The long package list would be expensive to download and supports a use case that I consider to be of negative value, which is installing packages with case-insensitive package names. I think it is important for humans to be able to search for packages using case-insensitive search terms, but I think that, after identifying a package, precise package names should be used. I think it is especially important that precise package names be used in package requirements.

- There is a single page per package. This can greatly reduce the number of requests. Packages that store all of their distributions in PyPI and that don't have off-site home pages or download URLs can be scanned with a single request. Note that I excluded home page and download URLs that pointed back to the package's PyPI page, as that wouldn't provide any new information to setuptools.

- Download URLs for *hidden* packages are included. Humans don't need to see old revisions, but setuptools-based tools do. If we used an index like this for setuptools, we could stop unhiding old releases when we created new releases in PyPI. This would make PyPI more useful to humans and less of a pain for developers.

- Download URLs are the same as they are in PyPI. Using this new index, distributions are still downloaded from PyPI, so the index doesn't affect PyPI download statistics.

To see the impact of this, it's interesting to look at installing zc.buildout using easy_install from PyPI and from the experimental index. Installing using PyPI looks like this:

    (env)jim@ds9:~/tmp$ time easy_install zc.buildout
    Searching for zc.buildout
    Reading http://cheeseshop.python.org/pypi/zc.buildout/
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
    Reading http://svn.zope.org/zc.buildout
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
    Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
    Best match: zc.buildout 1.0.0b28
    Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
    Processing zc.buildout-1.0.0b28-py2.5.egg
    creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
    Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
    Adding zc.buildout 1.0.0b28 to easy-install.pth file
    Installing buildout script to /home/jim/tmp/env/bin/
    Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
    Processing dependencies for zc.buildout
    Searching for setuptools==0.6c6
    Best match: setuptools 0.6c6
    Processing setuptools-0.6c6-py2.5.egg
    Adding setuptools 0.6c6 to easy-install.pth file
    Installing easy_install script to /home/jim/tmp/env/bin/
    Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
    Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
    Processing dependencies for setuptools==0.6c6
    Finished processing dependencies for setuptools==0.6c6
    Finished installing setuptools==0.6c6
    Finished processing dependencies for zc.buildout
    Finished installing zc.buildout

    real 0m31.360s
    user 0m1.136s
    sys 0m0.060s

Note the large number of pages read. Here I was installing a single package with one dependency, setuptools, that was already installed. Let's look at this again using the experimental index:

    (env)jim@ds9:~/tmp$ time easy_install -i http://download.zope.org/ppix zc.buildout
    Searching for zc.buildout
    Reading http://download.zope.org/ppix/zc.buildout/
    Best match: zc.buildout 1.0.0b28
    Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
    Processing zc.buildout-1.0.0b28-py2.5.egg
    creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
    Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
    Adding zc.buildout 1.0.0b28 to easy-install.pth file
    Installing buildout script to /home/jim/tmp/env/bin/
    Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
    Processing dependencies for zc.buildout
    Searching for setuptools==0.6c6
    Best match: setuptools 0.6c6
    Processing setuptools-0.6c6-py2.5.egg
    Adding setuptools 0.6c6 to easy-install.pth file
    Installing easy_install script to /home/jim/tmp/env/bin/
    Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
    Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
    Processing dependencies for setuptools==0.6c6
    Finished processing dependencies for setuptools==0.6c6
    Finished installing setuptools==0.6c6
    Finished processing dependencies for zc.buildout
    Finished installing zc.buildout

    real 0m7.006s
    user 0m0.244s
    sys 0m0.040s

Note:

- We made far fewer requests with the new index.

- Most of the time in the second example was spent actually downloading the buildout distribution. Most of the time in the first example was spent reading the index.

- I used workingenv to create clean environments for each of the examples above.

WRT zc.buildout, refreshing a buildout with just ZODB installed in it takes about 45 seconds for me using PyPI and about 5 seconds using the experimental index.

Some of the speed improvement is due to the fact that the experimental index is much closer to me (on the net) than PyPI. ATM, requests to PyPI take *me* around 500 milliseconds, while requests to the experimental index are taking between 100 and 300 milliseconds. (I'm at home and this seems to be somewhat variable.) Most of the speed improvement comes from reducing the number of requests.

I'm polling PyPI once a minute to get and apply updates. Thanks to the new XML-RPC method that Martin added, this is very efficient to do.

I encourage people to check this out and even try using it with easy_install and especially buildout. AFAIK, aside from being much faster and showing download files for hidden releases, it is completely equivalent to PyPI for setuptools use. My intention is to keep this experimental index going and up to date for the foreseeable future, and I plan to use it for all my work.

My primary goal is to prototype the new index format. If this seems useful, then I think that www.python.org should expose an index in this format to setuptools, either at a different URL or by satisfying setuptools requests from the index based on client information. I'd love to see this index populated via a baking mechanism that updates package pages when they change, rather than through polling as I'm doing.

There would be some benefit to having geographic mirrors. I suspect that having such mirrors available would improve performance further, at least for some folks. It might also be useful to have some mirrors for redundancy purposes. Note though that what I'm doing is mirroring only the index data. I'm not mirroring distributions. Of course, I'd be happy to make my software available. (It already is via our subversion repository.)

I hope this effort spurs useful discussion and progress.

Jim

--
Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
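The request counts in the two transcripts give a feel for how much of the difference is pure round-trip time. The following is a rough sketch in present-day Python, not something from the original message; the function name and the simple "requests times latency" model are assumptions, and the model deliberately ignores page size and the distribution download itself.

```python
def index_time(requests: int, seconds_per_request: float) -> float:
    """Estimate time spent on index round trips, ignoring transfer size."""
    return requests * seconds_per_request

# Counts taken from the "Reading ..." lines in the two runs.
PYPI_READS = 15   # main page, 13 release pages, and one off-site page
PPIX_READS = 1    # a single package page

pypi_estimate = index_time(PYPI_READS, 0.5)   # about 500 ms per PyPI request
ppix_estimate = index_time(PPIX_READS, 0.3)   # upper end of the 100-300 ms range
print(f"PyPI index reads: ~{pypi_estimate:.1f}s, ppix index reads: ~{ppix_estimate:.1f}s")
```

The latencies plugged in are the at-home figures quoted in the message; the timed runs may have been made under different conditions, so this is only an order-of-magnitude illustration.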
I've created an experimental prototype setuptools-specific package index at http://download.zope.org/ppix
Cool! If this proves useful, people are encouraged to contribute the proper patches to PyPI to regenerate the page directly on each log change.

There is a slight transactional trickiness to doing so: if you regenerate before the commit, the commit might fail; then you would have to roll back the page update, too. If you regenerate after the commit, you might run into race conditions if the same package sees two updates in two transactions very quickly, and the second regeneration completes before the first one.

If people would find it easier to make these pages dynamic, such patches would also be kindly accepted. Generating the pages on access should be fairly cheap; the SQL is

    select filename,md5_digest from release_files where name='setuptools';

and putting the result of that into a ppix-like HTML page should be much faster than invoking ZPT.

Regards,
Martin
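To make the "put the result into a ppix-like HTML page" step concrete, here is a minimal sketch in present-day Python. It is not PyPI's or ppix's code: the function name, the page layout, and the `base_url` prefix are all assumptions for illustration, and the rows simply stand in for the `(filename, md5_digest)` tuples that the query would return.

```python
from html import escape

def render_package_page(name, files, base_url):
    """Build a minimal link page from (filename, md5_digest) rows.

    base_url is a hypothetical prefix under which the files are served.
    """
    links = []
    for filename, md5_digest in files:
        href = f"{base_url}/{filename}#md5={md5_digest}"
        links.append(f'<a href="{escape(href)}">{escape(filename)}</a><br/>')
    body = "\n".join(links)
    return (
        f"<html><head><title>Links for {escape(name)}</title></head>"
        f"<body><h1>Links for {escape(name)}</h1>\n{body}\n</body></html>"
    )

# Example rows, shaped like the result of the select statement.
rows = [("zc.buildout-1.0.0b28-py2.5.egg", "4e37e53f010ed7984555a029732f479d")]
page = render_package_page("zc.buildout", rows, "http://example.invalid/packages")
print(page)
```

Plain string formatting like this avoids a template engine entirely, which is the point of the comparison with ZPT.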
On Jul 20, 2007, at 4:21 AM, Martin v. Löwis wrote:
I've created an experimental prototype setuptools-specific package index at http://download.zope.org/ppix
Cool! If this proves useful, people are encouraged to contribute the proper patches to PyPI to regenerate the page directly on each log change.
There is a slight transactional trickiness to doing so: If you regenerate before the commit, it might be that the commit fails; then you would have to rollback the page update, too. If you regenerate after commit, it might be that you run into race conditions if the same package sees two updates in two transactions very quickly, and the second regeneration completes before the first one.
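One possible way to handle the regenerate-after-commit race is to tag every regeneration with the order of the change that triggered it and refuse to overwrite a newer page with an older one. The sketch below is an assumption-laden illustration in present-day Python, not a patch for PyPI: `PageStore` and `serial` are hypothetical names, and it assumes each committed change gets a monotonically increasing number.

```python
import threading

class PageStore:
    """Keeps the newest rendering of each package page.

    `serial` is a hypothetical, monotonically increasing number assigned
    to each committed change (for example a changelog entry id).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pages = {}  # package name -> (serial, html)

    def publish(self, name, serial, html):
        """Store html unless a rendering at least as new is already in place."""
        with self._lock:
            current = self._pages.get(name)
            if current is not None and current[0] >= serial:
                return False  # a later update already won; drop this one
            self._pages[name] = (serial, html)
            return True

    def get(self, name):
        """Return the stored html for name, or None if there is none."""
        with self._lock:
            entry = self._pages.get(name)
            return entry[1] if entry else None

store = PageStore()
# The second transaction's regeneration finishes first...
store.publish("setuptools", 2, "<html>after update 2</html>")
# ...and the slower, older regeneration is then ignored.
store.publish("setuptools", 1, "<html>after update 1</html>")
print(store.get("setuptools"))
```

An on-disk variant would need the same compare-then-write step to be atomic, for instance by holding a lock while checking and renaming a temporary file into place.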
If people would find it easier to make these pages dynamic, such patches would also be kindly accepted. Generating the pages on access should be fairly cheap; the SQL is
select filename,md5_digest from release_files where name='setuptools';
and putting the result of that into a ppix-like HTML page should be much faster than invoking ZPT.
A few notes. It is important to show files from hidden releases as well as unhidden releases. I suspect the select statement above does that. I parse long descriptions to get #egg= links. I also give some special care to URLs that point back to PyPI to avoid having setuptools go back to the human interface.

It might be easiest to just trigger the existing ppix software to poll after a change. Thanks to your xmlrpc addition, polling is quite cheap. Alternatively, we could install the existing software in a way that polls more or less continuously. This would be quite trivial. What you suggest is probably cleaner but requires some expertise with the current software. :)

I'd much rather generate static files (as I'm doing now) than serve these dynamically.

Jim
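The two pieces of special handling mentioned here, pulling #egg= links out of long descriptions and dropping URLs that lead back to PyPI's own pages, could look roughly like the following. This is a sketch in present-day Python and not the actual ppix software: the host list, the `/pypi` path test, the regular expression, and the example URLs are all assumptions.

```python
import re
from urllib.parse import urlparse

# Hosts treated as "the human interface"; this list is an assumption.
PYPI_HOSTS = {"cheeseshop.python.org", "pypi.python.org"}

# Any http(s) URL in free text that carries an #egg= fragment.
EGG_LINK = re.compile(r"""https?://[^\s"'<>]+#egg=[^\s"'<>]+""")

def points_back_to_pypi(url):
    """True if url is a PyPI package page rather than an off-site page."""
    parts = urlparse(url)
    return parts.hostname in PYPI_HOSTS and parts.path.startswith("/pypi")

def egg_links(long_description):
    """Return URLs with an #egg= fragment found in a long description."""
    return EGG_LINK.findall(long_description)

home_pages = [
    "http://cheeseshop.python.org/pypi/zc.buildout",
    "http://svn.zope.org/zc.buildout",
]
kept = [url for url in home_pages if not points_back_to_pypi(url)]
found = egg_links("Trunk: http://example.invalid/svn/demo/trunk#egg=demo-dev (unstable)")
print(kept, found)
```

Download URLs under `/packages/` on the same host are deliberately not filtered, since those are the files setuptools actually needs.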
thanks jim. you save our day. we'll send some austrian cheese over :) jodok On 19.07.2007, at 13:06, Jim Fulton wrote:
_______________________________________________ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
-- "Although never is often better than *right* now." -- The Zen of Python, by Tim Peters Jodok Batlogg, Lovely Systems Schmelzhütterstraße 26a, 6850 Dornbirn, Austria phone: +43 5572 908060, fax: +43 5572 908060-77
On Thursday, 19.07.2007, at 07:06 -0400, Jim Fulton wrote:
I promised to prototype the format I suggested.
I've created an experimental prototype setuptools-specific package index at http://download.zope.org/ppix
Yay! This works like a charm!
There would be some benefit to having geographic mirrors. I suspect that having such mirrors available would improve performance further, at least for some folks. It might also be useful to have some mirrors for redundancy purposes. Note though that what I'm doing is mirroring only the index data. I'm not mirroring distributions. Of course, I'd be happy to make my software available. (It already is via our subversion repository.)
I'd be happy to support mirroring once all this is sorted out. I can offer a server in Germany/Europe.

Christian
On Jul 20, 2007, at 6:02 AM, Christian Theune wrote: ...
I'd be happy to support mirroring once all this is sorted out. I can offer a server in Germany/Europe.
If we decide that mirrors would be a good idea, it will be important, imo, to select mirror sites based on their connectivity. The goal of the mirrors should be to try to give people options with short network distances.

Jim
On Friday, 20.07.2007, at 07:48 -0400, Jim Fulton wrote:
If we decide that mirrors would be a good idea, it will be important, imo, to select mirror sites based on their connectivity. The goal of the mirrors should be to try to give people options with short network distances.
Right. However, do you have any specific, measurable parameters in mind? (Our server is reasonably well connected, reachable with about 5 hops from within Germany with latency around 40ms on a DSL line. Multiple GBit lines to the hosting center.)

Christian
On Jul 20, 2007, at 7:52 AM, Christian Theune wrote:
Right, however, do you have any specific parameters that can be measured in mind?
I'm not enough of a network expert. Hopefully, someone more knowledgeable will make a suggestion. BTW, with the current PyPI performance, I'm guessing we could have tens of mirrors poll once a minute without affecting other users.
(Our server is reasonably well connected, reachable with about 5 hops from within Germany with latency around 40ms on a DSL line. Multiple GBit lines to the hosting center.)
I didn't mean to suggest that you weren't well connected.

Jim
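The guess about tens of polling mirrors can be checked with back-of-the-envelope arithmetic. The sketch below (present-day Python, added for illustration) uses the roughly 7 ms internal response time reported earlier in the thread; the figure of 30 mirrors and the assumption that a poll costs a single request are illustrative guesses, not numbers from the discussion.

```python
SECONDS_PER_REQUEST = 0.007          # internal figure from the thread, about 7 ms
capacity = 1 / SECONDS_PER_REQUEST   # roughly 143 requests per second

def polling_load(mirrors, polls_per_minute=1):
    """Requests per second generated by mirrors polling for changes."""
    return mirrors * polls_per_minute / 60

load = polling_load(30)              # 30 mirrors is an illustrative guess
print(f"{load:.2f} req/s, about {100 * load / capacity:.2f}% of capacity")
```

Under these assumptions the polling traffic stays well below one percent of what the server can answer, which is consistent with the expectation that other users would not notice it.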
At 07:06 AM 7/19/2007 -0400, Jim Fulton wrote:
I've created an experimental prototype setuptools-specific package index at http://download.zope.org/ppix
Going to that page gives brief instructions for using it with easy_install and zc.buildout.
FYI, the handling of homepage and download links is broken. You have e.g. 'meta="homepage"' instead of 'rel="homepage"', so easy_install doesn't pick these up and look for links there, meaning that ppix fails to find downloads for e.g. pywin32, which is hosted at Sourceforge.

(On a perhaps not entirely unrelated note, the Cheeseshop appears to be down at the moment:

"""Error...
There's been a problem with your request
psycopg.OperationalError: no connection to the server""")

By the way, I'd suggest adding an explanation (or a link to one) on the ppix main page of how to configure easy_install so that the '-i' option isn't necessary. Perhaps we could add an example to the EasyInstall docs somewhere near:

http://peak.telecommunity.com/DevCenter/EasyInstall#creating-your-own-packag...

and then link to it from the ppix page.
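The effect of the wrong attribute name can be illustrated with a small link scanner. This is a sketch in present-day Python that mimics the behaviour described in this message; it is not setuptools' actual implementation, and the class name and example URLs are made up.

```python
from html.parser import HTMLParser

class RelLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose rel is homepage or download."""

    def __init__(self):
        super().__init__()
        self.follow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if attributes.get("href") and {"homepage", "download"} & set(rels):
            self.follow.append(attributes["href"])

page = (
    '<a meta="homepage" href="http://example.invalid/broken">home</a>'
    '<a rel="homepage" href="http://example.invalid/home">home</a>'
    '<a rel="download" href="http://example.invalid/files">files</a>'
)
parser = RelLinkParser()
parser.feed(page)
print(parser.follow)
```

The anchor with `meta="homepage"` is still an ordinary link, but a scanner keyed on `rel` never learns that it should be followed to look for downloads.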
On Jul 20, 2007, at 4:09 PM, Phillip J. Eby wrote:
FYI, the handling of homepage and download links is broken. You have e.g. 'meta="homepage"' instead of 'rel="homepage"', so easy_install doesn't pick these up and look for links there, meaning that ppix fails to find downloads for e.g. pywin32 which is hosted at Sourceforge.
Doh! Fixed.
(On a perhaps not entirely unrelated note, the Cheeseshop appears to be down at the moment:
"""Error...
There's been a problem with your request
psycopg.OperationalError: no connection to the server""")
By the way, I'd suggest explaining (or linking to an explanation) on the ppix main page describing how to configure easy_install such that the '-i' option isn't necessary.
If you send me some text, I'd be happy to add it to the ppix main page.
Perhaps we could add an example to the EasyInstall docs somewhere near:
http://peak.telecommunity.com/DevCenter/EasyInstall#creating-your-own-package-index
and then link to it from the ppix page.
+1

Jim
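As a possible starting point for such text: easy_install reads its defaults from distutils-style configuration files, so the index can be set once in an `[easy_install]` section instead of passing `-i` on every run. The sketch below (present-day Python, added for illustration) only generates the file contents; the helper name is made up, and where to save the text (for example a per-user `~/.pydistutils.cfg` or a project's `setup.cfg`) should be checked against the EasyInstall documentation.

```python
import configparser
import io

PPIX_URL = "http://download.zope.org/ppix"

def easy_install_config(index_url):
    """Return distutils-style config text that sets a default index."""
    config = configparser.ConfigParser()
    config["easy_install"] = {"index_url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()

text = easy_install_config(PPIX_URL)
print(text)  # save as e.g. ~/.pydistutils.cfg or in a project's setup.cfg
```

For zc.buildout the corresponding setting would be an `index` option in the `[buildout]` section of `buildout.cfg`; treat that as something to verify against the buildout documentation as well.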
(On a perhaps not entirely unrelated note, the Cheeseshop appears to be down at the moment:
"""Error...
There's been a problem with your request
psycopg.OperationalError: no connection to the server""")
Around that time, the Postgres log has these entries:

    2007-07-20 21:53:24 [14636] LOG: received fast shutdown request
    2007-07-20 21:53:24 [14636] LOG: aborting any active transactions
    2007-07-20 21:53:24 [26166] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [15769] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [10390] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [31182] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [30066] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [10162] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [17452] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [17147] FATAL: terminating connection due to administrator command
    2007-07-20 21:53:24 [1159] LOG: shutting down
    2007-07-20 21:53:26 [1159] LOG: database system is shut down
    2007-07-20 21:53:33 [1469] LOG: database system was shut down at 2007-07-20 21:53:26 CEST
    2007-07-20 21:53:33 [1469] LOG: checkpoint record is at A/FD833F0
    2007-07-20 21:53:33 [1469] LOG: redo record is at A/FD833F0; undo record is at 0/0; shutdown TRUE
    2007-07-20 21:53:33 [1469] LOG: next transaction ID: 110977718; next OID: 61913929
    2007-07-20 21:53:33 [1469] LOG: database system is ready

and Sean Reifschneider was logged in, so I suspect he did some maintenance work. Sean?

Regards,
Martin
On Sat, Jul 21, 2007 at 08:05:13AM +0200, "Martin v. Löwis" wrote:
Around that time, the Postgres log has these entries:
There was an upgrade of Postgres done earlier. As far as I can see,
pypi is running, so it must have been resolved earlier. AMK mentioned there
was a problem with the upgrade restart and Apache had to be restarted, but that
was about 6 hours ago.
Thanks,
Sean
--
"I not only use all the brains that I have, but all that I can borrow."
-- Woodrow Wilson
Sean Reifschneider, Member of Technical Staff
I've created an experimental prototype setuptools-specific package index at
I've now added something similar as http://cheeseshop.python.org/simple/
It differs from your site in a few ways:
- it does include a top-level index of all packages (but neither releases nor descriptions)
- it's always current, due to being dynamically computed
- it may differ in the precise list of URLs displayed; if there are important deviations, please let me know.
Regards, Martin
On Jul 21, 2007, at 1:00 PM, Martin v. Löwis wrote:
I've created an experimental prototype setuptools-specific package index at
I've now added something similar as
Way cool!
It differs from your site in a few ways:
- it does include a top-level index of all packages (but neither releases nor descriptions)
Why? This is a relatively expensive page, due to its size I assume, that really provides no value. This will slow down setuptools.
- it's always current, due to being dynamically computed
And also unreliable, for the same reason. For example, it would have been inaccessible yesterday afternoon. And also puts more load on the server. It would be much better imo if static pages could be written on writes.
- it may differ in the precise list of URLs displayed; if there are important deviations, please let me know.
The download and homepage URL anchors need rel="download" or rel="homepage". They lack the #egg= links. Compare your page for setuptools to mine. Also, some packages use their pypi pages as their home page links. You want to exclude these; otherwise, setuptools will circle around to the human interface, which defeats the point of the simple interface. Thanks for plugging away on this. Jim
- it does include a top-level index of all packages (but neither releases nor descriptions)
Why? This is a relatively expensive page, due to its size I assume, that really provides no value. This will slow down setuptools.
IIUC, it won't slow down setuptools, as setuptools looks at it only if it cannot find the real package page due to a misspelling. So as long as everything is spelled correctly, it should not provide any slowdown. If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value. As for performance - 30 downloads take 3.9s currently from nearby.
- it's always current, due to being dynamically computed
And also unreliable, for the same reason. For example, it would have been inaccessible yesterday afternoon.
The same could happen to Apache, too, of course. svn.python.org sometimes fails to restart when a restart is requested on log rotation. Any software is unreliable; to reduce downtime, you need an operator that is available when something breaks.
And also puts more load on the server. It would be much better imo if static pages could be written on writes.
Contributions are welcome. In addition to me considering it futile, I also don't know how to implement it correctly.
- it may differ in the precise list of URLs displayed; if there are important deviations, please let me know.
The download and homepage URL anchors need rel="download" or rel="homepage".
Done.
They lack the #egg= links.
How are these computed?
Also, some packages use their pypi pages as their home page links.
Ok, done. Regards, Martin
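The link conventions discussed above (rel="download" and rel="homepage" anchors, and leaving out home page links that point back at the index itself) can be sketched roughly as follows. This is only an illustration, not the cheeseshop or ppix code: the function name render_simple_page, its parameters, and the page layout are all hypothetical, and the #egg= links are left out because how to compute them is still an open question at this point in the thread.

```python
from html import escape
from urllib.parse import urlsplit


def render_simple_page(name, files, homepage=None, download_url=None,
                       index_host="cheeseshop.python.org"):
    """Render a minimal machine-oriented page for one package.

    Hypothetical helper: `files` is a list of release file URLs, and
    `index_host` is the host whose human-oriented pages should not be
    linked back to as a home page.
    """
    lines = ["<html><head><title>Links for %s</title></head><body>"
             % escape(name)]
    for url in files:
        filename = url.rsplit("/", 1)[-1]
        lines.append('<a href="%s">%s</a><br/>'
                     % (escape(url), escape(filename)))
    # Skip home pages that live on the index itself, so a crawler is
    # not sent back to the human interface.
    if homepage and urlsplit(homepage).hostname != index_host:
        lines.append('<a href="%s" rel="homepage">home page</a><br/>'
                     % escape(homepage))
    if download_url:
        lines.append('<a href="%s" rel="download">download</a><br/>'
                     % escape(download_url))
    lines.append("</body></html>")
    return "\n".join(lines)


print(render_simple_page(
    "demo",
    ["http://example.org/files/demo-1.0.tar.gz"],
    homepage="http://cheeseshop.python.org/pypi/demo",
    download_url="http://example.org/downloads/",
))
```

In the example call the home page points at the index host, so only the file link and the rel="download" link are emitted.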
On Jul 21, 2007, at 3:08 PM, Martin v. Löwis wrote:
- it does include a top-level index of all packages (but neither releases nor descriptions)
Why? This is a relatively expensive page, due to its size I assume, that really provides no value. This will slow down setuptools.
IIUC, it won't slow down setuptools, as setuptools looks at it only if it cannot find the real package page due to a misspelling. So as long as everything is spelled correctly, it should not provide any slowdown.
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value.
That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page. It isn't misspelled, it's just not there. People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements. In my strongly held opinion, allowing imprecise names in requirements and setuptools commands is of negative value.
As for performance - 30 downloads take 3.9s currently from nearby.
That's nice. For me, that page takes 3 or 4 times as long as other pages.
- it's always current, due to being dynamically computed
And also unreliable, for the same reason. For example, it would have been inaccessible yesterday afternoon.
The same could happen to Apache, too, of course. svn.python.org sometimes fails to restart when a restart is requested on log rotation.
Any software is unreliable; to reduce downtime, you need an operator that is available when something breaks.
Apache has a far better record than the cheeseshop. I give up.
And also puts more load on the server. It would be much better imo if static pages could be written on writes.
Contributions are welcome. In addition to me considering it futile, I also don't know how to implement it correctly.
I'd be happy to contribute my polling version. That solves my problems and I can't justify the additional effort to figure out the cheeseshop software. ...
They lack the #egg= links.
How are these computed?
By parsing the description. Apparently, I'm doing this incorrectly. I'll have to look into that. Jim
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value.
That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page.
I don't understand. What page is fetched if the package is not in PyPI?
It isn't misspelled, it's just not there. People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements. In my strongly held opinion, allowing imprecise names in requirements and setuptools commands is of negative value.
I cannot comment on that. I don't use setuptools, and have no intuition what is good or bad when using it (for example, I consider .egg files and the notion of eggs inherently bad). My main motivation to provide that page is that the setuptools specification says it should be there. As this entire infrastructure is for the sake of setuptools, I find it pointless to not support setuptools fully.
I'd be happy to contribute my polling version. That solves my problems and I can't justify the additional effort to figure out the cheeseshop software.
I'd like to hear other opinions here. Would people prefer if the index was always correct (and perhaps somewhat slow), or would they prefer instead that it is super-efficient (and somewhat out-of-date)? Regards, Martin
Martin v. Löwis wrote:
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value. That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page.
I don't understand. What page is fetched if the package is not in PyPI?
I think Jim was referring to a package which is *registered* in PyPI, but whose download location was elsewhere. <snip>
I'd be happy to contribute my polling version. That solves my problems and I can't justify the additional effort to figure out the cheeseshop software.
I'd like to hear other opinions here. Would people prefer if the index was always correct (and perhaps somewhat slow), or would they prefer instead that it is super-efficient (and somewhat out-of-date)?
I would prefer the second, particularly as I think the caching solution lends itself to mirroring, which would also improve availability. From my complete ignorance of the underlying architecture: the polling solution would stay pretty current if there were an extremely cheap way to ask for the latest "transaction ID" on the cheeseshop, or if the query could fetch only registrations newer than the last poll time. Are such queries possible over the XML-RPC interface? Tres. -- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com
On Jul 22, 2007, at 12:33 PM, Tres Seaver wrote:
Martin v. Löwis wrote:
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value. That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page.
I don't understand. What page is fetched if the package is not in PyPI?
I think Jim was referring to a package which is *registered* in PyPI, but whose download location was elsewhere.
No, I was referring to packages that aren't ready for or of interest to PyPI or to proprietary packages. ...
From my complete ignorance of the underlying architecture: the polling solution would stay pretty current if there were an extremely cheap way to ask for the latest "transaction ID" on the cheeseshop, or if the query could fetch only registrations newer than the last poll time.
There is such an API thanks to Martin.
Are such queries possible over the XML-RPC interface?
Yup. I'm using them. Queries take only a few milliseconds per request on the server. Jim
I would prefer the second, particularly as I think the caching solution lends itself to mirroring, which would also improve availability.
I think this conclusion is wrong: Jim already has a mirror infrastructure that anybody can run, without the need of running that on the central server.
From my complete ignorance of the underlying architecture: the polling solution would stay pretty current if there were an extremely cheap way to ask for the latest "transaction ID" on the cheeseshop, or if the query could fetch only registrations newer than the last poll time. Are such queries possible over the XML-RPC interface?
Yes; you can ask for all changes since a certain UTC time. People shouldn't invoke that every UTC second, though - once a minute is fine. Regards, Martin
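As a rough illustration of the polling approach described above, a mirror could remember the time of its last poll and ask only for changes since then. The sketch below is not Jim's polling script. It assumes a client object exposing a changelog(since) call that returns rows shaped like (name, version, timestamp, action); that row shape, the function name poll_changes, and the FakeIndex stand-in are assumptions made so the example runs offline.

```python
import time


def poll_changes(client, last_poll, now=None):
    """Return (sorted names of changed packages, new last-poll stamp).

    `client` is any object with a changelog(since) method; the row
    layout (name, version, timestamp, action) is assumed here.
    """
    if now is None:
        now = int(time.time())
    rows = client.changelog(last_poll)
    # Several rows may refer to the same package; rebuild each page once.
    changed = sorted({row[0] for row in rows})
    return changed, now


class FakeIndex:
    """Offline stand-in for a real XML-RPC proxy (hypothetical data)."""

    def __init__(self, rows):
        self.rows = rows

    def changelog(self, since):
        return [row for row in self.rows if row[2] >= since]


fake = FakeIndex([
    ("pkg-a", "1.0", 1000, "new release"),
    ("pkg-b", "0.3", 1300, "update description"),
    ("pkg-a", "1.0", 1600, "add source file"),
])
changed, stamp = poll_changes(fake, last_poll=1200, now=1700)
print(changed, stamp)
```

Against a live index the client would presumably be an XML-RPC proxy such as xmlrpc.client.ServerProxy, and the loop around poll_changes would sleep about a minute between calls, in line with the advice above not to poll every second.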
On Jul 22, 2007, at 12:24 PM, Martin v. Löwis wrote:
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value.
That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page.
I don't understand. What page is fetched if the package is not in PyPI?
We have lots of packages that aren't in PyPI. Some of them aren't ready for PyPI or are not of general interest. Some are proprietary.
It isn't misspelled, it's just not there. People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements. In my strongly held opinion, allowing imprecise names in requirements and setuptools commands is of negative value.
I cannot comment on that. I don't use setuptools, and have no intuition what is good or bad when using it (for example, I consider .egg files and the notion of eggs inherently bad).
My main motivation to provide that page is that the setuptools specification says it should be there. As this entire infrastructure is for the sake of setuptools, I find it pointless to not support setuptools fully.
Fair enough. Theory beats practicality every time. ;)
I'd be happy to contribute my polling version. That solves my problems and I can't justify the additional effort to figure out the cheeseshop software.
I'd like to hear other opinions here.
Yes. This has been a fairly limited discussion. Sigh.
Would people prefer if the index was always correct (and perhaps somewhat slow), or would they prefer instead that it is super-efficient (and somewhat out-of-date)?
Where somewhat out of date could be a matter of seconds. IMO, a python.org index could poll every few seconds, given that local polling only takes a few milliseconds. I have a feeling that this discussion is going to annoy someone with PyPI software knowledge enough to add baking on write. :) For example, I had the impression that Rene' was planning to invoke scripts after updates. It would be easy to invoke my polling script or a script based on your work. BTW, I'm pretty sure that geographic mirrors are desirable, both for performance and redundancy reasons. I think that, for these, polling once a minute is plenty and puts negligible load on PyPI, assuming that there aren't hundreds of them. Jim
Jim Fulton schrieb:
On Jul 22, 2007, at 12:24 PM, Martin v. Löwis wrote:
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value.
That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page.
I don't understand. What page is fetched if the package is not in PyPI?
We have lots of packages that aren't in PyPI. Some of them aren't ready for PyPI or are not of general interest. Some are proprietary.
Ah, ok. So I stand by my original statement (the one you classified as incorrect): *If* I do misspell a package name, *then* setuptools will correct the spelling if the index page is available.
Would people prefer if the index was always correct (and perhaps somewhat slow), or would they prefer instead that it is super-efficient (and somewhat out-of-date)?
Where somewhat out of date could be a matter of seconds.
And where somewhat slower could be "practically not noticeable".
BTW, I'm pretty sure that geographic mirrors are desirable, both for performance and redundancy reasons. I think that, for these, polling once a minute is plenty and puts negligible load on PyPI, assuming that there aren't hundreds of them.
Sure: I don't mind at all if more people run your software on their machines. If people want it more official, we can have "cheeseshop0.python.org", "cheeseshop1.python.org", and so on, or "de.cheeseshop.python.org", "jp.cheeseshop.python.org", and so on. As I said before: if people also want to mirror the files, I'd ask them to provide download statistics. Given the changelog, it would be easy to keep a file mirror up-to-date (of course, if a mirror downloads all files, these downloads also count towards the download statistics - which might confuse people). Regards, Martin
On Jul 22, 2007, at 1:03 PM, Martin v. Löwis wrote:
Jim Fulton schrieb:
On Jul 22, 2007, at 12:24 PM, Martin v. Löwis wrote:
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value.
That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page.
I don't understand. What page is fetched if the package is not in PyPI?
We have lots of packages that aren't in PyPI. Some of them aren't ready for PyPI or are not of general interest. Some are proprietary.
Ah, ok. So I stand by my original statement (the one you classified as incorrect): *If* I do misspell a package name, *then* setuptools will correct the spelling if the index page is available.
Your full original statement was: On Jul 21, 2007, at 3:08 PM, Martin v. Löwis wrote:
IIUC, it won't slow down setuptools, as setuptools looks at it only if it cannot find the real package page due to a misspelling. So as long as everything is spelled correctly, it should not provide any slowdown.
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value.
I was referring to the part about not slowing things down when people didn't misspell. But it looks like I was mistaken. It was my understanding that setuptools always checked index/ when it couldn't find index/package_name/, but as Phillip pointed out, if it finds a package via find links, it won't look at index/. Basic tests seem to confirm this.
Would people prefer if the index was always correct (and perhaps somewhat slow), or would they prefer instead that it is super-efficient (and somewhat out-of-date)?
Where somewhat out of date could be a matter of seconds.
And where somewhat slower could be "practically not noticeable".
I wasn't arguing about speed. I agree that when PyPI is working well, the difference between the speed of the dynamic page and the speed of a static page wouldn't be noticeable. Jim
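The lookup order discussed in this exchange can be modelled with a small toy function. This is only a model of the behaviour as described in the thread (package page first, top-level index as a fallback, no fallback when find links already supply the package), not setuptools' actual code; the function name pages_fetched and the idea of passing sets of names are hypothetical, and whether the package page is requested at all when find links can satisfy the requirement is an assumption.

```python
def pages_fetched(requirement, find_links, index_pages):
    """List the index URLs that would be requested for one requirement.

    `find_links` is the set of names available through find links and
    `index_pages` is the set of names that have a page on the index.
    """
    fetched = ["index/%s/" % requirement]
    found_on_index = requirement in index_pages
    found_via_links = requirement in find_links
    # The large top-level page is only consulted when nothing else
    # turned up the package.
    if not found_on_index and not found_via_links:
        fetched.append("index/")
    return fetched


index_pages = {"zc.buildout", "setuptools"}
print(pages_fetched("zc.buildout", set(), index_pages))
print(pages_fetched("private.pkg", {"private.pkg"}, index_pages))
print(pages_fetched("zc.buildot", set(), index_pages))
```

Under this model only the last call, a name that is neither on the index nor reachable through find links, pays for the top-level page.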
Martin v. Löwis wrote:
And where somewhat slower could be "practically not noticeable".
Perhaps it /could/ be, but isn't currently. For example, updating one piece of software I have with almost 150 dependencies takes 45 seconds with ppix, 4:45 without. I plan to do similar timings with the "simple" PyPI interface when I get a chance and report the results here. -- Benji York http://benjiyork.com
On Jul 23, 2007, at 2:58 PM, Benji York wrote:
Martin v. Löwis wrote:
And where somewhat slower could be "practically not noticeable".
Perhaps it /could/ be, but isn't currently. For example, updating one piece of software I have with almost 150 dependencies takes 45 seconds with ppix, 4:45 without. I plan to do similar timings with the "simple" PyPI interface when I get a chance and report the results here.
I suspect that this has more to do with network distance than with server speed. Jim
On 7/23/07, Jim Fulton
On Jul 23, 2007, at 2:58 PM, Benji York wrote:
Martin v. Löwis wrote:
And where somewhat slower could be "practically not noticeable".
Perhaps it /could/ be, but isn't currently. For example, updating one piece of software I have with almost 150 dependencies takes 45 seconds with ppix, 4:45 without. I plan to do similar timings with the "simple" PyPI interface when I get a chance and report the results here.
I suspect that this has more to do with network distance than with server speed.
That is an interesting point. It is amazing how many directory type things get slammed, but the problem is really latency...such as a slow DNS lookup. I wonder how much quicker an easy_install would be with local DNS lookups, package names, etc. I had a problem with an LDAP server I set up that was really tricky to figure out until I wrote some scripts that ran continuously getting stats, and I realized that a DNS server would hang occasionally and it would grind everything to a halt. People kept telling me they would have an occasional 'ls -l' that would hang for 20 seconds. Caching DNS servers fixed it.
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
Jim Fulton wrote:
On Jul 23, 2007, at 2:58 PM, Benji York wrote:
Martin v. Löwis wrote:
And where somewhat slower could be "practically not noticeable".
Perhaps it /could/ be, but isn't currently. For example, updating one piece of software I have with almost 150 dependencies takes 45 seconds with ppix, 4:45 without. I plan to do similar timings with the "simple" PyPI interface when I get a chance and report the results here.
I suspect that this has more to do with network distance than with server speed.
That's actually my point. Geographically distributed mirrors that are a little out of sync are much more valuable (IMO) than a centralized service that is absolutely up to date, but "far" away. For me the static/dynamic argument is more about stability, and central/distributed is more about (network) speed. -- Benji York http://benjiyork.com
And where somewhat slower could be "practically not noticeable". Perhaps it /could/ be, but isn't currently. For example, updating one piece of software I have with almost 150 dependencies takes 45 seconds with ppix, 4:45 without. I plan to do similar timings with the "simple" PyPI interface when I get a chance and report the results here.
I suspect that this has more to do with network distance than with server speed.
That's actually my point. Geographically distributed mirrors that are a little out of sync are much more valuable (IMO) than a centralized service that is absolutely up to date, but "far" away.
Ok, but then your response didn't really answer my question. If people want to run distributed mirrors that are somewhat behind, by all means: start today (just remember to talk to me if you also want to mirror files - if not, just run Jim's software as-is). My question was about the "simple" interface on the central server, to which you seem to say "I don't need it at all - whether it's current and slow or behind and fast" (which, in a sense, is also a response to the question, namely "I don't care"). Regards, Martin
Martin v. Löwis wrote:
My question was about the "simple" interface on the central server
Ah, I didn't realize.
to which you seem to say "I don't need it at all - whether it's current and slow or behind and fast" (which, in a sense, is also a response to the question, namely "I don't care").
I think it's a great idea to have both human- and machine-targeted versions available. It looks like setuptools is about twice as fast (in at least one instance) with the simple version. That seems like a pretty big win to me. -- Benji York http://benjiyork.com
Perhaps it /could/ be, but isn't currently. For example, updating one piece of software I have with almost 150 dependencies takes 45 seconds with ppix, 4:45 without. I plan to do similar timings with the "simple" PyPI interface when I get a chance and report the results here.
I was, of course, talking about the simple interface. The full index will certainly take much more time because setuptools has to request more pages, and each page contains a lot of unnecessary data. Regards, Martin
Benji York wrote:
I plan to do similar timings with the "simple" PyPI interface when I get a chance and report the results here.
Here are my non-scientific results:
buildout times:
    regular: 4:52.86
    simple:  3:15.57
    ppix:    2:03.58
As everyone is aware, network latency has a large impact on this so here are the shortest round-trip packet times I got (with a small sample).
    cheeseshop.python.org: 93ms
    download.zope.org:     8ms
I suspect the majority/entirety of the difference between ppix and simple is network related. -- Benji York http://benjiyork.com
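For readers who want the ratios behind these timings, the short script below converts the three reported buildout times to seconds and prints each one relative to the regular interface. Only the three timing strings come from the message above; the helper name to_seconds is made up for the illustration, and nothing here is a new measurement.

```python
def to_seconds(clock):
    """Convert an 'M:SS.ss' timing string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


# Timings as reported in the message above.
timings = {"regular": "4:52.86", "simple": "3:15.57", "ppix": "2:03.58"}
seconds = {name: to_seconds(value) for name, value in timings.items()}

baseline = seconds["regular"]
for name, value in seconds.items():
    print("%-8s %7.2fs  speedup vs regular: %.2fx"
          % (name, value, baseline / value))
```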
Martin v. Löwis wrote:
If people do misspell a package name when invoking easy_install, they get the feature that you consider of no value.
That is not correct. Not all packages are in PyPI. Using a package that isn't in PyPI will trigger a fetch of that page.
I don't understand. What page is fetched if the package is not in PyPI?
It isn't misspelled, it's just not there. People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements. In my strongly held opinion, allowing imprecise names in requirements and setuptools commands is of negative value.
I cannot comment on that. I don't use setuptools, and have no intuition what is good or bad when using it (for example, I consider .egg files and the notion of eggs inherently bad).
Oh right. It makes using Python and packages *dramatically* easier. I guess you are resigned to their arrival though. ;-)
My main motivation to provide that page is that the setuptools specification says it should be there. As this entire infrastructure is for the sake of setuptools, I find it pointless to not support setuptools fully.
Great.
I'd be happy to contribute my polling version. That solves my problems and I can't justify the additional effort to figure out the cheeseshop software.
I'd like to hear other opinions here. Would people prefer if the index was always correct (and perhaps somewhat slow), or would they prefer instead that it is super-efficient (and somewhat out-of-date)?
I'd prefer accuracy over speed here. Michael Foord
Regards, Martin
Martin v. Löwis wrote:
would they prefer instead that it is super-efficient (and somewhat out-of-date)?
Yes. At most a few minutes out of date and faster/more reliable would be my strong preference. -- Benji York http://benjiyork.com
At 09:09 AM 7/22/2007 -0400, Jim Fulton wrote:
People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements.
People do all sorts of things they shouldn't. That doesn't stop them blaming other people for their mistakes. It's said that a 10% improvement in ease-of-use can double a product's users. Case sensitivity is a barrier to entry for new users, and setuptools can't afford any additional entry barriers. A significant part of setuptools' audience includes people who are new to Python, or at least new to installing or distributing Python modules, and quite a lot of setuptools features are aimed squarely at that audience. This happens to be one of them.
In my strongly held opinion, allowing imprecise names in requirements and setuptools commands is of negative value.
I understand that perspective. But practicality beats purity, and this is absolutely a "worse is better" type of situation. Setuptools has lots of features that are targeted at different audiences. There are plenty of features targeted at the group you're in, don't begrudge the other groups their features. :) (This is probably one reason that setuptools is so controversial; everybody can find *something* about it to hate, even if those very same things are quite loved by a different group of users. E.g. you and case-insensitivity, Martin and eggs, etc.)
On 7/22/07, Phillip J. Eby
Setuptools has lots of features that are targeted at different audiences. There are plenty of features targeted at the group you're in, don't begrudge the other groups their features. :)
Actually, I suspect this is a substantial contributor to setuptools being considered controversial: it encompasses too many different features. That certainly keeps me feeling unhappy about depending on it. -Fred -- Fred L. Drake, Jr. <fdrake at gmail.com> "Chaos is the score upon which reality is written." --Henry Miller
On Jul 22, 2007, at 12:51 PM, Phillip J. Eby wrote:
At 09:09 AM 7/22/2007 -0400, Jim Fulton wrote:
People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements.
People do all sorts of things they shouldn't. That doesn't stop them blaming other people for their mistakes.
It's said that a 10% improvement in ease-of-use can double a product's users. Case sensitivity is a barrier to entry for new users, and setuptools can't afford any additional entry barriers.
I totally don't buy this in a case like this. People installing packages with setuptools are technical users. We expect them to write Python scripts.
A significant part of setuptools' audience includes people who are new to Python, or at least new to installing or distributing Python modules, and quite a lot of setuptools features are aimed squarely at that audience. This happens to be one of them.
I don't think that encouraging use of case insensitive names by people who are about to start learning a language that uses case sensitive names is doing them any favors.
In my strongly held opinion, allowing imprecise names in requirements and setuptools commands is of negative value.
I understand that perspective. But practicality beats purity, and this is absolutely a "worse is better" type of situation.
Obviously we disagree.
Setuptools has lots of features that are targeted at different audiences. There are plenty of features targeted at the group you're in, don't begrudge the other groups their features. :)
I don't think you are helping them. Jim
At 07:08 AM 7/23/2007 -0400, Jim Fulton wrote:
On Jul 22, 2007, at 12:51 PM, Phillip J. Eby wrote:
At 09:09 AM 7/22/2007 -0400, Jim Fulton wrote:
People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements.
People do all sorts of things they shouldn't. That doesn't stop them blaming other people for their mistakes.
It's said that a 10% improvement in ease-of-use can double a product's users. Case sensitivity is a barrier to entry for new users, and setuptools can't afford any additional entry barriers.
I totally don't buy this in a case like this. People installing packages with setuptools are technical users. We expect them to write Python scripts.
No, "we" don't. Eggs were created to support application-level plugins, such as are used by Trac and Chandler. Trac and Chandler users are not necessarily programmers, let alone Python programmers.
Phillip J. Eby wrote:
At 07:08 AM 7/23/2007 -0400, Jim Fulton wrote:
On Jul 22, 2007, at 12:51 PM, Phillip J. Eby wrote:
At 09:09 AM 7/22/2007 -0400, Jim Fulton wrote:
People should *not* misspell pages when using setuptools. They should certainly not use misspelled package names in requirements.
People do all sorts of things they shouldn't. That doesn't stop them blaming other people for their mistakes.
It's said that a 10% improvement in ease-of-use can double a product's users. Case sensitivity is a barrier to entry for new users, and setuptools can't afford any additional entry barriers.
I totally don't buy this in a case like this. People installing packages with setuptools are technical users. We expect them to write Python scripts.
No, "we" don't. Eggs were created to support application-level plugins, such as are used by Trac and Chandler. Trac and Chandler users are not necessarily programmers, let alone Python programmers.
But by definition, the people typing the names of the dependencies into a 'setup.py' for such a plugin *are* Python programmers, and could be expected to know about case sensitivity. I don't think Jim was arguing that human-centric *search* should punish misspellings, but rather that encouraging such sloppiness in other packages is a misfeature, especially if supporting it induces a tax on *all* users of automated dependency resolution. Tres.
But by definition, the people typing the names of the dependencies into a 'setup.py' for such a plugin *are* Python programmers, and could be expected to know about case sensitivity.
I don't think Jim was arguing that human-centric *search* should punish misspellings, but rather that encouraging such sloppiness in other packages is a misfeature, especially if supporting it induces a tax on *all* users of automated dependency resolution.
In my humble opinion, I for one completely agree with Phillip. I have had to sit down with quite a few new Python Programmers and show them how to use easy_install and I "thank God" easy_install is smart enough to figure out case sensitivity. This is a wonderful feature!!!! Please don't ever get rid of it :) Not being able to install a package as they couldn't figure out the exact name of the package could be the final straw for some new programmer to Python! Noah Gift
Noah Gift wrote:
But by definition, the people typing the names of the dependencies into a 'setup.py' for such a plugin *are* Python programmers, and could be expected to know about case sensitivity.
I don't think Jim was arguing that human-centric *search* should punish misspellings, but rather that encouraging such sloppiness in other packages is a misfeature, especially if supporting it induces a tax on *all* users of automated dependency resolution.
In my humble opinion, I for one completely agree with Phillip. I have had to sit down with quite a few new Python Programmers and show them how to use easy_install and I "thank God" easy_install is smart enough to figure out case sensitivity. This is a wonderful feature!!!! Please don't ever get rid of it :) Not being able to install a package as they couldn't figure out the exact name of the package could be the final straw for some new programmer to Python!
There are two different use cases here:
1. User mis-types the name of a package on the command line, e.g.:
$ easy_install Foo
when it should be spelled:
$ easy_install foo
Being forgiving of case-mangling here is a concern of the easy_install *application*, and is non-controversial.
2. Programmer mis-types the name of a package in the dependencies for his own package, e.g.:
setup(install_requires=['Foo']...)
In this case, coddling the error causes it to *propagate*, because other programmers will copy it directly, or depend on the error-filled package. Worse, the cost of error correction is transferred to *all* users of the setuptools library, even if they never use 'easy_install' at all.
I'm fine with leaving the newbie-friendly behavior in 'easy_install'; I just don't like the performance hit it induces on users of setuptools who *can* spell.
Tres.
On Jul 23, 2007, at 2:48 PM, Tres Seaver wrote:
Noah Gift wrote:
But by definition, the people typing the names of the dependencies into a 'setup.py' for such a plugin *are* Python programmers, and could be expected to know about case sensitivity.
I don't think Jim was arguing that human-centric *search* should punish misspellings, but rather that encouraging such sloppiness in other packages is a misfeature, especially if supporting it induces a tax on *all* users of automated dependency resolution.
In my humble opinion, I for one completely agree with Phillip. I have had to sit down with quite a few new Python Programmers and show them how to use easy_install and I "thank God" easy_install is smart enough to figure out case sensitivity. This is a wonderful feature!!!! Please don't ever get rid of it :) Not being able to install a package as they couldn't figure out the exact name of the package could be the final straw for some new programmer to Python!
There are two different use cases here:
1. User mis-types the name of a package on the command line, e.g.:
$ easy_install Foo
when it should be spelled:
$ easy_install foo
Being forgiving of case-mangling here is a concern of the easy_install *application*, and is non-controversial.
For me this is potentially controversial because:
2. Programmer mis-types the name of a package in the dependencies for his own package, e.g.:
setup(install_requires=['Foo']...)
Note that this might be intentional, as opposed to a typo. The programmer will think "Foo" is a valid name because it worked with easy_install. It's true that easy_install prints a warning, but it is buried in so much output that it is easily missed or ignored.
In this case, coddling the error causes it to *propagate*, because other programmers will copy it directly, or depend on the error-filled package. Worse, the cost of error correction is transferred to *all* users of the setuptools library, even if they never use 'easy_install' at all.
Well said. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
Noah Gift wrote:
In my humble opinion, I for one completely agree with Phillip. I have had to sit down with quite a few new Python Programmers and show them how to use easy_install and I "thank God" easy_install is smart enough to figure out case sensitivity. This is a wonderful feature!!!! Please don't ever get rid of it :)
If easy_install had instead said "sorry, I can't find 'foo', perhaps you meant 'Foo'", then the user would be both spared frustration and enlightened. -- Benji York http://benjiyork.com
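Benji's suggestion could look roughly like the minimal sketch below. The function names and the in-memory set of known project names are assumptions made for illustration; this is not easy_install's actual code, only a way to show "exact match first, otherwise suggest names that differ only by case".

```python
def suggest_name(requested, known_names):
    """Return (exact_match, suggestions) for a requested project name.

    `known_names` is a hypothetical collection of project names as
    registered; the lookup itself stays case-sensitive.
    """
    if requested in known_names:
        return requested, []
    wanted = requested.lower()
    # Offer names that differ from the request only by case.
    suggestions = sorted(n for n in known_names if n.lower() == wanted)
    return None, suggestions


def lookup_message(requested, known_names):
    """Build the message shown to the user for a lookup."""
    exact, suggestions = suggest_name(requested, known_names)
    if exact is not None:
        return "found %r" % (exact,)
    if suggestions:
        hints = " or ".join(repr(s) for s in suggestions)
        return "sorry, I can't find %r, perhaps you meant %s" % (requested, hints)
    return "sorry, I can't find %r" % (requested,)


print(lookup_message("foo", {"Foo", "bar"}))
```

With this shape the user still gets pointed at the right name, but the misspelling is never silently accepted.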
Date: Mon, 23 Jul 2007 15:05:42 -0400 From: Benji York
Noah Gift wrote:
In my humble opinion, I for one completely agree with Phillip. I have had to sit down with quite a few new Python Programmers and show them how to use easy_install and I "thank God" easy_install is smart enough to figure out case sensitivity. This is a wonderful feature!!!! Please don't ever get rid of it :)
If easy_install had instead said "sorry, I can't find 'foo', perhaps you meant 'Foo'", then the user would be both spared frustration and enlightened.
+1 -- Rick Ratzel - Enthought, Inc. 515 Congress Avenue, Suite 2100 - Austin, Texas 78701 512-536-1057 x229 - Fax: 512-536-1059 http://www.enthought.com
On Jul 23, 2007, at 12:01 PM, Tres Seaver wrote:
It's said that a 10% improvement in ease-of-use can double a product's users.
Under that principle, can I renew my plea for a better name than "easy_install"?
-Barry
On Jul 23, 2007, at 1:40 PM, Phillip J. Eby wrote:
At 12:46 PM 7/23/2007 -0400, Barry Warsaw wrote:
Under that principle, can I renew my plea for a better name than "easy_install"?
Do you have a specific name in mind?
I knew you were going to ask that. :) My first requirement is "something without an underscore". My second would be "something that's evocative of snakes and eggs".
A couple of Pycons ago I think I suggested 'hatch' which IIRC you didn't like because you wanted to reserve that for some other function, but I've forgotten what that was. OTOH, a quick search didn't reveal any collisions with hatch(1) on *nix.
lay(1) is probably not a good choice <wink>.
egg(1) is probably the best choice IMO. It doesn't conflict with any other existing *nix command that I can tell, and I can't think of a better command for dealing with Python eggs. It's immediately evocative. Now that I think about it, maybe it was 'egg' that you wanted to reserve. I like 'egg' too because Ruby gems are managed with the gem(1) command so there's a parallel that is easily remembered.
I'd further recommend that Python itself come with an 'egg(1)' command, which should be a shortcut for 'python setup.py'. To be honest, the latter is not as user friendly as 'egg sdist bdist_egg upload -s' but if the two use cases conflict too much, then egg(1) should be reserved for the end user because developers can swallow the slight inconvenience more easily.
-Barry
egg(1) is probably the best choice IMO. It doesn't conflict with any other existing *nix command that I can tell, and I can't think of a better command for dealing with Python eggs. It's immediately evocative. Now that I think about it, maybe it was 'egg' that you wanted to reserve. I like 'egg' too because Ruby gems are managed with the gem(1) command so there's a parallel that is easily remembered.
I agree. I really like egg! easy_install is a bit of a pain as tab completion with underscores is troublesome. 3 characters is even better.
Barry Warsaw wrote:
On Jul 23, 2007, at 1:40 PM, Phillip J. Eby wrote:
At 12:46 PM 7/23/2007 -0400, Barry Warsaw wrote:
Under that principle, can I renew my plea for a better name than "easy_install"?
Do you have a specific name in mind?
I knew you were going to ask that. :) My first requirement is "something without an underscore".
Why? That has to be a pure "de gustibus" argument AFAICT.
My second would be "something that's evocative of snakes and eggs".
You prefer cute, rather than self-explanatory?
A couple of Pycons ago I think I suggested 'hatch' which IIRC you didn't like because you wanted to reserve that for some other function, but I've forgotten what that was. OTOH, a quick search didn't reveal any collisions with hatch(1) on *nix.
lay(1) is probably not a good choice <wink>.
egg(1) is probably the best choice IMO. It doesn't conflict with any other existing *nix command that I can tell, and I can't think of a better command for dealing with Python eggs. It's immediately evocative. Now that I think about it, maybe it was 'egg' that you wanted to reserve. I like 'egg' too because Ruby gems are managed with the gem(1) command so there's a parallel that is easily remembered.
I'd further recommend that Python itself come with an 'egg(1)' command, which should be a shortcut for 'python setup.py'. To be honest, the latter is not as user friendly as 'egg sdist bdist_egg upload -s' but if the two use cases conflict too much, then egg(1) should be reserved for the end user because developers can swallow the slight inconvenience more easily.
Why would a *command* invoke setup.py? I don't see the point: people who want that kind of convenience can write two-liner shell scripts, spelled the way they like, or even put an alias in their shell profile.
The point of 'easy_install' is that it has been wired to a *specific* python installation, and knows how to fetch and install distributions into it. It doesn't even make sense to *use* easy_install if you are already in the unpacked source distribution; the intersection of the two sets of use cases is empty.
Tres.
On Jul 23, 2007, at 4:05 PM, Tres Seaver wrote:
Barry Warsaw wrote:
On Jul 23, 2007, at 1:40 PM, Phillip J. Eby wrote:
At 12:46 PM 7/23/2007 -0400, Barry Warsaw wrote:
Under that principle, can I renew my plea for a better name than "easy_install"?
Do you have a specific name in mind?
I knew you were going to ask that. :) My first requirement is "something without an underscore".
Why? That has to be a pure "de gustibus" argument AFAICT.
It's not. Few *nix commands use underscores, and for good reason. It's user unfriendly because it's on an uncommon keycap you have to search for *and* you have to shift to it. And by "you" I don't mean you and me (i.e. programmers) who type the silly things all day, I mean "users". Sometimes hyphens are used (e.g. 'apt-get') which is only slightly better but those are still rare. And this recommendation says (to me) that utility names should be between 2 and 9 characters and be composed of lower case letters and digits: http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap12.html#tag_12_02
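As a quick illustration of that reading of the guideline (2 to 9 characters, lower case letters and digits only), here is a small sketch that checks the candidate names from this thread. The helper name `follows_guideline` is made up for the example, and the regular expression encodes only the interpretation stated above, not the full text of the linked document.

```python
import re

# Barry's reading of the guideline: 2 to 9 characters,
# drawn from lower case letters and digits only.
_UTILITY_NAME = re.compile(r"[a-z0-9]{2,9}")


def follows_guideline(name):
    """Return True if `name` fits the 2-9 lower-case/digit rule."""
    return _UTILITY_NAME.fullmatch(name) is not None


for candidate in ("easy_install", "apt-get", "egg", "hatch", "lay"):
    print(candidate, follows_guideline(candidate))
```

Under this check 'easy_install' fails twice over (it contains an underscore and is longer than nine characters), while 'egg', 'hatch' and 'lay' all pass.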
My second would be "something that's evocative of snakes and eggs".
You prefer cute, rather than self-explanatory?
I prefer something memorable and easy to type. BTW, it's only a claim that 'easy_install' is actually 'easy' -- a claim I happen to agree with, mostly, but it's more hopefully self-explanatory than objectively so.
I'd further recommend that Python itself come with an 'egg(1)' command, which should be a shortcut for 'python setup.py'. To be honest, the latter is not as user friendly as 'egg sdist bdist_egg upload -s' but if the two use cases conflict too much, then egg(1) should be reserved for the end user because developers can swallow the slight inconvenience more easily.
Why would a *command* invoke setup.py? I don't see the point: people who want that kind of convenience can write two-liner shell scripts, spelled the way they like, or even put an alias in their shell profile.
Of course, and developers who dislike the verbosity are well equipped to do any kind of aliasing they want. I'm not going to belabor this point because it's not the use case I /really/ care about and I don't want to sidetrack the discussion about coming up with a better command than 'easy_install' for users.
-Barry
At 04:05 PM 7/23/2007 -0400, Tres Seaver wrote:
It doesn't even make sense to *use* easy_install if you are already in the unpacked source distribution; the intersection of the two sets of use cases is empty.
Not true, actually. "setup.py install" doesn't give you the same flexibility of options as "easy_install .", since the latter lets you specify options like whether or not to unzip the resulting package, and whether to include dependencies or not.
<snip>
A couple of Pycons ago I think I suggested 'hatch' which IIRC you didn't like because you wanted to reserve that for some other function, but I've forgotten what that was. OTOH, a quick search didn't reveal any collisions with hatch(1) on *nix.
+1 on a rename. Both 'hatch' and 'egg' are excellent. I'm not fond of the name 'easy_install', because:
- it is very generic
- the bias in the name (easy) will backfire whenever it fails for any reason. Like: 'Hey, this *bleep* thing thinks it's easy to use, but it doesn't *bleep* work'.
- Lars
But by definition, the people typing the names of the dependencies into a 'setup.py' for such a plugin *are* Python programmers, and could be expected to know about case sensitivity.
I don't think Jim was arguing that human-centric *search* should punish misspellings, but rather that encouraging such sloppiness in other packages is a misfeature, especially if supporting it induces a tax on *all* users of automated dependency resolution.
Right. I think Phillip is primarily talking about package names as specified on the command line of easy_install. So if your concern is about package names specified in dependencies, one solution could be that setuptools distinguishes whether to apply case corrections and normalization, depending on whether it was an end-user-typed name or a programmer-specified one. What I don't know is how difficult that would be to implement, and what volunteer is supposed to implement it if it were easy/possible, so I by no means propose that such a solution should be implemented, even if it would solve the problem. Regards, Martin
At 10:04 PM 7/23/2007 +0200, Martin v. Löwis wrote:
But by definition, the people typing the names of the dependencies into a 'setup.py' for such a plugin *are* Python programmers, and could be expected to know about case sensitivity.
I don't think Jim was arguing that human-centric *search* should punish misspellings, but rather that encouraging such sloppiness in other packages is a misfeature, especially if supporting it induces a tax on *all* users of automated dependency resolution.
Right. I think Phillip is primarily talking about package names as specified on the command line of easy_install.
So if your concern is about package names specified in dependencies, one solution could be that setuptools distinguishes whether to apply case corrections and normalization, depending on whether it was an end-user-typed name or a programmer-specified one.
What I don't know is how difficult that would be to implement, and what volunteer is supposed to implement it if it were easy/possible, so I by no means propose that such a solution should be implemented, even if it would solve the problem.
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Can you elaborate a bit, please? Why does the case of filenames matter for the queries it makes? AFAIU, it gets package names either from the user or from setup.py, perhaps also from packages dependency inside .egg files (assuming those support dependencies); these should all be case-sensitive. Regards, Martin
At 11:13 PM 7/23/2007 +0200, Martin v. Löwis wrote:
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Can you elaborate a bit, please? Why does the case of filenames matter for the queries it makes?
AFAIU, it gets package names either from the user or from setup.py, perhaps also from packages dependency inside .egg files (assuming those support dependencies); these should all be case-sensitive.
In order to resolve dependencies, the system looks at installed .egg files and directories (and .egg-info directories), and extracts package name and version info from the filenames.
Phillip J. Eby schrieb:
At 11:13 PM 7/23/2007 +0200, Martin v. Löwis wrote:
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Can you elaborate a bit, please? Why does the case of filenames matter for the queries it makes?
In order to resolve dependencies, the system looks at installed .egg files and directories (and .egg-info directories), and extracts package name and version info from the filenames.
Still - why does that require case-insensitive lookups to the index? Suppose a package specifies a dependency Foo. IIUC, you look on disk whether foo is already present, finding the version(s) of foo installed in that process. Then, this either is satisfying or not. If it is, you don't need the index at all. If it is not, you need to go to the index - but you still know that it is Foo that you were looking for, no? So lookups for dependencies in the index could always be case-sensitive; please correct me if I'm wrong. Regards, Martin
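The flow Martin describes can be sketched as follows. This is only a model of the argument, not setuptools code: the `resolve` function, the dictionary of installed versions keyed by lower-cased name, and the set of index names are all assumptions introduced for the example.

```python
def resolve(name, wanted_version, installed, index_names):
    """Model of the lookup order described above.

    installed:   hypothetical dict mapping lower-cased project name to
                 the set of versions found on disk (filenames may have
                 had their case normalized, so the key is lower-cased).
    index_names: hypothetical set of project names exactly as
                 registered in the index.
    """
    # Step 1: look on disk, ignoring case.
    versions = installed.get(name.lower(), set())
    if wanted_version in versions:
        return "installed"  # satisfied locally, the index is never asked
    # Step 2: go to the index, using the name exactly as it was written
    # in the dependency (a case-sensitive lookup).
    if name in index_names:
        return "index"
    return "missing"


installed = {"foo": {"1.0"}}
index_names = {"Foo", "Bar"}
print(resolve("Foo", "1.0", installed, index_names))
print(resolve("Foo", "2.0", installed, index_names))
print(resolve("foo", "2.0", installed, index_names))
```

In this model the case-insensitive step is confined to the local scan, so a correctly spelled dependency needs at most one exact index request.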
On Jul 23, 2007, at 5:21 PM, Phillip J. Eby wrote:
At 11:13 PM 7/23/2007 +0200, Martin v. Löwis wrote:
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Can you elaborate a bit, please? Why does the case of filenames matter for the queries it makes?
AFAIU, it gets package names either from the user or from setup.py, perhaps also from packages dependency inside .egg files (assuming those support dependencies); these should all be case-sensitive.
In order to resolve dependencies, the system looks at installed .egg files and directories (and .egg-info directories), and extracts package name and version info from the filenames.
But the package name and version are in the PKG-INFO files, so it certainly has access to non-normalized names. Why can't it double check a possible match against that file?
Jim
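A minimal sketch of the double check Jim is asking about, assuming the PKG-INFO text has already been read from the egg. PKG-INFO uses RFC 822 style headers, so the standard library's email parser can read the Name field; the helper names and the sample metadata below are invented for the example, and setuptools itself may do this differently.

```python
from email.parser import Parser

# Hypothetical PKG-INFO content for an egg whose filename may have
# been lower-cased on a case-insensitive filesystem.
SAMPLE_PKG_INFO = "Metadata-Version: 1.0\nName: Foo\nVersion: 1.2\n"


def declared_name(pkg_info_text):
    """Return the project name exactly as the author declared it."""
    return Parser().parsestr(pkg_info_text, headersonly=True)["Name"]


def matches_exactly(requirement_name, pkg_info_text):
    """Case-sensitive comparison against the declared name."""
    return declared_name(pkg_info_text) == requirement_name


print(declared_name(SAMPLE_PKG_INFO))
print(matches_exactly("foo", SAMPLE_PKG_INFO))
```

The cost Phillip raises in his reply is visible here too: the check needs the file contents, which means one more file open per candidate egg.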
At 06:11 AM 7/24/2007 -0400, Jim Fulton wrote:
On Jul 23, 2007, at 5:21 PM, Phillip J. Eby wrote:
At 11:13 PM 7/23/2007 +0200, Martin v. Löwis wrote:
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Can you elaborate a bit, please? Why does the case of filenames matter for the queries it makes?
AFAIU, it gets package names either from the user or from setup.py, perhaps also from packages dependency inside .egg files (assuming those support dependencies); these should all be case-sensitive.
In order to resolve dependencies, the system looks at installed .egg files and directories (and .egg-info directories), and extracts package name and version info from the filenames.
But the package name and version are in the PKG-INFO files, so it certainly has access to non-normalized names. Why can't it double check a possible match against that file?
Because if case actually made a difference, we couldn't have both packages installed in the same directory, could we? And why add an extra file open (which currently is only needed for "develop" eggs) to the process of building a working set or environment, in order to confirm something whose only purpose is to make requirements more difficult to specify? :) Note that if what's bothering you is the package index access time, use Apache's mod_speling to enable case-insensitive URLs for the static page tree.
On Jul 24, 2007, at 11:31 AM, Phillip J. Eby wrote:
At 06:11 AM 7/24/2007 -0400, Jim Fulton wrote:
On Jul 23, 2007, at 5:21 PM, Phillip J. Eby wrote:
At 11:13 PM 7/23/2007 +0200, Martin v. Löwis wrote:
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Can you elaborate a bit, please? Why does the case of filenames matter for the queries it makes?
AFAIU, it gets package names either from the user or from setup.py, perhaps also from packages dependency inside .egg files (assuming those support dependencies); these should all be case-sensitive.
In order to resolve dependencies, the system looks at installed .egg files and directories (and .egg-info directories), and extracts package name and version info from the filenames.
But the package name and version are in the PKG-INFO files, so it certainly has access to non-normalized names. Why can't it double check a possible match against that file?
Because if case actually made a difference, we couldn't have both packages installed in the same directory, could we? And why add an extra file open (which currently is only needed for "develop" eggs) to the process of building a working set or environment, in order to confirm something whose only purpose is to make requirements more difficult to specify? :)
Currently, we allow packages to differ only in case. The fact that setuptools pretends we don't doesn't change the fact that we do. You said that "compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates". I'm merely pointing out that we don't have to rely solely on the file name.
Note that if what's bothering you is the package index access time, use Apache's mod_speling to enable case-insensitive URLs for the static page tree.
*If* we decide that package names are case insensitive, then we should do this. We haven't decided this.
Jim
At 11:39 AM 7/24/2007 -0400, Jim Fulton wrote:
On Jul 24, 2007, at 11:31 AM, Phillip J. Eby wrote:
At 06:11 AM 7/24/2007 -0400, Jim Fulton wrote:
On Jul 23, 2007, at 5:21 PM, Phillip J. Eby wrote:
At 11:13 PM 7/23/2007 +0200, Martin v. Löwis wrote:
Yes, especially since compatibility with the existing installation base requires case insensitivity, because on case-insensitive platforms easy_install already normalizes the case of filenames it creates. So, the question of what the "right thing" to do is in the abstract has already been moot for a year or two.
Can you elaborate a bit, please? Why does the case of filenames matter for the queries it makes?
AFAIU, it gets package names either from the user or from setup.py, perhaps also from packages dependency inside .egg files (assuming those support dependencies); these should all be case-sensitive.
In order to resolve dependencies, the system looks at installed .egg files and directories (and .egg-info directories), and extracts package name and version info from the filenames.
But the package name and version are in the PKG-INFO files, so it certainly has access to non-normalized names. Why can't it double check a possible match against that file?
Because if case actually made a difference, we couldn't have both packages installed in the same directory, could we? And why add an extra file open (which currently is only needed for "develop" eggs) to the process of building a working set or environment, in order to confirm something whose only purpose is to make requirements more difficult to specify? :)
Currently, we allow packages to differ only in case. The fact that setuptools pretends we don't doesn't change the fact that we do.
I wasn't under the impression that we were discussing whether allowing project names to differ only in case was a good idea, since I haven't heard anybody give an argument that it's a *good* idea. In fact, it seems like an obviously bad idea on its face, whether setuptools is in the picture or not.
Note that if what's bothering you is the package index access time, use Apache's mod_speling to enable case-insensitive URLs for the static page tree.
*If* we decide that package names are case insensitive, then we should do this. We haven't decided this.
Well, so far the only argument *against* it that I recall seeing, is your argument that sloppy requirement specs slow everybody down by making them do the extra package index hit. So, if that's fixable, what other argument is there for treating the names case-sensitively?
But the package name and version are in the PKG-INFO files, so it certainly has access to non-normalized names. Why can't it double check a possible match against that file?
Because if case actually made a difference, we couldn't have both packages installed in the same directory, could we?
Right. However, there is a difference between case-insensitive, and case-preserving.
Note that if what's bothering you is the package index access time, use Apache's mod_speling to enable case-insensitive URLs for the static page tree.
That won't help. If you look for a name of a non-registered package, setuptools will go to the index even if mod_speling corrects spelling errors. Such an approach is only possible if setuptools would stop using the entire index if the server has case-insensitive lookup (which it cannot determine). Regards, Martin
At 07:40 PM 7/24/2007 +0200, Martin v. Löwis wrote:
But the package name and version are in the PKG-INFO files, so it certainly has access to non-normalized names. Why can't it double check a possible match against that file?
Because if case actually made a difference, we couldn't have both packages installed in the same directory, could we?
Right. However, there is a difference between case-insensitive, and case-preserving.
I don't understand your statement here, nor what is supposed to follow from it.
Note that if what's bothering you is the package index access time, use Apache's mod_speling to enable case-insensitive URLs for the static page tree.
That won't help. If you look for a name of a non-registered package, setuptools will go to the index even if mod_speling corrects spelling errors.
Jim's objection was that if it's possible to get case-correction from the index, people will declare setup.py dependencies with incorrect case, leading to other packages having indirect dependencies with incorrect case, leading to lots of package index lookups. This objection is relevant only to requirements which differ from the actual project name only by their case. A non-registered package lookup is going to fail no matter what, and thus isn't going to wind up in a setup.py without a dependency_links specifier that will prevent it being looked up in the package index to begin with.
Because if case actually made a difference, we couldn't have both packages installed in the same directory, could we?
Right. However, there is a difference between case-insensitive, and case-preserving.
I don't understand your statement here, nor what is supposed to follow from it.
Clearly, on a case-insensitive file system, project names differing only in case cannot coexist. That doesn't mean that all references to the project should be case-normalized (e.g. lower-cased). So even if project names compare case-insensitive, there still should (could) be a "right" spelling, the one that the package author wants to see. This is the spelling that others then should use. So I still don't see why the file names on disk have any effect on the lookups setuptools makes to the index.
Jim's objection was that if it's possible to get case-correction from the index, people will declare setup.py dependencies with incorrect case, leading to other packages having indirect dependencies with incorrect case, leading to lots of package index lookups.
I don't think that was his objection. IIUC, he complains about incorrect spellings as bad, period - regardless of whether they also have a performance effect. It's like spelling your name "Philipp" - that's a bad thing to do, independent of whether it also makes you harder to find (which it actually doesn't, thanks to Google).
This objection is relevant only to requirements which differ from the actual project name only by their case. A non-registered package lookup is going to fail no matter what, and thus isn't going to wind up in a setup.py without a dependency_links specifier that will prevent it being looked up in the package index to begin with.
Right. However, if setuptools would stop making case insensitive lookups to the index, lookups to unregistered packages would become more efficient. Regards, Martin
At 08:21 PM 7/24/2007 +0200, Martin v. Löwis wrote:
Because if case actually made a difference, we couldn't have both packages installed in the same directory, could we?
Right. However, there is a difference between case-insensitive, and case-preserving.
I don't understand your statement here, nor what is supposed to follow from it.
Clearly, on a case-insensitive file system, project names differing only in case cannot coexist. That doesn't mean that all references to the project should be case-normalized (e.g. lower-cased).
So even if project names compare case-insensitive, there still should (could) be a "right" spelling, the one that the package author wants to see. This is the spelling that others then should use.
Well, that spelling will certainly show up everywhere. Setuptools is case-preserving, *except* with regard to installing egg files on case-insensitive filesystems (as defined by what os.path.normcase does on a given platform). When it installs an egg, it normalizes the case of the target path. In all other matters it is case-insensitive for comparison, but case-preserving of the inputs it receives.
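The distinction between comparing case-insensitively and preserving case can be illustrated with a small sketch. This is not setuptools code: the `ProjectNames` class is hypothetical, and `str.lower()` is used as a stand-in for whatever normalization a real index or `os.path.normcase` would apply.

```python
class ProjectNames:
    """Hypothetical registry: compares names case-insensitively,
    but always hands back the spelling the author registered."""

    def __init__(self):
        self._by_key = {}  # lower-cased name -> registered spelling

    def register(self, name):
        key = name.lower()  # assumption: lower-casing is the comparison key
        existing = self._by_key.get(key)
        if existing is not None and existing != name:
            # Names differing only by case are not allowed to coexist.
            raise ValueError(f"{name!r} collides with registered {existing!r}")
        self._by_key[key] = name

    def canonical(self, requested):
        """Return the registered spelling for `requested`, or None."""
        return self._by_key.get(requested.lower())


names = ProjectNames()
names.register("Pygments")
print(names.canonical("pygments"))  # Pygments
```

A lookup with any casing succeeds, yet the author's "right" spelling is what comes back, which is the behavior both sides of the exchange seem to want.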
Jim's objection was that if it's possible to get case-correction from the index, people will declare setup.py dependencies with incorrect case, leading to other packages having indirect dependencies with incorrect case, leading to lots of package index lookups.
I don't think that was his objection. IIUC, he complains about incorrect spellings as bad, period - regardless of whether they also have a performance effect. It's like spelling your name "Philipp" - that's a bad thing to do, independent of whether it also makes you harder to find (which it actually doesn't, thanks to Google).
It's actually more like spelling my name "phillip", which is arguably still spelled correctly, if punctuated poorly. :) And it's also an answer to the wrong question: the *first* question is whether we should allow "phillip" and "Phillip" to co-exist in the package index. If not, then there is the question of whether there is any reason to be case-sensitive with respect to searching. If we are agreed that having projects whose names differ only by case is a bad idea, then the latter question is considerably less controversial.
This objection is relevant only to requirements which differ from the actual project name only by their case. A non-registered package lookup is going to fail no matter what, and thus isn't going to wind up in a setup.py without a dependency_links specifier that will prevent it being looked up in the package index to begin with.
Right. However, if setuptools would stop making case insensitive lookups to the index, lookups to unregistered packages would become more efficient.
I'm not sure I follow you. If a non-registered package is used as a dependency, the setup() will need to specify dependency_links, in which case PyPI will not be consulted.
Right. However, if setuptools would stop making case insensitive lookups to the index, lookups to unregistered packages would become more efficient.
I'm not sure I follow you. If a non-registered package is used as a dependency, the setup() will need to specify dependency_links, in which case PyPI will not be consulted.
Ah, ok. So is it then correct that setuptools never looks at pypi/, unless the user misspelled a package name on the command line? Regards, Martin
At 08:54 PM 7/24/2007 +0200, Martin v. Löwis wrote:
Right. However, if setuptools would stop making case insensitive lookups to the index, lookups to unregistered packages would become more efficient.
I'm not sure I follow you. If a non-registered package is used as a dependency, the setup() will need to specify dependency_links, in which case PyPI will not be consulted.
Ah, ok. So is it then correct that setuptools never looks at pypi/, unless the user misspelled a package name on the command line?
Pretty much, yes.
On 7/24/07, Phillip J. Eby wrote:
At 08:54 PM 7/24/2007 +0200, Martin v. Löwis wrote:
Right. However, if setuptools would stop making case insensitive lookups to the index, lookups to unregistered packages would become more efficient.
I'm not sure I follow you. If a non-registered package is used as a dependency, the setup() will need to specify dependency_links, in which case PyPI will not be consulted.
Ah, ok. So is it then correct that setuptools never looks at pypi/, unless the user misspelled a package name on the command line?
Pretty much, yes.
Would it be a bad idea to suggest the case insensitive lookup happen against a local flat file that gets diff'd from PyPI? Then only the culprit gets punished using their own CPU :)
Would it be a bad idea to suggest the case insensitive lookup happen against a local flat file that gets diff'd from PyPI? Then only the culprit gets punished using their own CPU :)
What does it mean to "diff a flat file from PyPI"? Regards, Martin
Would it be a bad idea to suggest the case insensitive lookup happen against a local flat file that gets diff'd from PyPI? Then only the culprit gets punished using their own CPU :)
On 7/24/07, "Martin v. Löwis" wrote:
What does it mean to "diff a flat file from PyPI"? Regards, Martin
I am familiar with an open source project called Radmind. It
I am familiar with an open source project called Radmind. It maintains machines by keeping a local transcript with all of the files and "overloads" on it. When you modify the file system you diff the changes into an overload and put them on the server.
That's still a lot of terminology which I don't understand, and have no intuition for, perhaps because English is not my native language. I give up trying to understand - just to give you an idea: What's a "transcript of files"? How do you "overload" on it (why is "to overload" used with the preposition "on")? How do I "diff" a change "into" "an overload" (which now is a noun, it seems)?
So, if someone does an "incorrect" search, easy_install checks to see first if it has the latest "file". If not, it then replaces its local index. Then the search happens locally, without going back and forth to the server.
I think this brings us to the real issue: you asked whether this would be a bad idea to suggest that? I now think "perhaps not bad, but unhelpful, unless you also contribute an implementation of it". It's a change to setuptools, which is still mostly a one-man-show, (IIUC), so proposing ideas in general is futile (as for most software with a single author - including PyPI); the single author cannot possibly implement all the ideas people have. Regards, Martin
My real motive is selfishness. I like that easy_install is not case sensitive, as do other people I am helping to learn Python. I just hope that doesn't go away. My suggestion is more geared toward, how do I "keep" that feature :)
no intuition for, perhaps because English is not my native language. I give up trying to understand - just to give you an idea:
I apologize, I can be very lazy when I type.
I now think "perhaps not bad, but unhelpful, unless you also contribute an implementation of it". It's a change to setuptools, which is still mostly a one-man-show, (IIUC), so proposing ideas in general is futile (as for most software with a single author - including PyPI); the single author cannot possibly implement all the ideas people have.
The basic algorithm is that a local index of PyPI could be kept in one file. If an incorrect search was made, the first action to occur would be to check if the local file was the same as the file on the server. If not, it would sync the changes with svn. Then easy_install would try to do lookups against the local file to find a match. I am happy to help if you need help. I am particularly interested in easy_install as I am writing a chapter on it for an O'Reilly book as well, again a partially selfish motive :) Noah
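The algorithm outlined above could look roughly like the following sketch. It is only an illustration under stated assumptions: `find_project`, `digest`, and the two `fetch_*` callables are hypothetical names, and a digest comparison stands in for the svn-based sync mentioned in the message.

```python
import hashlib


def digest(names):
    """Fingerprint of a name list, used to tell whether two copies differ."""
    return hashlib.sha1("\n".join(sorted(names)).encode("utf-8")).hexdigest()


def find_project(requested, local_names, fetch_remote_digest, fetch_remote_names):
    """Resolve `requested` against a local name list, refreshing it only if stale.

    `fetch_remote_digest` and `fetch_remote_names` are hypothetical callables
    standing in for whatever the index server would actually offer.
    Returns (matched_name_or_None, possibly_refreshed_local_names).
    """
    if requested in local_names:
        return requested, local_names  # exact spelling: no server contact

    # "Incorrect" search: first check whether the local copy is current.
    if fetch_remote_digest() != digest(local_names):
        local_names = fetch_remote_names()  # replace the stale local index

    # The case-insensitive match now happens on the client's own CPU.
    wanted = requested.lower()
    for name in local_names:
        if name.lower() == wanted:
            return name, local_names
    return None, local_names


server = ["Pygments", "pywin32", "setuptools"]
match, cache = find_project(
    "Pywin32", ["setuptools"], lambda: digest(server), lambda: list(server)
)
print(match)  # pywin32
```

The server is contacted only when the requested spelling is not found locally, so correctly spelled requirements would cost nothing extra.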
Regards, Martin
Jim Fulton wrote:
On Jul 22, 2007, at 12:51 PM, Phillip J. Eby wrote:
A significant part of setuptools' audience includes people who are new to Python, or at least new to installing or distributing Python modules, and quite a lot of setuptools features are aimed squarely at that audience. This happens to be one of them.
I don't think that encouraging use of case insensitive names by people who are about to start learning a language that uses case sensitive names is doing them any favors.
Agreed. -- Benji York http://benjiyork.com
At 07:00 PM 7/21/2007 +0200, Martin v. Löwis wrote:
I've created an experimental prototype setuptools-specific package index at
I've now added something similar as
It's very fast, thanks.
It differs from your site in a few ways:
- it does include a top-level index of all packages (but neither releases nor descriptions)
Unfortunately, that doesn't help current versions of setuptools. See point #7 of:
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
Setuptools looks for release links, not package links on that page. Compare:

$ easy_install -vvvi http://cheeseshop.python.org/simple Pywin32
Searching for Pywin32
Reading http://cheeseshop.python.org/simple/Pywin32/
Couldn't find index page for 'Pywin32' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://cheeseshop.python.org/simple/
No local packages or download links found for Pywin32
error: Could not find suitable distribution for Requirement.parse('Pywin32')

$ easy_install -vvvi http://cheeseshop.python.org/pypi Pywin32
Searching for Pywin32
Reading http://cheeseshop.python.org/pypi/Pywin32/
Couldn't find index page for 'Pywin32' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://cheeseshop.python.org/pypi/
Reading http://cheeseshop.python.org/pypi/pywin32/210
Reading http://sf.net/projects/pywin32
...
- it's always current, due to being dynamically computed
- it may differ in the precise list of URLs displayed; if there are important deviations, please let me know.
Jim's already mentioned these, but the rel="" info (per the index API spec's point #6), and the links embedded in the long_description field (per point #4) are missing. Without these, easy_install can't find sourceforge links, subversion checkouts, or any other embedded direct download links. For example:

$ easy_install -vvvi http://cheeseshop.python.org/simple pywin32
Searching for pywin32
Reading http://cheeseshop.python.org/simple/pywin32/
No local packages or download links found for pywin32
error: Could not find suitable distribution for Requirement.parse('pywin32')

$ easy_install -vvvi http://cheeseshop.python.org/pypi pywin32
Searching for pywin32
Reading http://cheeseshop.python.org/pypi/pywin32/
Reading http://sf.net/projects/pywin32
Reading http://sourceforge.net/project/showfiles.php?group_id=78018
Found link: http://downloads.sourceforge.net/pywin32/pywin32-210.win32-py2.2.exe?modtime...
...[a dozen more links]

$ easy_install -i http://cheeseshop.python.org/simple setuptools==dev
Searching for setuptools==dev
Reading http://cheeseshop.python.org/simple/setuptools/
No local packages or download links found for setuptools==dev
error: Could not find suitable distribution for Requirement.parse('setuptools==dev')

$ easy_install -i http://cheeseshop.python.org/pypi setuptools==dev
Searching for setuptools==dev
Reading http://cheeseshop.python.org/pypi/setuptools/
Reading http://cheeseshop.python.org/pypi/setuptools
Reading http://cheeseshop.python.org/pypi/setuptools/0.6c6
Best match: setuptools dev
Downloading http://svn.python.org/projects/sandbox/trunk/setuptools/#egg=setuptools-dev
Doing subversion checkout from http://svn.python.org/projects/sandbox/trunk/setuptools/ to ...
Unfortunately, that doesn't help current versions of setuptools. See point #7 of:
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
Setuptools looks for release links, not package links on that page.
I don't understand. What's a "release link"? The links on the index page *do* go to the "project's active version pages", as specified (there aren't any numbered version pages) Jim left out that page entirely - are you saying it is impossible to provide such an index page with the page structure that Jim proposed?
$ easy_install -vvvi http://cheeseshop.python.org/simple Pywin32
Searching for Pywin32
Reading http://cheeseshop.python.org/simple/Pywin32/
Couldn't find index page for 'Pywin32' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://cheeseshop.python.org/simple/
No local packages or download links found for Pywin32
I see that it doesn't work, but I cannot understand why. On http://cheeseshop.python.org/simple/ "pywin32" is clearly linked, so it should be able to resolve the misspelling.
Jim's already mentioned these, but the rel="" info (per the index API spec's point #6),
This is fixed.
and the links embedded in the long_description field (per point #4) are missing.
I have to think about this more. Is it correct that you want all href attributes of all a elements in the long_description? And how do you know what the long_description is from just looking at the rendered page? Regards, Martin
At 09:23 PM 7/21/2007 +0200, Martin v. Löwis wrote:
Unfortunately, that doesn't help current versions of setuptools. See point #7 of:
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
Setuptools looks for release links, not package links on that page.
I don't understand. What's a "release link"? The links on the index page *do* go to the "project's active version pages", as specified (there aren't any numbered version pages)
See point #2: """2. Individual project version pages' URLs must be of the form base/projectname/version, where base is the package index's base URL.""" That's what's meant by "version pages" in point #7 -- i.e., they *must* be of that two-part form for setuptools to recognize them as such.
I see that it doesn't work, but I cannot understand why. On
http://cheeseshop.python.org/simple/
"pywin32" is clearly linked, so it should be able to resolve the misspelling.
It could perhaps be *changed* to do so, but at present it follows the spec's definition of "version page" URLs.
Jim's already mentioned these, but the rel="" info (per the index API spec's point #6),
This is fixed.
Great; Sourceforge and other offsite download pages work now.
and the links embedded in the long_description field (per point #4) are missing.
I have to think about this more. Is it correct that you want all href attributes of all a elements in the long_description?
Yes; of course, the usual rendering needs to be applied, since long_description can contain reStructuredText.
And how do you know what the long_description is from just looking at the rendered page?
You don't need to; easy_install discovers those links the same way it does any other Cheeseshop-provided download links. From easy_install's point of view, the entire page is just one big mass of links that might point to downloads:

"""4. ...It is explicitly permitted for a project's "long_description" to include URLs, and these should be formatted as HTML links by the package index, as EasyInstall does *no special processing* [emph. added] to identify what parts of a page are index-specific and which are part of the project's supplied description."""

In other words, the *only* links that are specially handled are the "rel" ones, which it follows unconditionally to look for additional direct download links. All other links are merely *inspected* to see if they obviously refer to a downloadable package (e.g. .tgz, .zip, .egg, .exe etc., or explicitly-marked #egg). As a side-effect, this means that links to perform Cheeseshop operations, links to other parts of python.org, etc. are simply ignored, as they are not links to downloadables nor marked as #egg.

If a URL can be determined by inspection to be a download link, then easy_install extracts version and platform info from the URL and adds it as a candidate for download selection. When both the home page and download URL have been read, along with any detected "active version pages" (as defined above), then easy_install chooses the "best" download URL from all the candidates it has seen up to that point.
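The link handling described above can be sketched as follows. This is not easy_install's actual code: `LinkScanner` and `looks_like_download` are hypothetical names, the suffix list is an assumed subset, the `rel` values "homepage" and "download" are an assumption about what point #6 of the spec refers to, and the sample URLs on example.com are made up.

```python
from html.parser import HTMLParser

# Assumed subset of the suffixes that mark a link as a downloadable package.
DOWNLOAD_SUFFIXES = (".tgz", ".tar.gz", ".zip", ".egg", ".exe")


def looks_like_download(url):
    """True if the URL is obviously a download: marked #egg=, or a known suffix."""
    base, _, fragment = url.partition("#")
    if fragment.startswith("egg="):
        return True
    path = base.split("?", 1)[0].lower()
    return path.endswith(DOWNLOAD_SUFFIXES)


class LinkScanner(HTMLParser):
    """Treats the page as one big mass of links, as described in the thread."""

    def __init__(self):
        super().__init__()
        self.follow = []      # rel links: fetched unconditionally for more links
        self.candidates = []  # links that look like downloads by inspection

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rels = (attrs.get("rel") or "").lower().split()
        if "homepage" in rels or "download" in rels:
            self.follow.append(href)
        elif looks_like_download(href):
            self.candidates.append(href)
        # Everything else (index operations, other python.org pages) is ignored.


page = """
<a href="http://sf.net/projects/pywin32" rel="homepage">Home Page</a>
<a href="http://example.com/dist/demo-1.0.zip">demo-1.0.zip</a>
<a href="http://svn.example.com/demo/trunk#egg=demo-dev">development version</a>
<a href="http://www.python.org/">Python</a>
"""
scanner = LinkScanner()
scanner.feed(page)
print(scanner.follow)
print(scanner.candidates)
```

Because no part of the page is treated specially, links rendered from long_description are picked up by the same inspection as links the index generates itself.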
See point #2:
"""2. Individual project version pages' URLs must be of the form base/projectname/version, where base is the package index's base URL."""
That's what's meant by "version pages" in point #7 -- i.e., they *must* be of that two-part form for setuptools to recognize them as such.
Ok, but I still cannot see how to fix that: there simply *is* no version part that I could point to. Does that mean that Jim's approach does not work?
Yes; of course, the usual rendering needs to be applied, since long_description can contain reStructuredText.
Ok, I now added these links as well. Regards, Martin
At 12:53 AM 7/22/2007 +0200, Martin v. Löwis wrote:
See point #2:
"""2. Individual project version pages' URLs must be of the form base/projectname/version, where base is the package index's base URL."""
That's what's meant by "version pages" in point #7 -- i.e., they *must* be of that two-part form for setuptools to recognize them as such.
Ok, but I still cannot see how to fix that: there simply *is* no version part that I could point to.
Actually, 'version' is allowed to be an empty string, so simply adding a trailing '/' to the links you're generating now should work. The only thing the version part of a version page URL is used for, is to handle links to .py files: setuptools uses the package version (if available) to synthesize a setup.py for installing standalone .py files. If the version is not available, it won't be able to do that, but that's a relatively minor feature, all things considered. Few packages are distributed via a single .py download URL, but the package index could actually tack on an #egg designator to such links in order to preserve 100% backward-compatibility.
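A minimal sketch of the "version page" URL form from point #2, including the empty-version case mentioned above. The function name `parse_version_page` is hypothetical and the logic is a simplification; setuptools' real matching may differ in details such as URL unquoting.

```python
def parse_version_page(url, base):
    """Return (project, version) if `url` has the form base/projectname/version.

    The version part may be an empty string, which corresponds to a
    project link that merely ends in a trailing slash. Returns None otherwise.
    """
    if not base.endswith("/"):
        base += "/"
    if not url.startswith(base):
        return None
    parts = url[len(base):].split("/")
    if len(parts) != 2 or not parts[0]:
        return None  # not the two-part form
    return parts[0], parts[1]


pypi = "http://cheeseshop.python.org/pypi"
simple = "http://cheeseshop.python.org/simple/"

print(parse_version_page(pypi + "/setuptools/0.6c6", pypi))  # ('setuptools', '0.6c6')
print(parse_version_page(simple + "pywin32/", simple))       # ('pywin32', '')
print(parse_version_page(simple + "pywin32", simple))        # None
```

The last two calls show why adding a trailing slash matters: without it the link does not have the two-part form and is not recognized as a version page.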
Does that mean that Jim's approach does not work?
Jim isn't providing the top-level index, and thus doesn't provide punctuation or case corrections. The "version pages" convention is only used by setuptools to discover additional index pages for crawling, anyway, and his whole design is intended to prevent crawling.
Yes; of course, the usual rendering needs to be applied, since long_description can contain reStructuredText.
Ok, I now added these links as well.
Looks good, thanks!
On Jul 21, 2007, at 7:20 PM, Phillip J. Eby wrote: ...
Jim isn't providing the top-level index, and thus doesn't provide punctuation or case corrections.
Yup
The "version pages" convention is only used by setuptools to discover additional index pages for crawling, anyway, and his whole design is intended to prevent crawling.
That's a secondary benefit. The main goal is to avoid the expense of that page for packages that aren't in PyPI, as some packages I use aren't. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
That's a secondary benefit. The main goal is to avoid the expense of that page for packages that aren't in PyPI, as some packages I use aren't.
I see. Shouldn't that be fixed by providing an option to setuptools that avoids going to the index for missing packages? Regards, Martin
At 06:26 PM 7/22/2007 +0200, Martin v. Löwis wrote:
That's a secondary benefit. The main goal is to avoid the expense of that page for packages that aren't in PyPI, as some packages I use aren't.
I see. Shouldn't that be fixed by providing an option to setuptools that avoids going to the index for missing packages?
There's already such an option; --find-links or -f lets you specify URLs that should be checked before *any* PyPI access occurs. If all dependencies can be met using those URLs without going to PyPI, and you haven't explicitly requested -U (--update), easy_install doesn't go to PyPI. You can also specify such links in a setup script using setup(dependency_links=[...]), which bakes them into the .egg. When searching for that egg's dependencies, easy_install will pick them up and use them. So, it's actually possible to install a package and all its dependencies without using PyPI at all, if the package author(s) bake the URLs in.
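As an illustration of the second mechanism, a setup script that bakes in its own links might look like the sketch below. The project name, version, requirement, and URL are all made up, and the behavior is as described in this thread; current tools may treat dependency_links differently.

```python
import sys

# Hypothetical project metadata, for illustration only.
SETUP_KWARGS = dict(
    name="example-app",
    version="0.1",
    install_requires=["privatelib"],
    # Pages or direct links to be scanned for "privatelib" so that
    # the package index does not have to be consulted for it.
    dependency_links=["http://downloads.example.com/eggs/"],
)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only hand off to setuptools when a command (e.g. bdist_egg) was given.
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

On the command line, the equivalent would be passing the same URL with the -f (--find-links) option mentioned above.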
Martin v. Löwis schrieb:
I've created an experimental prototype setuptools-specific package index at
I've now added something similar as
http://cheeseshop.python.org/simple/
It differs from your site in a few ways:
- it does include a top-level index of all packages (but neither releases nor descriptions)
- it's always current, due to being dynamically computed
- it may differ in the precise list of URLs displayed; if there are important deviations, please let me know.
What I, as an outsider, can see: for the Pygments package, Jim's page lists the development link from the package description (http://trac.pocoo.org/repos/pygments/trunk#egg=Pygments-dev), but it looks like it's badly extracted (it has a trailing ">`__"), yours doesn't list it at all. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
At 09:23 PM 7/21/2007 +0200, Georg Brandl wrote:
What I, as an outsider, can see: for the Pygments package, Jim's page lists the development link from the package description (http://trac.pocoo.org/repos/pygments/trunk#egg=Pygments-dev), but it looks like it's badly extracted (it has a trailing ">`__"), yours doesn't list it at all.
Hm, perhaps Jim is extracting it by looking for #egg URLs, rather than by actually processing the reST markup with docutils. That should probably be fixed, since there are many ways to specify URLs in reST and handling them all with regular expressions is unlikely to work as well as applying regular expressions to the resulting HTML. :) (Also, looking only for #egg links will miss non-#egg links embedded in the long_description, in the event that someone places direct download links there.)
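The trailing ">`__" reported for Pygments is what one would expect from matching URLs directly in the reST source. The sketch below contrasts that with collecting href attributes from rendered HTML. The reST sentence is invented around the real Pygments URL, the HTML is hand-written as an assumption of what a docutils rendering would roughly contain, and `HrefCollector` is a hypothetical helper (it uses the standard library's HTML parser rather than regular expressions).

```python
import re
from html.parser import HTMLParser

URL = "http://trac.pocoo.org/repos/pygments/trunk#egg=Pygments-dev"

# Invented reST sentence using an anonymous hyperlink reference.
REST = "The `development version <%s>`__ is available too." % URL

# Naive approach: grab anything URL-like straight from the reST source.
naive = re.findall(r"https?://\S+", REST)
print(naive[0])  # http://trac.pocoo.org/repos/pygments/trunk#egg=Pygments-dev>`__

# Assumed shape of the rendered HTML for the same sentence.
HTML = 'The <a class="reference external" href="%s">development version</a> is available too.' % URL


class HrefCollector(HTMLParser):
    """Collects the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


collector = HrefCollector()
collector.feed(HTML)
print(collector.hrefs[0])  # http://trac.pocoo.org/repos/pygments/trunk#egg=Pygments-dev
```

Scanning the rendered output also picks up links that carry no #egg marker, which covers the case raised in the parenthetical above.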
On Jul 21, 2007, at 3:53 PM, Phillip J. Eby wrote:
At 09:23 PM 7/21/2007 +0200, Georg Brandl wrote:
What I, as an outsider, can see: for the Pygments package, Jim's page lists the development link from the package description (http://trac.pocoo.org/repos/pygments/trunk#egg=Pygments-dev), but it looks like it's badly extracted (it has a trailing ">`__"), yours doesn't list it at all.
Hm, perhaps Jim is extracting it by looking for #egg URLs, rather than by actually processing the reST markup with docutils.
Yup.
That should probably be fixed, since there are many ways to specify URLs in reST and handling them all with regular expressions is unlikely to work
Yeah, I was hoping to get off easy. :)
as well as applying regular expressions to the resulting HTML. :)
:)
(Also, looking only for #egg links will miss non-#egg links embedded in the long_description, in the event that someone places direct download links there.)
By this, I assume you mean direct links to distributions. Jim
WRT zc.buildout, refreshing a buildout with just ZODB installed in it takes about 45 seconds for me using PyPI and about 5 seconds using the experimental index.
Can you kindly provide a measurement for the index at http://cheeseshop.python.org/simple/ as well? Thanks, Martin
On Jul 22, 2007, at 2:40 PM, Martin v. Löwis wrote:
WRT zc.buildout, refreshing a buildout with just ZODB installed in it takes about 45 seconds for me using PyPI and about 5 seconds using the experimental index.
Can you kindly provide a measurement for the index at http://cheeseshop.python.org/simple/ as well?
Yup. So, ATM:
Using old PyPI takes about 1m5s
Using simple takes about 25s
Using ppix takes about 8s
Some notes:
- ZODB isn't the best example as it has download links to www.zope.org, making it take longer than packages without offsite links (relative to PyPI).
- I expect that the difference between simple and ppix *for me* is a matter of geography.
Refreshing an empty buildout checks the zc.buildout and setuptools packages. For that:
Old PyPI takes 25s
Simple takes 8s
and ppix takes .5s
Again, I assume that the difference between simple and ppix has more to do with geography than the difference between serving statically and dynamically. The simple page has more links on it than the ppix page, because I haven't gotten around to scarfing all links off of a restructured-text rendering of long description. I doubt that makes any difference. It will be interesting to try again after I fix that.
Jim
Yup. So, ATM:
Using old PyPI takes about 1m5s
Using simple takes about 25s
Using ppix takes about 8s
Thanks!
Again, I assume that the difference between simple and ppix has more to do with geography than the difference between serving statically and dynamically. The simple page has more links on it than the ppix page, because I haven't gotten around to scarfing all links off of a restructured-text rendering of long description. I doubt that makes any difference. It will be interesting to try again after I fix that.
If you think that the /simple pages are correct, it might be easier to just mirror them instead of doing all the work yourself. I don't plan to take that service offline, unless experimentation shows it has serious flaws. Regards, Martin
On Jul 23, 2007, at 4:00 PM, Martin v. Löwis wrote: ...
Again, I assume that the difference between simple and ppix has more to do with geography than the difference between serving statically and dynamically. The simple page has more links on it than the ppix page, because I haven't gotten around to scarfing all links off of a restructured-text rendering of long description. I doubt that makes any difference. It will be interesting to try again after I fix that.
If you think that the /simple pages are correct, it might be easier to just mirror them instead of doing all the work yourself.
Good point. I might just do that.
I don't plan to take that service offline, unless experimentation shows it has serious flaws.
Cool. Jim
participants (15)
- "Martin v. Löwis"
- Barry Warsaw
- Benji York
- Christian Theune
- Fred Drake
- Georg Brandl
- Jim Fulton
- Jodok Batlogg
- Lars Immisch
- Michael Foord
- Noah Gift
- Phillip J. Eby
- Rick Ratzel
- Sean Reifschneider
- Tres Seaver