Re: [Distutils] Mystery solved

At 08:38 AM 6/24/2006 -0400, Jim Fulton wrote:
Suppose you have some distributions of interest in a directory somewhere. You ask easy_install to update your packages. You use find-links to tell it to look in the directory of distributions. By default, it looks for an index at http://www.python.org/pypi. Further assume that the distribution you are updating is not in PyPI. You'll get the following messages:
Couldn't find index page for 'jimtest' (maybe misspelled?)
Scanning index of all packages (this may take a while)
And easy_install will have made 2 connections, to http://www.python.org/pypi/jimtest/ and to http://www.python.org/pypi/.
You should be able to easily reproduce this.
Now, suppose, instead, you provide an index (-i) option telling easy_install to look somewhere else? Say, http://www.python.org. You don't get the messages and only a single connection is made. Why? What's the difference, you ask? Well, PyPI (and my test server) returns a content type of text/plain for 404 results, while most web servers return text/html. I changed my test server to return HTML Not-Found results, and the messages went away and I only get a single connection.
Ah, the irony. Your change, and the recent change of PyPI to do the same thing, are actually forcing *erroneous* behavior now, by suppressing the fallback search. :)
So what is the correct behavior? I dunno. The warnings are rather annoying, so I'm glad I've found out how to make them go away.
They'll come back when I release 0.6b4, since it's setuptools' behavior that's broken here. If a response was text/plain, it was doing the right thing, but it wasn't checking text/html responses to see if they were 404 status.
However, I'm pretty sure your intent was to search both /project_name/ and /. When a server returns an HTML not-found error, easy_install skips the / check.
I think there are 2 bugs here:
1. The scan of the all-package index is skipped if the content type of the /project_name/ response is HTML.
Yep, that's a bug, and I'm fixing it now.
2. The messages:
Couldn't find index page for 'jimtest' (maybe misspelled?)
Scanning index of all packages (this may take a while)
should be info messages, not warning messages. If they remain warnings, and you fix the first problem, it will be impossible to avoid the warnings without creating a PyPI project or creating an index server, and I don't think it was your intent to require either of these. I don't think a warning should be issued for correct use of software.
Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the "Scanning" message could become an info(), the previous one shouldn't.
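The fallback behaviour discussed in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not setuptools' actual code: `Response`, `find_project_page`, and `fake_fetch_factory` are hypothetical names, and the fetcher is a stand-in that returns an HTTP status and a content type. The point it shows is that the decision to scan the top-level index is keyed on the 404 status, not on whether the error page was text/plain or text/html.

```python
from collections import namedtuple

# Hypothetical response record for this sketch; not a setuptools type.
Response = namedtuple("Response", "status content_type body")


def find_project_page(index_url, project, fetch):
    """Return (urls_fetched, used_fallback).

    `fetch` is any callable mapping a URL to a Response. The fallback
    to the top-level index is driven by the HTTP status alone, so a 404
    served as text/html is treated the same as one served as text/plain.
    """
    base = index_url.rstrip("/") + "/"
    project_url = base + project + "/"
    resp = fetch(project_url)
    if resp.status != 404:
        return [project_url], False
    # Project page is missing: scan the index of all packages instead.
    fetch(base)
    return [project_url, base], True


def fake_fetch_factory(content_type):
    """Build a fake fetcher whose 404 pages use the given content type."""
    def fetch(url):
        if url.endswith("/jimtest/"):
            return Response(404, content_type, "Not Found")
        return Response(200, "text/html", "<html>index</html>")
    return fetch


for ctype in ("text/plain", "text/html"):
    urls, fell_back = find_project_page(
        "http://www.python.org/pypi", "jimtest", fake_fetch_factory(ctype))
    print(ctype, len(urls), fell_back)
```

With either content type the sketch makes two requests and falls back to the full index scan, which is the behaviour described as intended above.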

On Jul 10, 2006, at 5:50 PM, Phillip J. Eby wrote: ...
2. The messages:
Couldn't find index page for 'jimtest' (maybe misspelled?)
Scanning index of all packages (this may take a while)
should be info messages, not warning messages. If they remain warnings, and you fix the first problem, it will be impossible to avoid the warnings without creating a PyPI project or creating an index server, and I don't think it was your intent to require either of these. I don't think a warning should be issued for correct use of software.
Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the "Scanning" message could become an info(), the previous one shouldn't.
This means that one always has to use an index. In which case, what is the point of find-links?
Jim
--
Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org

At 05:58 PM 7/10/2006 -0400, Jim Fulton wrote:
On Jul 10, 2006, at 5:50 PM, Phillip J. Eby wrote: ...
2. The messages:
Couldn't find index page for 'jimtest' (maybe misspelled?)
Scanning index of all packages (this may take a while)
should be info messages, not warning messages. If they remain warnings, and you fix the first problem, it will be impossible to avoid the warnings without creating a PyPI project or creating an index server, and I don't think it was your intent to require either of these. I don't think a warning should be issued for correct use of software.
Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the "Scanning" message could become an info(), the previous one shouldn't.
This means that one always has to use an index. In which case, what is the point of find-links?
The point of --find-links is to provide links to unindexed packages. In 0.6b4, you will be able to use 'file:' URLs for a package index, by the way.

On Jul 10, 2006, at 6:22 PM, Phillip J. Eby wrote:
At 05:58 PM 7/10/2006 -0400, Jim Fulton wrote:
On Jul 10, 2006, at 5:50 PM, Phillip J. Eby wrote: ...
2. The messages:
Couldn't find index page for 'jimtest' (maybe misspelled?)
Scanning index of all packages (this may take a while)
should be info messages, not warning messages. If they remain warnings, and you fix the first problem, it will be impossible to avoid the warnings without creating a PyPI project or creating an index server, and I don't think it was your intent to require either of these. I don't think a warning should be issued for correct use of software.
Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the "Scanning" message could become an info(), the previous one shouldn't.
This means that one always has to use an index. In which case, what is the point of find-links?
The point of --find-links is to provide links to unindexed packages.
But if you use an unindexed package, you'll get a warning. IMO, you should not get a warning for correct use of software. Users should try to make warnings go away. If you give people warnings that they shouldn't make go away, then they are more likely to ignore warnings that you don't want them to. Either you can't have valid unindexed software, or setuptools shouldn't generate a warning if software isn't in the index. I really find the distinction between indexes and find-links rather puzzling. Personally, I'd like to find a way to merge these two concepts into one by choosing a definition of an index that admits a directory full of distributions. Then we could get rid of the find-links concept and allow 0 or more indexes to be used. This would be much simpler and cleaner IMO.
Jim

At 03:54 AM 7/11/2006 -0400, Jim Fulton wrote:
On Jul 10, 2006, at 6:22 PM, Phillip J. Eby wrote:
Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the "Scanning" message could become an info(), the previous one shouldn't.
This means that one always has to use an index. In which case, what is the point of find-links?
The point of --find-links is to provide links to unindexed packages.
But if you use an unindexed package, you'll get a warning.
Which means it *may* be an error. See above. If it was an error, it would be an error. The point of a warning is to inform you that something *may* be an error.
IMO, you should not get a warning for correct use of software. Users should try to make warnings go away.
Repeating it doesn't make it so. I'm not convinced that this particular warning (that you may have misspelled the package name because it's not in the index you're using) is in any way harmful.
If you give people warnings that they shouldn't make go away,
Huh? They can put the package in the index, use a different index, not use -U, specify an exact download URL (either directly or via --find-links), etc. There are a huge number of ways *not* to encounter that particular warning.
then they are more likely to ignore warnings that you don't want them to.
What warnings? There are only about 13 left. :)
Either you can't have valid unindexed software, or setuptools shouldn't generate a warning if software isn't in the index.
This is back to argument by assertion. Please explain to me why this warning is actually bad, rather than simply asserting that it's so.
I really find the distinction between indexes and find-links rather puzzling.
--find-links is used to allow you to point easy_install to a project's non-indexed home page or download page to find links, or to provide other easy_install-processable links, without needing an index.
Personally, I'd like to find a way to merge these two concepts into one by choosing a definition of an index that admits a directory full of distributions.
Feel free to try to come up with one. However, --find-links allows *multiple* links to be specified, and it is also the basis for the "dependency_links" argument to setup(). --find-links is also a primitive upon which the index facility is built, since index pages are treated more-or-less like --find-links URLs that are automatically generated. At a minimum, merging the concepts would mean allowing multiple index URLs, or else eliminating the idea of an index, and treating all --find-links URLs as though they were the base URL of a package index. If you did that, however, it brings in the question of which of the --find-links URLs should be checked for a /projectname/ subdirectory. All of them? Just the first one that finds a result? None of them, if some other criterion is met? Currently, a remote package search means that all --find-links pages are checked, and the --index-url is searched (by going to "index_url/projectname/").
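As a rough illustration of the current search described at the end of this message, the URLs consulted for one project might be listed like this. It is a sketch only: `urls_for_search` is a hypothetical helper, not an easy_install function, and the example find-links URL is made up.

```python
def urls_for_search(project, find_links, index_url):
    """List the pages consulted for one project under the current scheme.

    Every --find-links page is read as given, and only the index URL
    gets a /projectname/ suffix appended.
    """
    urls = list(find_links)
    urls.append(index_url.rstrip("/") + "/" + project + "/")
    return urls


# Hypothetical find-links page; the index URL is the default from the thread.
for url in urls_for_search(
        "jimtest",
        ["http://example.com/dists/"],
        "http://www.python.org/pypi"):
    print(url)
```

Note that the find-links URLs are never probed for subdirectories here, which is the distinction the merge proposal would erase.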

On Jul 11, 2006, at 11:25 AM, Phillip J. Eby wrote:
At 03:54 AM 7/11/2006 -0400, Jim Fulton wrote:
On Jul 10, 2006, at 6:22 PM, Phillip J. Eby wrote:
Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the "Scanning" message could become an info(), the previous one shouldn't.
This means that one always has to use an index. In which case, what is the point of find-links?
The point of --find-links is to provide links to unindexed packages.
But if you use an unindexed package, you'll get a warning.
Which means it *may* be an error. See above. If it was an error, it would be an error. The point of a warning is to inform you that something *may* be an error.
IMO, you should not get a warning for correct use of software. Users should try to make warnings go away.
Repeating it doesn't make it so. I'm not convinced that this particular warning (that you may have misspelled the package name because it's not in the index you're using) is in any way harmful.
It is definitely so. That is definitely my opinion. :) OK that's an interesting point wrt possible misspellings. If you can find the package via the find links, but not via the index, that seems to me to be a pretty good indication that this is not a misspelling. This is the case I'm worried about. If the package can't be found anywhere, then I agree that a warning is warranted.
If you give people warnings that they shouldn't make go away,
That wasn't clear. If people are using the software correctly, but choosing to find distributions via find-links rather than an index, and they get warnings, then they will always get warnings and tend to ignore them.
Huh? They can put the package in the index, use a different index, not use -U, specify an exact download URL (either directly or via --find-links), etc. There are a huge number of ways *not* to encounter that particular warning.
I have to use -U to get newer versions of distributions, even if I happen to store distributions in a directory that is not a valid index. In this case, I use find-links and -U together, and I'll get a warning unless I put distributions in an index.
Either you can't have valid unindexed software, or setuptools shouldn't generate a warning if software isn't in the index.
This is back to argument by assertion. Please explain to me why this warning is actually bad, rather than simply asserting that it's so.
I assert and take as a premise that when users are using software correctly, including not misspelling anything, they should not get a warning. If you can't buy that, then we have an unreconcilable difference.
The specific case, which I'll repeat from above, as clearly as I can, is this:
- A user chooses not to store their software in an index.
- The user places distributions on a web server somewhere. This is just a directory, it is not a valid index.
- The user points at their server using find-links.
- The user has an installation and they want to check for newer versions.
- The distributions that they are looking for newer versions of can be found on the server that they name via find-links.
In this case, they will get a warning that the distribution they are looking for couldn't be found on the index. They didn't misspell anything, as setuptools should be able to deduce from the fact that their distribution was found on the link server. I don't think that they should get a warning. As far as I'm concerned, this means that distributions must always be stored on index servers and the find-links is just an attractive nuisance.
I really find the distinction between indexes and find-links rather puzzling.
--find-links is used to allow you to point easy_install to a project's non-indexed home page or download page to find links, or to provide other easy_install-processable links, without needing an index.
But they are unusable without getting warnings whenever you want to check for updates.
Personally, I'd like to find a way to merge these two concepts into one by choosing a definition of an index that admits a directory full of distributions.
Feel free to try to come up with one. However, --find-links allows *multiple* links to be specified, and it is also the basis for the "dependency_links" argument to setup(). --find-links is also a primitive upon which the index facility is built, since index pages are treated more-or-less like --find-links URLs that are automatically generated.
I don't need to, you already did....
At a minimum, merging the concepts would mean allowing multiple index URLs, or else eliminating the idea of an index,
Yup. Sound good to me.
and treating all --find-links URLs as though they were the base URL of a package index.
Yes
If you did that, however, it brings in the question of which of the --find-links URLs should be checked for a /projectname/ subdirectory. All of them? Just the first one that finds a result? None of them, if some other criterion is met?
I would stop when a result is found.
Currently, a remote package search means that all --find-links pages are checked, and the --index-url is searched (by going to "index_url/projectname/").
What is the use case for spreading distributions over multiple servers? Do people really want to do that? I can see providing multiple places to look, because different distributions might be on different servers, but I don't see why distributions for a single project should be spread over multiple servers.
Jim

At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:
OK that's an interesting point wrt possible misspellings. If you can find the package via the find links, but not via the index, that seems to me to be a pretty good indication that this is not a misspelling. This is the case I'm worried about. If the package can't be found anywhere, then I agree that a warning is warranted.
The interesting question there is, should the fallback scan still take place in the absence of the warning? If it *does* take place, then the reason for the scan (and delay) is unexplained. If it does *not* take place, then there is an undesirable change in semantics.
Currently, if you have a package called "Bob's Incredible Package", this will be treated by easy_install as being spelled "Bob-s-Incredible-Package", and it will require a top-level index scan to find the right URL. It is also possible to have --find-links pages containing obsolete versions, while PyPI contains the latest version, so removing the scan doesn't seem to be a reasonable option.
So, I will simply change the message to an "info" message stating that the index page couldn't be found (rather than a warning suggesting misspelling), *if* easy_install has previously seen at least one valid distribution file or link for the applicable project name.
The specific case, which I'll repeat from above, as clearly as I can, is this:
- A user chooses not to store their software in an index.
- The user places distributions on a web server somewhere. This is just a directory, it is not a valid index.
- The user points at their server using find-links.
- The user has an installation and they want to check for newer versions.
- The distributions that they are looking for newer versions of can be found on the server that they name via find-links.
In this case, they will get a warning that the distribution they are looking for couldn't be found on the index.
Okay, this scenario is fixed by changing to an info message as described above.
Personally, I'd like to find a way to merge these two concepts into one by choosing a definition of an index that admits a directory full of distributions.
Feel free to try to come up with one. However, --find-links allows *multiple* links to be specified, and it is also the basis for the "dependency_links" argument to setup(). --find-links is also a primitive upon which the index facility is built, since index pages are treated more-or-less like --find-links URLs that are automatically generated.
I don't need to, you already did....
No, I presented a straw man to show why it doesn't work. I guess I should've been more explicit in spelling out all the undesirable consequences.
If you did that, however, it brings in the question of which of the --find-links URLs should be checked for a /projectname/ subdirectory. All of them? Just the first one that finds a result? None of them, if some other criterion is met?
I would stop when a result is found.
Even so, this means O(N x M) web hits, where N is the number of packages and M is the number of --find-links (including dependency links supplied by eggs installed so far). I don't think it's reasonable to hit so many non-existent URLs on non-index servers, and it is impolite to the servers' operators. (For example, if they receive a daily report of all 404 errors from their web servers, as I do. This is pretty common on Red Hat boxes using logwatch, for example.)
It's particularly unfair since using e.g. http://peak.telecommunity.com/snapshots/ as a --find-links while installing, say TurboGears, would cause a whole host of "index" hits to subdirectories of that URL, even though none of them can or will be found.
The fallout from this approach is far worse than any "screen scraping" issues we've had.
What is the use case for spreading distributions over multiple servers? Do people really want to do that? I can see providing multiple places to look, because different distributions might be on different servers, but I don't see why distributions for a single project should be spread over multiple servers.
Platform-specific distributions may be provided by contributors to a project, rather than by the project's author; see, for example, Bob Ippolito's pages for distributing Mac OS X builds of popular Python packages. For this reason, you may have certain pages that you always want included in your --find-links, to be checked in addition to the normal indexes.
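The message-level change proposed in this message, together with the name spelling mentioned for "Bob's Incredible Package", can be sketched as below. This is a hedged illustration rather than the setuptools implementation: `url_safe_name` and `report_missing_index_page` are hypothetical names, the regular expression is an assumption chosen to reproduce the spelling quoted in the thread, and the standard `logging` module stands in for whatever logging easy_install uses.

```python
import logging
import re

log = logging.getLogger("easy_install_sketch")


def url_safe_name(name):
    """Replace each run of characters other than letters, digits and '.' with '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def report_missing_index_page(project, seen_distribution):
    """Log the missing-index-page message and return the level used.

    If a valid distribution or link for the project has already been
    seen (for example via --find-links), a misspelling is unlikely, so
    the message is informational; otherwise it stays a warning.
    """
    if seen_distribution:
        level = logging.INFO
        msg = "Couldn't find index page for %r"
    else:
        level = logging.WARNING
        msg = "Couldn't find index page for %r (maybe misspelled?)"
    log.log(level, msg, project)
    return level


print(url_safe_name("Bob's Incredible Package"))
print(logging.getLevelName(report_missing_index_page("jimtest", True)))
print(logging.getLevelName(report_missing_index_page("jimtest", False)))
```

The fallback scan itself is unaffected in this sketch; only the wording and level of the message depend on whether the project has already been seen.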

On Jul 11, 2006, at 2:07 PM, Phillip J. Eby wrote:
At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:
OK that's an interesting point wrt possible misspellings. If you can find the package via the find links, but not via the index, that seems to me to be a pretty good indication that this is not a misspelling. This is the case I'm worried about. If the package can't be found anywhere, then I agree that a warning is warranted.
The interesting question there is, should the fallback scan still take place in the absence of the warning? If it *does* take place, then the reason for the scan (and delay) is unexplained. If it does *not* take place, then there is an undesirable change in semantics.
Currently, if you have a package called "Bob's Incredible Package", this will be treated by easy_install as being spelled "Bob-s-Incredible-Package", and it will require a top-level index scan to find the right URL. It is also possible to have --find-links pages containing obsolete versions, while PyPI contains the latest version, so removing the scan doesn't seem to be a reasonable option.
So, I will simply change the message to an "info" message stating that the index page couldn't be found (rather than a warning suggesting misspelling), *if* easy_install has previously seen at least one valid distribution file or link for the applicable project name.
Great!
The specific case, which I'll repeat from above, as clearly as I can, is this:
- A user chooses not to store their software in an index.
- The user places distributions on a web server somewhere. This is just a directory, it is not a valid index.
- The user points at their server using find-links.
- The user has an installation and they want to check for newer versions.
- The distributions that they are looking for newer versions of can be found on the server that they name via find-links.
In this case, they will get a warning that the distribution they are looking for couldn't be found on the index.
Okay, this scenario is fixed by changing to an info message as described above.
Yup. Cool.
If you did that, however, it brings in the question of which of the --find-links URLs should be checked for a /projectname/ subdirectory. All of them? Just the first one that finds a result? None of them, if some other criterion is met?
I would stop when a result is found.
Even so, this means O(N x M) web hits, where N is the number of packages and M is the number of --find-links (including dependency links supplied by eggs installed so far). I don't think it's reasonable to hit so many non-existent URLs on non-index servers, and it is impolite to the servers' operators. (For example, if they receive a daily report of all 404 errors from their web servers, as I do. This is pretty common on Red Hat boxes using logwatch, for example.)
It's particularly unfair since using e.g. http://peak.telecommunity.com/snapshots/ as a --find-links while installing, say TurboGears, would cause a whole host of "index" hits to subdirectories of that URL, even though none of them can or will be found.
The fallout from this approach is far worse than any "screen scraping" issues we've had.
Isn't this the approach that's followed now? Aren't all of the find-links searched as well as the index? I suppose you're referring to the search for /projectname, which potentially doubles the number of requests.
What is the use case for spreading distributions over multiple servers? Do people really want to do that? I can see providing multiple places to look, because different distributions might be on different servers, but I don't see why distributions for a single project should be spread over multiple servers.
Platform-specific distributions may be provided by contributors to a project, rather than by the project's author; see, for example, Bob Ippolito's pages for distributing Mac OS X builds of popular Python packages. For this reason, you may have certain pages that you always want included in your --find-links, to be checked in addition to the normal indexes.
OK
Jim

At 03:56 PM 7/11/2006 -0400, Jim Fulton wrote:
On Jul 11, 2006, at 2:07 PM, Phillip J. Eby wrote:
At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:
I would stop when a result is found.
Even so, this means O(N x M) web hits, where N is the number of packages and M is the number of --find-links (including dependency links supplied by eggs installed so far). I don't think it's reasonable to hit so many non-existent URLs on non-index servers, and it is impolite to the servers' operators. (For example, if they receive a daily report of all 404 errors from their web servers, as I do. This is pretty common on Red Hat boxes using logwatch, for example.)
It's particularly unfair since using e.g. http://peak.telecommunity.com/snapshots/ as a --find-links while installing, say TurboGears, would cause a whole host of "index" hits to subdirectories of that URL, even though none of them can or will be found.
The fallout from this approach is far worse than any "screen scraping" issues we've had.
Isn't this the approach that's followed now?
No; only the --find-links pages themselves are read, and one assumes that they actually exist. :)
Aren't all of the find-links searched as well as the index? I suppose you're referring to the search for /projectname, which potentially doubles the number of requests.
Doubling is only the beginning. If there are 5 dependencies, or 5 requirements on the command line, then it quintuples the number of requests, and they're all going to be retrieving non-existent URLs, except for whichever link was actually the package index. Of course, this is also ignoring the UI reason why the index URL and find-links URLs are specified separately, and that is that the common case is to use PyPI and maybe also a find-link or two. If they were specified by the same option, then any use of find-links would require you to retype the index URL. So, it's not a very convenient UI to merge the concepts, as well as being neither efficient for retrieval speed nor polite to site operators.
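The request-count argument can be made concrete with a small back-of-the-envelope sketch. Both functions are hypothetical and rest on simplifying assumptions labelled in the comments: that each --find-links page is read once under the current scheme, and that under the merged scheme every find-links URL is probed for every project (the worst case, with no early stop).

```python
def current_requests(n_projects, n_find_links):
    """Requests under the current scheme.

    Assumption: each --find-links page is read once, plus one
    index_url/projectname/ lookup per project.
    """
    return n_find_links + n_projects


def merged_worst_case_requests(n_projects, n_find_links):
    """Worst-case requests if every find-links URL is treated as an index.

    Each find-links URL is probed for a /projectname/ subdirectory for
    every project, giving O(N x M) requests, most of them 404s on
    servers that are not indexes.
    """
    return n_projects * n_find_links


# Example: 5 requirements and 3 find-links pages (illustrative numbers).
print(current_requests(5, 3))
print(merged_worst_case_requests(5, 3))
```

Stopping at the first hit would lower the merged count in practice, but the bound still grows with the product of projects and links rather than their sum.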
participants (2)
- Jim Fulton
- Phillip J. Eby