[Distutils] Mystery solved

Phillip J. Eby pje at telecommunity.com
Tue Jul 11 20:07:36 CEST 2006


At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:
>OK that's an interesting point wrt possible misspellings. If you can
>find the package via the find links, but not via the index, that
>seems to me to be a pretty good indication that this is not a
>misspelling.  This is the case I'm worried about.  If the package
>can't be found anywhere, then I agree that a warning is warranted.

The interesting question there is, should the fallback scan still take 
place in the absence of the warning?  If it *does* take place, then the 
reason for the scan (and delay) is unexplained.  If it does *not* take 
place, then there is an undesirable change in semantics.

Currently, if you have a package called "Bob's Incredible Package", this 
will be treated by easy_install as being spelled 
"Bob-s-Incredible-Package", and it will require a top-level index scan to 
find the right URL.  It is also possible to have --find-links pages 
containing obsolete versions, while PyPI contains the latest version, so 
removing the scan doesn't seem to be a reasonable option.
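
To make the spelling issue concrete, here is a minimal sketch of the kind of name munging described above. The function name normalize_project_name is hypothetical, and the rule (collapse each run of characters other than letters, digits, and "." into a single "-") is an assumption chosen to reproduce the example in this message; it is not quoted from easy_install's source.

```python
import re

def normalize_project_name(name):
    # Assumed rule: every run of characters that are not letters,
    # digits, or "." becomes a single "-".
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)

print(normalize_project_name("Bob's Incredible Package"))
# prints: Bob-s-Incredible-Package
```

Because the munged spelling need not match the name the index page is filed under, a direct lookup can miss, and the top-level scan is what recovers the right URL.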

So, I will simply change the message to an "info" message stating that the 
index page couldn't be found (rather than a warning suggesting 
misspelling), *if* easy_install has previously seen at least one valid 
distribution file or link for the applicable project name.
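
The decision just described could be sketched roughly as follows. The function name, the seen_distribution flag, and the message wording are all hypothetical placeholders for illustration; only the info-versus-warning rule comes from the paragraph above.

```python
import logging

def index_miss_message(project, seen_distribution):
    """Pick the log level and text for a missing index page.

    seen_distribution is True if at least one valid distribution file
    or link for this project name has already been seen (for example,
    via --find-links).
    """
    if seen_distribution:
        # Not a likely misspelling: just explain the upcoming scan.
        return logging.INFO, "Couldn't find index page for %r" % project
    # Nothing seen anywhere: a misspelling is plausible, so warn.
    return (logging.WARNING,
            "Couldn't find index page for %r (maybe misspelled?)" % project)

level, text = index_miss_message("SomeProject", seen_distribution=True)
logging.getLogger("sketch").log(level, text)
```

Either way the fallback scan still runs; only the explanation shown to the user changes.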


>The specific case, which I'll repeat from above, as clearly as I can,
>is this:
>
>- A user chooses not to store their software in an index.
>- The user places distributions on a web server somewhere.  This is
>just a directory, it is not a valid index.
>- The user points at their server using find-links
>- The user has an installation and they want to check for newer
>versions.
>- The distributions that they are looking for newer versions of can
>be found on the server that they name via find-links.
>
>In this case, they will get a warning that the distribution they are
>looking for couldn't be found on the index.

Okay, this scenario is fixed by changing to an info message as described above.


>>>Personally, I'd like to find a way to merge these two concepts
>>>into one
>>>by choosing a definition of an index that admits a directory full of
>>>distributions.
>>
>>Feel free to try to come up with one.  However, --find-links allows
>>*multiple* links to be specified, and it is also the basis for the
>>"dependency_links" argument to setup().  --find-links is also a
>>primitive upon which the index facility is built, since index pages
>>are treated more-or-less like --find-links URLs that are
>>automatically generated.
>
>I don't need to, you already did....

No, I presented a straw man to show why it doesn't work.  I guess I 
should've been more explicit in spelling out all the undesirable consequences.


>>   If you did that, however, it brings in the question of which of
>>the --find-links URLs should be checked for a /projectname/
>>subdirectory.  All of them?  Just the first one that finds a
>>result?  None of them, if some other criterion is met?
>
>I would stop when a result is found.

Even so, this means O(N x M) web hits, where N is the number of packages 
and M is the number of --find-links (including dependency links supplied by 
eggs installed so far).  I don't think it's reasonable to hit so many 
non-existent URLs on non-index servers, and it is impolite to the servers' 
operators (for example, if they receive a daily report of all 404 errors 
from their web servers, as I do; this is pretty common on Red Hat boxes 
using logwatch).

It's particularly unfair since using e.g. 
http://peak.telecommunity.com/snapshots/ as a --find-links while 
installing, say, TurboGears, would cause a whole host of "index" hits to 
subdirectories of that URL, even though none of them can or will be found.
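
As a rough illustration of the O(N x M) cost, here is a small sketch that counts the /projectname/ probes made under the "stop when a result is found" rule. Everything here is hypothetical: count_probes, the exists callback (standing in for an HTTP request), the example.com URL, and the project names other than TurboGears are placeholders, not anything easy_install actually does.

```python
def count_probes(projects, find_links, exists):
    """Count /projectname/ probes, stopping per project at the first hit."""
    probes = 0
    for project in projects:
        for base in find_links:
            url = base.rstrip("/") + "/" + project + "/"
            probes += 1          # one web hit per candidate URL
            if exists(url):      # stand-in for an actual HTTP request
                break
    return probes

find_links = [
    "http://peak.telecommunity.com/snapshots/",
    "http://example.com/dists/",   # placeholder second link
]
projects = ["TurboGears", "DependencyA", "DependencyB"]  # placeholders

# Plain directories have no /projectname/ subdirectories, so every
# probe is a 404 and stopping early never helps.
print(count_probes(projects, find_links, lambda url: False))
# prints: 6
```

With N projects and M links, none of which is a real index, that is N x M failed requests landing in somebody's 404 report.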

The fallout from this approach is far worse than any "screen scraping" 
issues we've had.


>What is the use case for spreading distributions over multiple
>servers?  Do people really want to do that? I can see providing
>multiple places to look, because different distributions might be on
>different servers, but I don't see why distributions for a single
>project should be spread over multiple servers.

Platform-specific distributions may be provided by contributors to a 
project, rather than by the project's author; see, for example, Bob 
Ippolito's pages for distributing Mac OS X builds of popular Python 
packages.  For this reason, you may have certain pages that you always want 
included in your --find-links, to be checked in addition to the normal indexes.



