Phillip J. Eby pje at telecommunity.com
Sat Jul 15 01:17:30 CEST 2006

At 03:56 PM 7/11/2006 -0400, Jim Fulton wrote:
>On Jul 11, 2006, at 2:07 PM, Phillip J. Eby wrote:
>>At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:
>>>I would stop when a result is found.
>>Even so, this means O(N x M) web hits, where N is the number of
>>packages and M is the number of --find-links (including dependency
>>links supplied by eggs installed so far).  I don't think it's
>>reasonable to hit so many non-existent URLs on non-index servers,
>>and it is impolite to the servers' operators.  (For example, if they
>>receive a daily report of all 404 errors from their web servers, as
>>I do.  This is pretty common on Red Hat boxes using logwatch, for
>>example.)
>>It's particularly unfair since using e.g.
>>http://peak.telecommunity.com/snapshots/ as a --find-links while
>>installing, say TurboGears, would cause a whole host of "index"
>>hits to subdirectories of that URL, even though none of them can or
>>will be found.
>>The fallout from this approach is far worse than any "screen
>>scraping" issues we've had.
>Isn't this the approach that's followed now?

No; only the --find-links pages themselves are read, and one assumes that 
they actually exist.  :)

>Aren't all of the find-links searched as well as the index?  I suppose
>you're referring to
>the search for /projectname, which potentially doubles the number of

Doubling is only the beginning.  If there are 5 dependencies, or 5 
requirements on the command line, then it quintuples the number of 
requests, and they're all going to be retrieving non-existent URLs, except 
for whichever link was actually the package index.
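To put rough numbers on that, here is a minimal sketch of the two request counts being compared. It is a simplified model, not easy_install's actual code: the function names are made up for illustration, it assumes one index lookup per package, and the "probed" figure is a worst case in which the real index happens to be tried last.

```python
# Hypothetical back-of-the-envelope model; none of these names come from
# setuptools or easy_install.

def requests_current(num_packages, num_find_links):
    """Each --find-links page is read once, plus one index lookup per package."""
    return num_find_links + num_packages


def requests_if_links_probed(num_packages, num_find_links):
    """Worst case if every find-links URL were also probed for /projectname.

    Each package is tried against every find-links URL and then the index,
    stopping at the first hit, so the worst case is N x (M + 1) requests.
    """
    return num_packages * (num_find_links + 1)


def wasted_probes(num_packages, num_find_links):
    """Probes that can only return 404, since a plain link page has no /projectname."""
    return num_packages * num_find_links


if __name__ == "__main__":
    n, m = 5, 3  # 5 requirements, 3 find-links URLs (illustrative values)
    print("current:", requests_current(n, m))
    print("probed :", requests_if_links_probed(n, m))
    print("404s   :", wasted_probes(n, m))
```

The 404 count is what ends up in the server operators' logs, and it grows further as installed eggs contribute more dependency links.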

Of course, this is also ignoring the UI reason why the index URL and 
find-links URLs are specified separately, and that is that the common case 
is to use PyPI and maybe also a find-link or two.  If they were specified 
by the same option, then any use of find-links would require you to retype 
the index URL.  So, it's not a very convenient UI to merge the concepts, as 
well as being neither efficient for retrieval speed nor polite to site
operators.

