[Distutils] easy_install wrong download site preference

P.J. Eby pje at telecommunity.com
Fri Jul 2 04:06:04 CEST 2010

At 01:33 AM 7/2/2010 +0300, anatoly techtonik wrote:
>On Fri, Jul 2, 2010 at 12:10 AM, P.J. Eby <pje at telecommunity.com> wrote:
> >> >
> >> > It prefers newer packages, or, if the versions are the same, it prefers
> >> > the shortest download URL.  In this case, the Google Code url is shorter.
> >>
> >> That's illogical. Better prefer PyPI if versions are the same.
> >
> > The "shortest path" logic is there to avoid certain file recognition
> > problems that occur without it.  The special case of PyPI isn't special
> > enough to break those rules.
>Although practicality beats purity. Can you list those "certain file
>recognition problems"? I.e. Explicit is better than implicit.

I have a vague recollection that it was Fredrik Lundh's website that 
sparked the original realization of the need for preferring shorter 
URLs, but I wouldn't swear to it.  I'd have to dig through years of 
revision history to find the original change, assuming I documented 
it well enough.  The choice of short paths over long was also 
intended to favor nearby files over further ones, and local paths over URLs.

(All that being said, it's still fundamentally a heuristic, and not a 
very good one at that.  But that doesn't automatically make any other 
heuristic *better*; this is one area where status quo bias is a good thing.)

>That's why it should use the site where all filenames are Python
>downloads if filenames are the same.

And how would that work with all the PyPI clones, private indexes, etc.?

> > No.  You'd need to remove the current "home_page" setting, or point it
> > elsewhere.
>That's very strange. Then what download_url is for?

The home page and download URLs are simply treated as pages which may 
contain links to files, if they are not themselves links to 
files.  That's the only special status they have.
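To make that concrete, here is a rough sketch of the "file or page of links" distinction.  It is not the setuptools code; the suffix list, the `looks_like_file` and `file_links` names, and the caller-supplied `fetch` function are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of archive suffixes; the real recognition rules are more involved.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip", ".egg")


def looks_like_file(url):
    """Guess from the URL path alone whether this points at a downloadable file."""
    return urlparse(url).path.lower().endswith(ARCHIVE_SUFFIXES)


class LinkCollector(HTMLParser):
    """Collect absolute URLs for every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def file_links(url, fetch):
    """Return file URLs reachable from `url`.

    `fetch` is a hypothetical callable that takes a URL and returns HTML text.
    If `url` already looks like a file it is returned as-is; otherwise it is
    treated as a page and scanned for links that look like files.
    """
    if looks_like_file(url):
        return [url]
    parser = LinkCollector(url)
    parser.feed(fetch(url))
    return [link for link in parser.links if looks_like_file(link)]
```

In this sketch a home page URL and a download URL go through exactly the same function, which is the point: neither gets any extra standing beyond being a place to look for links.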

> >> (I understand that people do not want to touch setuptools code
> >> anymore)
> >
> > That's not really the issue; the issue here is that package precedence is
> > based on a stable comparison scheme, where it doesn't make sense to give one
> > location priority over another, as it will simply lead to someone else
> > complaining about the changed behavior, because they were relying on a
> > different URL having precedence under the current scheme.
>These rules need to be described first. What if somebody already broke
>the proper order and now everybody suffers? If autodiscovery rules
>were well described - it was possible to analyse them and propose more
>intuitive approach. Then if "someone else" will attempt to complain -
>you could send them to the PEP or another "how and why" document.



>I thought it will raise the weight of those links if there could be a
>rel="download" attribute.

There is no "weighting" of links - what is weighted are 
distributions, and distribution objects only have their raw URL 
available as a basis for sorting once the version and archive type 
(the two higher-precedence attributes) are considered.  The place 
where a URL was retrieved from is not tracked, and thus can't be used 
for sorting without a good chunk of refactoring...  which refactoring 
would likely break tools that build on top of setuptools' PackageIndex class.
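
As a rough illustration of that ordering (a sketch under stated assumptions, not the actual setuptools implementation), the comparison can be thought of as a sort key built only from what the distribution itself carries.  The `Candidate` record, its integer-tuple `version`, and the numeric `precedence` field are hypothetical stand-ins.

```python
from collections import namedtuple

# Hypothetical record: version as a tuple of ints, precedence as an int where
# a higher number means a preferred archive type, and the raw URL.
Candidate = namedtuple("Candidate", "version precedence url")


def sort_key(candidate):
    """Lower key wins: newest version, then best archive type, then shortest URL."""
    return (
        tuple(-part for part in candidate.version),  # newer versions sort first
        -candidate.precedence,                       # preferred archive types next
        len(candidate.url),                          # shortest URL breaks ties
    )


def best(candidates):
    """Pick the preferred candidate under the ordering above."""
    return min(candidates, key=sort_key)
```

Note that nothing in the key records which page a link was found on, so "found on PyPI" versus "found on the home page" is simply not available to compare.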

In short, what you're asking for is a pretty major feature that would 
be difficult to implement in a way that would be guaranteed not to 
break other things.
