[Distutils] please fix easy_install shorter URL preference (Was: easy_install wrong download site preference)

anatoly techtonik techtonik at gmail.com
Sun Sep 12 11:08:59 CEST 2010

Hello Phillip,

Once again I've run into the completely irrational `easy_install`
behavior of preferring shorter download URLs over PyPI ones when the
same filename exists in both places. The `protobuf` and `gerald`
projects are not installable because of it.


Gerald is the second package in the last 4 months that I couldn't
install due to this very obscure behavior, and I suspect there are

Phillip, could you please still go through your archives and explain
why this logic of choosing a shorter download URL is necessary? I'd
really like to see which packages benefit from it, because I believe
the popularity of Google protocol buffers alone is enough to change
the behavior.

anatoly t.

On Fri, Jul 2, 2010 at 5:06 AM, P.J. Eby <pje at telecommunity.com> wrote:
> At 01:33 AM 7/2/2010 +0300, anatoly techtonik wrote:
>> On Fri, Jul 2, 2010 at 12:10 AM, P.J. Eby <pje at telecommunity.com> wrote:
>> >> >
>> >> > It prefers newer packages, or, if the versions are the same, it prefers
>> >> > the shortest download URL.  In this case, the Google Code url is shorter.
>> >>
>> >> That's illogical. Better prefer PyPI if versions are the same.
>> >
>> > The "shortest path" logic is there to avoid certain file recognition
>> > problems that occur without it.  The special case of PyPI isn't special
>> > enough to break those rules.
>> Although practicality beats purity. Can you list those "certain file
>> recognition problems"? I.e. Explicit is better than implicit.
> I have a vague recollection that it was Fredrik Lundh's website that sparked
> the original realization of the need for preferring shorter URLs, but I
> wouldn't swear to it.  I'd have to dig through years of revision history to
> find the original change, assuming I documented it well enough.  The choice
> of short paths over long was also intended to favor nearby files over
> further ones, and local paths over URLs.
> (All that being said, it's still fundamentally a heuristic, and not a very
> good one at that.  But that doesn't automatically make any other heuristic
> *better*; this is one area where status quo bias is a good thing.)
>> That's why it should use the site where all filenames are Python
>> downloads if filenames are the same.
> And how would that work with all the PyPI clones, private indexes, etc.?
>> > No.  You'd need to remove the current "home_page" setting, or point it
>> > elsewhere.
>> That's very strange. Then what is download_url for?
> The home page and download URLs are simply treated as pages which may
> contain links to files, if they are not themselves links to files.  That's
> the only special status they have.
>> >> (I understand that people do not want to touch setuptools code
>> >> anymore)
>> >
>> > That's not really the issue; the issue here is that package precedence is
>> > based on a stable comparison scheme, where it doesn't make sense to give one
>> > location priority over another, as it will simply lead to someone else
>> > complaining about the changed behavior, because they were relying on a
>> > different URL having precedence under the current scheme.
>> These rules need to be described first. What if somebody already broke
>> the proper order and now everybody suffers? If the autodiscovery rules
>> were well described, it would be possible to analyse them and propose a
>> more intuitive approach. Then if "someone else" attempts to complain,
>> you could send them to the PEP or another "how and why" document.
> http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall
> http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
>> I thought it would raise the weight of those links if there could be a
>> rel="download" attribute.
> There is no "weighting" of links - what is weighted are distributions, and
> distribution objects only have their raw URL available as a basis for
> sorting once the version and archive type (the two higher-precedence
> attributes) are considered.  The place where a URL was retrieved from is not
> tracked, and thus can't be used for sorting without a good chunk of
> refactoring...  which refactoring would likely break tools that build on top
> of setuptools' PackageIndex class.
> In short, what you're asking for is a pretty major feature that would be
> difficult to implement in a way that would be guaranteed not to break other
> things.
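
For anyone following the thread, the ordering described above (version
first, then archive type, then the raw URL) can be sketched roughly as
below. This is only an illustration of the heuristic as stated in the
quoted message, not the actual setuptools code: the `Candidate` class,
`sort_key`, `pick`, the archive-type ranking and the example URLs are all
made up for the sketch.

```python
from dataclasses import dataclass

# ASSUMPTION: a made-up ranking of archive types (higher is preferred).
# The thread only says archive type outranks the URL, not what the order is.
ARCHIVE_RANK = {"egg": 2, "sdist": 1}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a distribution found on some page."""
    version: tuple   # e.g. (2, 3, 0)
    archive: str     # key into ARCHIVE_RANK
    url: str         # raw download URL, the only location info kept


def sort_key(c):
    # Newer version wins; then the preferred archive type; then, as the
    # last tie-breaker, the shorter URL (hence the negated length).
    return (c.version, ARCHIVE_RANK.get(c.archive, 0), -len(c.url))


def pick(candidates):
    """Return the candidate this heuristic would prefer."""
    return max(candidates, key=sort_key)


# Same version and archive type in two places: the shorter URL wins,
# whether or not the longer one happens to be the package index.
index_copy = Candidate(
    (2, 3, 0), "sdist",
    "https://index.example/packages/source/p/protobuf/protobuf-2.3.0.tar.gz",
)
project_copy = Candidate(
    (2, 3, 0), "sdist",
    "https://code.example/files/protobuf-2.3.0.tar.gz",
)
best = pick([index_copy, project_copy])
print(best.url)

# A newer version wins regardless of URL length.
newer = Candidate(
    (2, 4, 0), "sdist",
    "https://index.example/packages/source/p/protobuf/protobuf-2.4.0.tar.gz",
)
newest = pick([index_copy, project_copy, newer])
print(newest.url)
```

Note that nothing in the key records where a URL was found, which is the
point made in the quoted reply: preferring the index would mean carrying
that extra information on every candidate.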
