[Catalog-sig] Suggested change to /simple index
P.J. Eby
pje at telecommunity.com
Thu Jul 29 19:51:27 CEST 2010
Recently, a proposal was made to change the sorting of links on
PyPI's /simple index to prevent problems with easy_install finding
out-of-date non-PyPI download links. That proposal, unfortunately,
would not have solved the actual problem.
After giving it some thought, I have an alternative proposal that I
think *would* solve the problem, and that would work for all scraping
tools using the /simple index, not just easy_install.
Essentially, the problem is that when links to "hidden" versions were
added to the /simple index (to satisfy users wanting to be able to
download older versions' distributions), in-description and
home/download page links were included. However, if a package's home
page URL or revision control download links change over time, the
older ones still show up in the /simple listing, leading to ambiguity
for download tools.
However, since the actual use case for which this was added was only
to support reaching specific older versions of a project, it isn't
actually necessary to include links that aren't to downloadable files
with a specific version number.
Say package Foo releases version 1.1, causing 1.0 to become
hidden. People still want to be able to download the 1.0's .tgz's or
.rpm's or what-have-you's. However, they do *not* still need to be
able to access the project's older, now-defunct home page, or any of
the extra links included in the older version's description.
It is these extraneous links that cause the problem, not the access
to PyPI-hosted archives.
Now, it could be argued that if a project used its "download" or
"home page" link (or even in-description links) to point to actual
archives, then links to older archives would be lost by omitting
such links for "hidden" versions. However, if that's really a
problem, it could be remedied by simply checking whether the URL
contains a file extension, or a revision number, or something like that.
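As a rough illustration of the file-extension check, here is a minimal
sketch in Python. The suffix list and the name looks_like_archive() are
hypothetical, chosen for the example, and are not taken from PyPI's or
easy_install's code. It covers only the extension test, not revision
numbers.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical suffix list for illustration; a real one would need review.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".rpm", ".exe")


def looks_like_archive(url):
    """Return True if the URL's path ends in a known archive suffix."""
    # urlparse() splits off the query string and fragment, so a trailing
    # "#md5=..." or "#egg=..." does not affect the check.
    path = urlparse(url).path
    filename = posixpath.basename(path).lower()
    return filename.endswith(ARCHIVE_SUFFIXES)
```

With this check, a link such as http://example.com/dist/Foo-1.0.tgz would
be kept for a hidden release, while a bare home page URL would not.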
However, since the original request to access hidden versions was
aimed squarely at PyPI-hosted downloads, the original use case could
still be met simply by only including PyPI-hosted links for "hidden"
releases, thereby ensuring that other links are only shown for
"current" versions -- i.e., the only versions whose
home/download/description links package authors would expect to
need to keep up-to-date.
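To make the proposed rule concrete, here is a minimal sketch in Python.
The release records (dicts with "hidden" and "links" keys), the function
names, and the idea of recognizing PyPI-hosted files by host name alone
are all assumptions made for the example; PyPI's actual data model and
index-generation code may look quite different.

```python
from urllib.parse import urlparse

# Assumption: a link counts as PyPI-hosted if its host is in this set.
PYPI_HOSTS = {"pypi.python.org"}


def is_pypi_hosted(url):
    """Return True if the URL points at a host in PYPI_HOSTS."""
    return urlparse(url).hostname in PYPI_HOSTS


def simple_index_links(releases):
    """Collect the links to list on a project's /simple page.

    `releases` is a hypothetical iterable of dicts with the keys
    'hidden' (bool) and 'links' (list of URL strings).
    """
    links = []
    for release in releases:
        for url in release["links"]:
            # Hidden releases contribute only PyPI-hosted links; current
            # releases contribute all of their links.
            if release["hidden"] and not is_pypi_hosted(url):
                continue
            if url not in links:  # skip duplicates, keep first-seen order
                links.append(url)
    return links
```

For the Foo example above, a hidden 1.0 release would contribute only its
PyPI-hosted .tgz, while the current 1.1 release would contribute its
archive plus its home page and description links.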
Making such a change would immediately fix many problematic/ambiguous
links in the /simple index, where out-of-date or no-longer-available
links are shown. (It would also fix the security issue whereby
someone acquiring a no-longer-in-service URL could point it at
trojaned downloads.)