[Catalog-sig] hash tags

PJ Eby pje at telecommunity.com
Fri Mar 8 20:16:57 CET 2013


On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> After the feedback I got from Holger and Phillip, I'm currently
> writing a new version, which drops some of the unneeded
> requirements and spells out a few more things.
>
> Here's a very short version...
>
> Installers are modified:
>
> * to only follow rel="download" links from the /simple/ index page,
>   which have a hash tag (e.g. #md5=...)
> * will only use the fetched download page if its contents match
>   the hash tag
> * scan that page for rel="download" links, which again have to
>   have a hash tag to be taken into account
> * only install files for which the hash tag matches the
>   downloaded content
>
> This should provide a good way to make sure that the downloaded
> files are indeed under control of the package maintainer.

There is, as I said before, a MUCH simpler way to do this that works
right now: put direct #md5 download links in your description, and
phase out the rel="" attributes altogether.
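
As a rough illustration of what checking such a link involves, here is
a minimal sketch.  The function names are hypothetical and are not
taken from pip, distribute, or PyPI; the sketch only assumes the link
carries its digest in the URL fragment as "#md5=<hexdigest>".

```python
import hashlib
from urllib.parse import urldefrag


def split_hash_link(url):
    """Split 'http://host/f.tgz#md5=abc' into (base_url, algo, hexdigest).

    Returns (base_url, None, None) when the link has no hash fragment.
    """
    base, fragment = urldefrag(url)
    algo, sep, digest = fragment.partition("=")
    if not sep or not digest:
        return base, None, None
    return base, algo, digest.lower()


def content_matches(data, algo, digest):
    """Return True if the downloaded bytes hash to the expected digest."""
    return hashlib.new(algo, data).hexdigest() == digest
```

A tool would fetch the base URL, then accept the file only if
content_matches() returns True for the downloaded bytes.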

The key to making this transition isn't creating elaborate new
standards for the tools; it's *creating new tools for the standards*.

Specifically, *migration tools*.  A migration tool could be made that
scans existing external links and either converts the links it finds
to #md5 links or uploads the files themselves to PyPI.  You can do
that without changing pip or distribute or anything else but PyPI, so
there's no need to wait out update cycles to take advantage of it.
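
The link-conversion half of such a tool could look roughly like the
sketch below.  It is only an outline under stated assumptions: the
function names are made up for illustration, and "fetch" is a
caller-supplied callable (URL in, bytes out) rather than any existing
PyPI or installer API.

```python
import hashlib
from urllib.parse import urldefrag


def to_md5_link(url, fetch):
    """Return url with an '#md5=<hexdigest>' fragment for its content.

    fetch is a hypothetical caller-supplied callable: url -> bytes.
    Links that already carry an md5 fragment are returned unchanged.
    """
    base, fragment = urldefrag(url)
    if fragment.startswith("md5="):
        return url  # already migrated, nothing to download
    digest = hashlib.md5(fetch(base)).hexdigest()
    return base + "#md5=" + digest


def migrate_links(urls, fetch):
    """Convert every external link in urls to an #md5 link."""
    return [to_md5_link(u, fetch) for u in urls]
```

For real use, fetch could be something like
lambda u: urllib.request.urlopen(u).read(); passing it in keeps the
conversion logic testable without network access.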

Once a project/version has switched to either #md5 links or PyPI
copies, you can just drop the rel="" attributes and you're done.

Alternatively, if using the description for download links is
considered a bad idea, add a new field to PyPI for them.

Point is, this entire thing can be done correctly at the PyPI end and
work with the existing API of the download tools.


> So far the only practical problem I've found with the approach
> is that the download page may not contain dynamic data, e.g.
> a date or timestamp, since that causes the hash tag not to
> verify.

Which is a completely unnecessary problem if one simply exposes the
*actual* download links directly on PyPI.  The download page is
redundant, in a couple of different ways.  First, since it can't
change, there's no point in re-fetching it all the time.  Second,
since it's only going to be read by tools anyway, there's no point in
it containing anything besides the links.
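
To make the "it can't change" point concrete, here is a small sketch
showing that any dynamic content changes the page's digest.  The page
contents and the page_digest() helper are invented for illustration.

```python
import hashlib


def page_digest(html):
    """Return the md5 hex digest of a page's text, encoded as UTF-8."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()


# Hypothetical download page: one link plus a generated-on date.
link = '<a rel="download" href="http://example.com/pkg-1.0.tar.gz">pkg</a>'
monday = link + "\n<p>Generated: 2013-03-04</p>"
tuesday = link + "\n<p>Generated: 2013-03-05</p>"

# The download link is identical, but the digests differ, so a hash
# tag computed on Monday no longer verifies on Tuesday.
same_digest = page_digest(monday) == page_digest(tuesday)
```

Here same_digest ends up False, even though nothing a tool cares
about has changed.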

So, since the page only contains links, you might as well put the
links straight on PyPI, or at most have an option/tool to load the
links from an external source.

Again, the key to making this work is going to be somebody putting
buttons in the PyPI interface (and making setuptools/distutils
commands or similar CLI tools) that let maintainers migrate their
files (or links to the files) to PyPI hosting.  A new API for such
tools is entirely unnecessary -- at most there might need to be a new
field made available/accessible.  (Personally I don't care if your
download links have to be in the description field if you're hosting
off-site, but that's just me.)

