[Catalog-sig] hash tags

M.-A. Lemburg mal at egenix.com
Fri Mar 8 13:50:33 CET 2013

On 08.03.2013 13:15, Christian Heimes wrote:
> On 08.03.2013 12:49, M.-A. Lemburg wrote:
>> Together with the added hash tag on the download file URLs (*),
>> this would solve the availability and the security aspects.
>> Instead of deprecating external links altogether, we could then
>> deprecate non-compliant download links and get an overall
>> very flexible system for Python package distribution.
>> (*) Yes, I know, I still have to deliver the updated proposal -
>> been working on getting our indexes ready to serve as example :-)
> What does your proposal look like?

Here's the first version with the basic idea:


After the feedback I got from Holger and Phillip, I'm currently
writing a new version, which drops some of the unneeded
requirements and spells out a few more things.

Here's a very short version...

Installers are modified to:

* only follow rel="download" links from the /simple/ index page
  which have a hash tag (e.g. #md5=...)
* only use the fetched download page if its contents match
  the hash tag
* scan that page for rel="download" links, which again have to
  have a hash tag to be taken into account
* only install files for which the hash tag matches the
  downloaded content

This should provide a good way to make sure that the downloaded
files are indeed under control of the package maintainer.
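
The check an installer would run at each of the steps above can be
sketched roughly as follows. This is only an illustration, not code
from any existing installer: the function name matches_hash_tag is
made up, and it assumes the only supported tag is #md5=... with a
hex digest.

```python
import hashlib
from urllib.parse import urldefrag


def matches_hash_tag(url, content):
    """Return True if `content` (bytes) matches the #md5=... tag on `url`.

    Links without an md5 hash tag are rejected, mirroring the rule that
    untagged links are not taken into account. (Hypothetical helper.)
    """
    _, fragment = urldefrag(url)
    if not fragment.startswith("md5="):
        return False
    expected = fragment[len("md5="):].lower()
    return hashlib.md5(content).hexdigest() == expected
```

The same function would apply both to the fetched download page and
to the distribution file, since both are just bytes that have to
match the tag on the link that pointed to them.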

So far the only practical problem I've found with the approach
is that the download page may not contain dynamic data, e.g.
a date or timestamp, since that causes the hash tag not to
match.

The package maintainer will also have to reregister the
package whenever changes to the download page are made -
but that's actually intended :-)

> I'd like to propose query string-like
> key/value pairs. Key/value pairs are more flexible and allow us to
> add or remove information in the future.

Good idea. I'll add that as extension mechanism.
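
As a rough sketch of how such an extension mechanism might be read
by an installer: the fragment is parsed like a query string, keys the
installer knows are used, and unknown keys are kept aside so that
future additions don't break older tools. The names parse_hash_tag
and KNOWN_KEYS are invented for this example, as is the "future" key.

```python
from urllib.parse import parse_qsl, urldefrag

# Assumption: the keys this hypothetical installer understands.
KNOWN_KEYS = {"md5", "sha1"}


def parse_hash_tag(url):
    """Split the fragment of `url` into known and unknown key/value pairs."""
    _, fragment = urldefrag(url)
    known, unknown = {}, {}
    for key, value in parse_qsl(fragment):
        # Unknown keys are collected rather than treated as errors.
        (known if key in KNOWN_KEYS else unknown)[key] = value
    return known, unknown


known, unknown = parse_hash_tag("pkg-1.0.tar.gz#md5=abc123&future=1")
# known holds the md5 entry, unknown holds the "future" entry
```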

> I also propose that we add the file size in octets (bytes with 8 bits in
> each byte) to the fragment identifier. File size validation prevents
> e.g. length extension attacks. It is also useful for download tools. I know
> that HTTP servers usually set a Content-Length header for static files.
> But the header is set by the CDN while the information in the fragment
> identifier shall come from PyPI's internal database.
> Example:
> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324

Minor nit: s/octets/size
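
To make the combined check concrete, here is a minimal sketch of how
a download could be validated against a link carrying a size and
several digests. It assumes the key is spelled "size" as suggested
above, that md5 and sha1 are the only digests used, and that at least
one digest must be present. verify_download is a made-up name.

```python
import hashlib
from urllib.parse import parse_qsl, urldefrag


def verify_download(url, content):
    """Check `content` (bytes) against the size and digests in the fragment."""
    _, fragment = urldefrag(url)
    fields = dict(parse_qsl(fragment))

    # The size comparison is cheap, so do it before hashing.
    size = fields.get("size")
    if size is not None and (not size.isdigit() or int(size) != len(content)):
        return False

    checked = False
    for name in ("md5", "sha1"):
        if name in fields:
            if hashlib.new(name, content).hexdigest() != fields[name].lower():
                return False
            checked = True
    # Reject links that carry no digest at all.
    return checked
```

Every digest present on the link has to match, so adding sha1 next to
md5 only ever makes the check stricter.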

We could probably even add GPG sigs to the link.

The only problem with the extension mechanism is that the currently
available installers only support "#md5=...".

Perhaps there's some way to trick them into still working with
the query-style fragment links ?!

Marc-Andre Lemburg
