[Catalog-sig] hash tags

Donald Stufft donald at stufft.io
Fri Mar 8 14:09:04 CET 2013


Accidentally sent this only to MAL, so resending!

On Mar 8, 2013, at 7:50 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 08.03.2013 13:15, Christian Heimes wrote:
>> Am 08.03.2013 12:49, schrieb M.-A. Lemburg:
>>> Together with the added hash tag on the download file URLs (*),
>>> this would solve the availability and the security aspects.
>>> Instead of deprecating external links altogether, we could then
>>> deprecate non-compliant download links and get an overall
>>> very flexible system for Python package distribution.
>>> 
>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>> been working on getting our indexes ready to serve as example :-)
>> 
>> What does your proposal look like?
> 
> Here's the first version with the basic idea:
> 
> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal
> 
> After the feedback I got from Holger and Phillip, I'm currently
> writing a new version, which drops some of the unneeded
> requirements and spells out a few more things.
> 
> Here's a very short version...
> 
> Installers are modified:
> 
> * to only follow rel="download" links from the /simple/ index page
>  that have a hash tag (e.g. #md5=…)

Sounds like a pretty serious break in backwards compatibility. Only 29 releases out of 144,493 currently have a #md5= in their download_url. For the rest, PyPI would be expected to download the URL and compute a hash itself (a DoS vector, so it will need to be coded carefully), which is error prone and likely to break in non-obvious ways for maintainers.

While I'm obviously not against breaking backwards compatibility, I think if we're going to do that we might as well go whole hog and kill external links completely.
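If PyPI were to fetch external download URLs and compute hashes itself, one way to keep that from becoming a DoS vector is to hash the response as a stream and give up past a fixed size. A minimal sketch, assuming a hypothetical `MAX_BYTES` cap and any binary file-like response object; it is an illustration, not PyPI's code:

```python
import hashlib
import io
from typing import BinaryIO

# Hypothetical limits; a real service would tune these.
MAX_BYTES = 50 * 1024 * 1024  # refuse anything larger than 50 MiB
CHUNK_SIZE = 64 * 1024        # read the body in 64 KiB pieces


class TooLargeError(Exception):
    """Raised when a download exceeds the configured byte limit."""


def bounded_md5(stream: BinaryIO, max_bytes: int = MAX_BYTES) -> str:
    """Return the hex MD5 of `stream`, reading at most `max_bytes` bytes.

    Hashing incrementally keeps memory use constant, and the byte cap stops
    a hostile URL from feeding the server an endless response.
    """
    digest = hashlib.md5()
    total = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise TooLargeError(f"download exceeds {max_bytes} bytes")
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # io.BytesIO stands in for the body of an HTTP response.
    print(bounded_md5(io.BytesIO(b"example")))
    try:
        bounded_md5(io.BytesIO(b"x" * 100), max_bytes=10)
    except TooLargeError as exc:
        print("rejected:", exc)
```

With a real fetch the stream would be something like the object returned by `urllib.request.urlopen(url, timeout=10)`; the timeout matters as much as the byte cap, since a slow server can tie up a worker without ever sending much data.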

> * will only use the fetched download page if its contents match
>  the hash tag
> * scan that page for rel="download" links, which again have to
>  have a hash tag to be taken into account
> * only install files for which the hash tag matches the
>  downloaded content
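Each of the steps above reduces to the same check, applied first to the download page and then to every file linked from it. A minimal sketch, assuming a link whose fragment is exactly `md5=<hex digest>`; the function names and the example.com URL are hypothetical and not taken from pip or setuptools:

```python
import hashlib
from typing import Optional
from urllib.parse import urldefrag


def expected_md5(url: str) -> Optional[str]:
    """Return the digest from a '#md5=<hex>' fragment, or None if absent."""
    fragment = urldefrag(url)[1]
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def accept_download(url: str, content: bytes) -> bool:
    """Reject links without a hash tag and content that does not match it."""
    expected = expected_md5(url)
    if expected is None:
        # Under the proposal, links without a hash tag are not followed.
        return False
    return hashlib.md5(content).hexdigest() == expected


if __name__ == "__main__":
    data = b"package contents"
    url = "https://example.com/pkg-1.0.tar.gz#md5=" + hashlib.md5(data).hexdigest()
    print(accept_download(url, data))                                   # True
    print(accept_download(url, data + b"tampered"))                     # False
    print(accept_download("https://example.com/pkg-1.0.tar.gz", data))  # False
```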
> 
> This should provide a good way to make sure that the downloaded
> files are indeed under control of the package maintainer.
> 
> So far the only practical problem I've found with the approach
> is that the download page must not contain dynamic data, e.g.
> a date or timestamp, since that would cause the hash tag to
> fail verification.
> 
> The package maintainer will also have to re-register the
> package whenever changes to the download page are made -
> but that's actually intended :-)
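To see why dynamic data breaks verification: with a hypothetical download page that embeds the date it was generated, the hash registered on one day no longer matches the page served the next day, so the maintainer would have to re-register. The page markup and dates below are made up for illustration:

```python
import hashlib
from datetime import date


def render_download_page(today: date) -> bytes:
    """Hypothetical download page that embeds the current date."""
    html = (
        '<a rel="download" href="pkg-1.0.tar.gz">pkg 1.0</a>\n'
        f"<p>Page generated on {today.isoformat()}</p>\n"
    )
    return html.encode("utf-8")


def page_hash(today: date) -> str:
    """MD5 hex digest of the page as it would be served on `today`."""
    return hashlib.md5(render_download_page(today)).hexdigest()


if __name__ == "__main__":
    # Hash the maintainer registers with PyPI on 8 March.
    registered = page_hash(date(2013, 3, 8))
    # Hash of the page the server produces on 9 March.
    served = page_hash(date(2013, 3, 9))
    print(registered == served)  # False
```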
> 
>> I'd like to propose query-string-like
>> key/value pairs. Key/value pairs are more flexible and allow us to
>> add or remove information in the future.
> 
> Good idea. I'll add that as extension mechanism.
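A query-string-like fragment can be parsed with the standard library alone. A minimal sketch using `urllib.parse`, shown on the defusedxml link from this thread; the helper name `fragment_params` is hypothetical:

```python
from typing import Dict
from urllib.parse import parse_qsl, urldefrag


def fragment_params(url: str) -> Dict[str, str]:
    """Split a query-string-style URL fragment into a key/value dict.

    Unknown keys are kept, so new fields can be added later without
    changing the parser. If a key repeats, the last value wins.
    """
    fragment = urldefrag(url)[1]
    return dict(parse_qsl(fragment))


if __name__ == "__main__":
    url = (
        "defusedxml-0.4.tar.gz"
        "#md5=09873c31ce773d48b8a4759571655a2c"
        "&sha1=33821e6891e3fc3829f5a238a93490f939533d62"
        "&octets=48324"
    )
    for key, value in fragment_params(url).items():
        print(key, value)
```

A plain `#md5=<hex>` or `#egg=<name>` fragment parses to a one-entry dict, so existing single-key links remain valid under this scheme.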
> 
>> I also propose that we add the file size in octets (bytes with 8 bits in
>> each byte) to the fragment identifier. File size validation prevents
>> e.g. length extension attacks, and it is also useful for download tools.
>> I know that HTTP servers usually set a Content-Length header for static
>> files, but that header is set by the CDN, while the information in the
>> fragment identifier would come from PyPI's internal database.
>> 
>> Example:
>> 
>> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324
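A link carrying several hashes plus a size could be verified by checking every field the installer recognises and rejecting on the first mismatch. A minimal sketch, assuming hash names are restricted to a fixed list of hashlib algorithms and that either `octets` or `size` names the length field (the thread had not settled on one); `verify_download` and the two name tuples are hypothetical:

```python
import hashlib
from urllib.parse import parse_qsl, urldefrag

# Assumed set of hash names an installer would accept in the fragment.
HASH_NAMES = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
# Both spellings from the thread are accepted for the size field.
SIZE_NAMES = ("octets", "size")


def verify_download(url: str, content: bytes) -> bool:
    """Check every hash and size listed in a query-style fragment.

    Returns False on the first mismatch, and also when the fragment
    contains nothing that can be checked. Unknown keys are ignored.
    """
    params = dict(parse_qsl(urldefrag(url)[1]))
    checked = False
    for key, value in params.items():
        if key in SIZE_NAMES:
            if not value.isdigit() or int(value) != len(content):
                return False
            checked = True
        elif key in HASH_NAMES:
            if hashlib.new(key, content).hexdigest() != value.lower():
                return False
            checked = True
    return checked


if __name__ == "__main__":
    data = b"hypothetical sdist contents"
    url = (
        "pkg-1.0.tar.gz"
        f"#md5={hashlib.md5(data).hexdigest()}"
        f"&sha1={hashlib.sha1(data).hexdigest()}"
        f"&octets={len(data)}"
    )
    print(verify_download(url, data))         # True
    print(verify_download(url, data + b"!"))  # False: size and hashes differ
```

Restricting hash names to an explicit list, rather than passing any key straight to `hashlib.new`, keeps unrelated keys such as `egg` from being treated as algorithm names.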
> 
> Minor nit: s/octets/size
> 
> We could probably even add GPG sigs to the link.
> 
> The only problem with the extension mechanism is that the currently
> available installers only support "#md5=…".

pip works just fine with any of the algorithms from hashlib. All the installers
also support #egg=, and there might be other fragments I can't recall offhand.

> 
> Perhaps there's some way to trick them into still working with
> the query-style fragment links?!
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
