[Catalog-sig] Deprecation of External Urls, Statistics

Donald Stufft donald at stufft.io
Fri Mar 8 14:18:44 CET 2013

On Mar 8, 2013, at 8:13 AM, Donald Stufft <donald at stufft.io> wrote:

> On Mar 8, 2013, at 8:07 AM, Jesse Noller <jnoller at gmail.com> wrote:
>> As long as external URLs eventually are completely removed I'm okay with caching things
> So I have mixed feelings on caching the URLs. I'm not completely against it; however, it does present a problem: how do we know that the URL we are fetching is the accurate URL for that package? Downloading and caching them and presenting them the same as if someone had uploaded them directly to PyPI loses the point of distinction between "PyPI can verify this is the package that the author intended to release" and "This is something we think that the author releases, maybe, probably?".

The distinction can be preserved with rel="external" or rel="cached" or whatever. I believe all the tools will still find them as downloadable targets and can be adapted to print a warning if that's desired. We *might* be caching a package that has already been replaced by an attacker, but by caching and centralizing it we have a better way of removing it once it's found. The legal issues are something we'd probably need to ask VanL about.
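
To make the rel idea concrete, here is a rough sketch of how a tool could separate directly uploaded files from flagged ones on an index page and warn about the latter. The rel values "external" and "cached" are only the ones floated above, not anything PyPI emits today, and the names LinkCollector, classify_links and warn_about_flagged are made up for illustration.

```python
from html.parser import HTMLParser
import warnings

# Assumption: these rel values are only the ones suggested in this thread.
FLAGGED_RELS = {"external", "cached"}


class LinkCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = set((attr_map.get("rel") or "").lower().split())
        self.links.append((href, rel_tokens))


def classify_links(html):
    """Return (hosted, flagged) lists of hrefs found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    hosted, flagged = [], []
    for href, rels in collector.links:
        if rels & FLAGGED_RELS:
            flagged.append(href)
        else:
            hosted.append(href)
    return hosted, flagged


def warn_about_flagged(flagged):
    """Emit one warning per link that was not uploaded directly."""
    for href in flagged:
        warnings.warn(f"{href} was not uploaded directly to the index")
```

Both kinds of link stay discoverable as download targets; the tool only decides whether to say something before using a flagged one.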

So that's an Ok, Neutral, and Unknown for my 3 major complaints.

> It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too?
>> On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
>>> On 08.03.2013 02:40, Donald Stufft wrote:
>>>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about.
>>>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI.
>>>> The results and script are available at: https://gist.github.com/dstufft/5088915
>>>> Some statistics:
>>>>  Projects affected (with dev): 2269
>>>>  Versions affected (with dev): 8006
>>>>  Projects affected (without dev): 1880
>>>>  Versions affected (without dev): 7586
>>>> These numbers assume that all external URLs were immediately removed from PyPI, so this is the total that would be affected. This does not test whether the actual package is installable, just whether pip is able to locate a URL that it thinks represents a version for that project.
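
For anyone who wants to reason about these counts without reading the gist, the tally can be sketched roughly as below. This is not the script from the gist. It assumes the candidate links have already been collected per project, that "hosted" means the URL's host is in INDEX_HOSTS, and that a dev version is one whose version string contains "dev"; all three are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumption: files served from these hosts count as "available on PyPI".
INDEX_HOSTS = {"pypi.python.org"}


def is_hosted(url):
    """True if the URL points at the index itself rather than an external site."""
    return urlparse(url).hostname in INDEX_HOSTS


def affected_versions(found, include_dev=True):
    """Map each project to the versions that only have external URLs.

    found: {project: [(version, url), ...]}
    """
    affected = {}
    for project, candidates in found.items():
        by_version = {}
        for version, url in candidates:
            by_version.setdefault(version, []).append(url)
        external_only = {
            version
            for version, urls in by_version.items()
            if not any(is_hosted(u) for u in urls)
            # Assumption: "dev" in the version string marks a dev version.
            and (include_dev or "dev" not in version.lower())
        }
        if external_only:
            affected[project] = external_only
    return affected


def summarize(affected):
    """Return (number of projects affected, number of versions affected)."""
    return len(affected), sum(len(v) for v in affected.values())
```

A version that has at least one hosted file is not counted, even if external links for it also exist.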
>>> Thanks for running the test.
>>> About 10% of all packages. The numbers are already impressive,
>>> but if you factor in the popularity of some of those
>>> packages, the situation becomes worse.
>>> I'm beginning to wonder whether caching the external link content
>>> on the PyPI CDN wouldn't be a better idea.
>>> We'd have to make that legally waterproof and also have an opt-out
>>> mechanism, but it would get us from here to there a lot faster.
>>> Together with the added hash tag on the download file URLs (*),
>>> this would solve the availability and the security aspects.
>>> Instead of deprecating external links altogether, we could then
>>> deprecate non-compliant download links and get an overall
>>> very flexible system for Python package distribution.
>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>> been working on getting our indexes ready to serve as example :-)
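
As a rough illustration of the hash tag idea, a client could check a downloaded file against a digest carried in the URL fragment. The fragment format "#algorithm=hexdigest" and the set of accepted algorithms are assumptions here, since the updated proposal is not out yet, and expected_hash and verify_download are hypothetical names.

```python
import hashlib
from urllib.parse import urlparse

# Assumption: the algorithms a client would accept in a URL fragment.
ALLOWED_ALGORITHMS = {"md5", "sha1", "sha256", "sha512"}


def expected_hash(url):
    """Return (algorithm, hexdigest) from a '#algo=digest' fragment, or None."""
    fragment = urlparse(url).fragment
    algo, sep, digest = fragment.partition("=")
    if not sep or algo not in ALLOWED_ALGORITHMS:
        return None
    return algo, digest.lower()


def verify_download(url, content):
    """True only if the URL carries a digest and the content matches it."""
    expected = expected_hash(url)
    if expected is None:
        return False
    algo, digest = expected
    return hashlib.new(algo, content).hexdigest() == digest
```

The check only helps if the page listing the URL is itself trusted, since whoever controls that page also controls the digest.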
>>> -- 
>>> Marc-Andre Lemburg
>>> eGenix.com
>>> _______________________________________________
>>> Catalog-SIG mailing list
>>> Catalog-SIG at python.org
>>> http://mail.python.org/mailman/listinfo/catalog-sig
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

