[Catalog-sig] Deprecate External Links
holger krekel
holger at merlinux.eu
Wed Feb 27 18:22:14 CET 2013
On Wed, Feb 27, 2013 at 10:26 -0500, Donald Stufft wrote:
> PyPI is now being served with a valid SSL certificate, and the
> tooling has begun to incorporate SSL verification of PyPI into
> the process. This is _excellent_ and the parties involved should
> all be thanked. However there is still another massive area of
> insecurity within the packaging tool chain.
>
> For those who don't know, when you attempt to install a particular
> package, a number of URLs are visited. The steps look roughly
> like this (a rough code sketch follows the list):
>
> 1. Visit http://pypi.python.org/simple/Package/ and attempt to
> collect any links that look installable (tarballs,
> #egg=, etc.).
> (Note: /simple/Package/ contains download_url, home_page,
> and any link contained in the long_description.)
> 2. Visit any link referenced as home_page and attempt to
> collect any links that look installable.
> 3. Visit any link referenced in dependency_links and attempt
> to collect any links that look installable.
> 4. Take all of the collected links and determine which one
> best matches the requirement spec given and download it.
> 5. Rinse and repeat for every dependency in the requirement
> set.
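For concreteness, here is a rough Python sketch of step 1. The package
name "Foo" and the link heuristics are illustrative only; real
installers use considerably more elaborate matching:

    import re
    import urllib.request

    # Fetch the /simple/ index page for a (hypothetical) package "Foo".
    html = urllib.request.urlopen(
        "https://pypi.python.org/simple/Foo/").read().decode("utf-8")

    # The page mixes PyPI's own file links with home_page, download_url,
    # and long_description links; collect every href.
    hrefs = re.findall(r'href="([^"]+)"', html)

    # Keep only links that look installable (tarballs, zips, #egg= specs).
    installable = [h for h in hrefs
                   if h.endswith((".tar.gz", ".zip")) or "#egg=" in h]
    print(installable)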
>
> I propose we deprecate the external links that PyPI has published
> on the /simple/ indexes, which exist because of the history of PyPI.
> Ideally, in some number of months (1? 2?), we would stop adding
> these links for new releases, leaving the existing ones intact, and
> then a few months later remove the existing links completely.
>
> Reasoning:
> 1. It is difficult to secure the process of spidering external links
> for download.
> 1a. The only way I can think of offhand is by requiring uploading
> a hash of the expected file to PyPI along with the download
> link and removing all URLs except for the download_url. This
> has the effect that only one file can be associated with a
> particular release.
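For concreteness, a minimal sketch of the client side of 1a. PyPI's
internal file links already carry #md5= fragments that installers
check; the idea here would extend the same digest check to an external
download_url. The URL and digest below are placeholders:

    import hashlib
    import urllib.request

    download_url = "http://example.com/Foo-1.0.tar.gz"  # placeholder
    expected_md5 = "0123456789abcdef0123456789abcdef"   # as published on PyPI

    data = urllib.request.urlopen(download_url).read()
    if hashlib.md5(data).hexdigest() != expected_md5:
        raise ValueError("digest mismatch, refusing to install")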
The main means of securing against tampering is author signatures
plus verification by installers. If we have that, the download location
does not matter (PyPI/CDN/Google/...).
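A minimal sketch of that installer-side check, assuming a detached GPG
signature (PyPI can already store .asc files alongside uploads) and
assuming the author's key is already in the local keyring; filenames
are placeholders:

    import subprocess

    sdist = "Foo-1.0.tar.gz"          # fetched from any host, CDN or mirror
    signature = "Foo-1.0.tar.gz.asc"  # detached signature made by the author

    # gpg exits non-zero when the signature does not match the file,
    # so check_call raises and the installer can abort.
    subprocess.check_call(["gpg", "--verify", signature, sdist])
    print("signature OK - the download location did not matter")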
> 2. External links decrease the expected uptime for a particular set
> of requirements. PyPI itself has become very stable; however,
> the same cannot be said for all of the linked hosts that the
> toolchain processes. Each new host is an additional SPOF
> (single point of failure).
>
> Ex: I depend on PyPI and 10 external hosts; if each
> service has 99% uptime, then my expected uptime for
> being able to install all my requirements is ~89% (0.99 ** 11).
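The arithmetic behind that figure, spelled out:

    # 11 independent services (PyPI plus 10 external hosts), each with
    # 99% availability; all must be up for the full install to succeed.
    print(0.99 ** 11)  # -> 0.8953..., i.e. roughly 89%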
There are many links which go to Google, Bitbucket or GitHub -
I doubt those services have worse availability than pypi.python.org,
rather better.
Also, we would be losing a lot of packages: I expect there is a
non-trivial number of packages which will not be transferred to
pypi.python.org, no matter how much people here think it's cool.
Why not first build good infrastructure and capacity for
pypi.python.org, so that people *want* to move their files there?
best,
holger
> 3. Breaks the ability for a CDN and/or mirroring infrastructure to provide
> increased uptime and better latency/throughput across the globe.
> 4. Privacy implications: as a user, it is not particularly obvious when
> I run `pip install Foo` what hosts I will be issuing requests against.
> It is obvious that I will be contacting PyPI, and I will have made the
> decision to trust PyPI; however, it is not obvious what other hosts will
> be able to gather information about me, including what packages I am
> installing. This becomes even more difficult to determine the deeper
> my dependency tree goes.
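Point 4 can be made concrete: before installing anything, one can list
every distinct host the index page could send a client to. Again,
"Foo" is only a placeholder:

    import re
    import urllib.request
    from urllib.parse import urlparse

    html = urllib.request.urlopen(
        "https://pypi.python.org/simple/Foo/").read().decode("utf-8")

    # Every non-empty netloc here is a host the installer may contact;
    # each external one learns what I am installing.
    hosts = {urlparse(href).netloc
             for href in re.findall(r'href="([^"]+)"', html)}
    hosts.discard("")  # relative links stay on PyPI itself
    print(sorted(hosts))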