[Catalog-sig] Deprecate External Links

Jesse Noller jnoller at gmail.com
Wed Feb 27 16:36:41 CET 2013



On Wednesday, February 27, 2013 at 10:26 AM, Donald Stufft wrote:

> PyPI is now being served with a valid SSL certificate, and the
> tooling has begun to incorporate SSL verification of PyPI into
> the process. This is _excellent_ and the parties involved should
> all be thanked. However, there is still another massive area of
> insecurity within the packaging toolchain.
> 
> For those who don't know, when you attempt to install a particular
> package, a number of URLs are visited. The steps look roughly
> like this:
> 
> 1. Visit http://pypi.python.org/simple/Package/ and attempt to
> collect any links that look like they're installable (tarballs,
> #egg=, etc.).
> Note: /simple/Package/ contains the download_url, the home_page,
> and any link contained in the long_description.
> 2. Visit any link referenced as home_page and attempt to
> collect any links that look like they're installable.
> 3. Visit any link referenced in dependency_links and attempt
> to collect any links that look like they're installable.
> 4. Take all of the collected links, determine which one
> best matches the given requirement spec, and download it.
> 5. Rinse and repeat for every dependency in the requirement
> set.
> 
> I propose we deprecate the external links that PyPI publishes
> on the /simple/ indexes, which exist because of the history of PyPI.
> Ideally, in some number of months (1? 2?), we would stop adding
> these links for new releases while leaving the existing ones intact,
> and then, a few months later, remove the existing links completely.
> 
> Reasoning:
> 1. It is difficult to secure the process of spidering external links
> for download.
> 1a. The only way I can think of offhand is to require uploading
> a hash of the expected file to PyPI along with the download
> link and to remove all URLs except the download_url. This
> has the effect that only one file can be associated with a particular
> release.
> 2. External links decrease the expected uptime for a particular set
> of requirements. PyPI itself has become very stable; however,
> the same cannot be said for all of the linked hosts that the toolchain
> processes. Each new host is an additional SPOF (single point of failure).
> 
> Ex: I depend on PyPI and 10 packages hosted elsewhere. If each
> service has 99% uptime, my expected uptime for being
> able to install all my requirements would be ~89.5% (0.99 ** 11).
> 3. External links break the ability of a CDN and/or mirroring
> infrastructure to provide increased uptime and better
> latency/throughput across the globe.
> 4. Privacy implications: as a user, it is not particularly obvious
> which hosts I will be issuing requests against when I run
> `pip install Foo`. It is obvious that I will be contacting PyPI, and I
> will have made the decision to trust PyPI; however, it is not obvious
> which other hosts will be able to gather information about me, including
> which packages I am installing. This becomes even harder to determine
> the deeper my dependency tree goes.
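
To make the link-collection steps concrete, here is a rough Python sketch of steps 1 and 4 only. It is not pip's or easy_install's actual code: the names (`LinkCollector`, `looks_installable`, `best_match`), the suffix list, and the naive "Name-X.Y.tar.gz" version parsing are assumptions for illustration, and it parses an HTML string instead of fetching anything over the network.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of "installable-looking" archive suffixes (illustrative only).
INSTALLABLE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags on an index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def looks_installable(url):
    """Step 1 filter: archives by suffix, or links carrying an #egg= fragment."""
    return urlparse(url).path.endswith(INSTALLABLE_SUFFIXES) or "#egg=" in url


def best_match(links, project):
    """Step 4 (simplified): pick the highest 'Project-X.Y...' archive version."""
    pattern = re.compile(
        re.escape(project) + r"-(\d+(?:\.\d+)*)\.(?:tar\.gz|tgz|tar\.bz2|zip)$",
        re.IGNORECASE,
    )
    candidates = []
    for url in links:
        filename = urlparse(url).path.rsplit("/", 1)[-1]
        match = pattern.match(filename)
        if match:
            version = tuple(int(part) for part in match.group(1).split("."))
            candidates.append((version, url))
    return max(candidates)[1] if candidates else None


page = """
<a href="../../packages/source/F/Foo/Foo-1.0.tar.gz">Foo-1.0.tar.gz</a>
<a href="http://example.com/downloads/Foo-1.2.tar.gz">Foo-1.2.tar.gz</a>
<a href="http://example.com/">home_page</a>
"""
collector = LinkCollector("https://pypi.python.org/simple/Foo/")
collector.feed(page)
installable = [url for url in collector.links if looks_installable(url)]
print(best_match(installable, "Foo"))  # http://example.com/downloads/Foo-1.2.tar.gz
```

In this made-up example the newest version lives on a host other than PyPI, so the installer ends up downloading from, and implicitly trusting, that extra host. A real installer would also spider the home_page link (step 2) rather than just discarding it.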
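
Point 1a might look roughly like this on the installer side. This is a hypothetical sketch: the `ExternalRelease` record, its field names, and the choice of SHA-256 are assumptions, not anything PyPI provides. It only illustrates why storing one URL/hash pair per release limits an external release to one file.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalRelease:
    """Hypothetical record PyPI would store for an externally hosted release."""
    download_url: str  # the only external URL that would be kept
    sha256: str        # hex digest uploaded together with the link (assumed algorithm)


def verify_external_download(release, url, data):
    """Accept a download only from the registered URL and only if its hash matches."""
    if url != release.download_url:
        return False  # any other link (home_page, long_description, ...) is ignored
    actual = hashlib.sha256(data).hexdigest()
    # The recorded hash is public, so a plain == would also be fine here.
    return hmac.compare_digest(actual, release.sha256.lower())


archive = b"pretend this is Foo-1.2.tar.gz"
release = ExternalRelease(
    download_url="http://example.com/downloads/Foo-1.2.tar.gz",
    sha256=hashlib.sha256(archive).hexdigest(),
)
print(verify_external_download(release, release.download_url, archive))         # True
print(verify_external_download(release, release.download_url, archive + b"!"))  # False
```

Because the record holds a single URL and a single hash, a release that wanted to offer both a .tar.gz and a .zip externally would have no way to do so.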
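
The arithmetic in the uptime example, as a tiny Python helper. Like the example itself, it assumes that hosts fail independently; `expected_uptime` is just an illustrative name.

```python
import math


def expected_uptime(availabilities):
    """Chance that every host is up at once, assuming independent failures."""
    return math.prod(availabilities)


hosts = [0.99] * 11  # PyPI plus 10 external hosts, each at 99% uptime
print(f"{expected_uptime(hosts):.1%}")   # 89.5%
print(f"{expected_uptime([0.99]):.1%}")  # 99.0% when everything comes from PyPI alone
```

The product can only shrink as hosts are added, so every extra external host lowers the chance that a given install succeeds.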


I fully support this. 

As an aside, if CDN/storage concerns are an issue, I have an outstanding offer from a large hosting company to take care of the CDN aspects for us.  

Jesse
