[Distutils] Reviving PEP 470 - Removing External Hosting support on PyPI

Donald Stufft donald at stufft.io
Thu Aug 27 17:16:57 CEST 2015

On August 27, 2015 at 10:33:15 AM, Tres Seaver (tseaver at palladion.com) wrote:
> On 08/27/2015 07:51 AM, Donald Stufft wrote:
> > This leaves the user feeling annoyed that we didn’t just search those
> > locations by default. I truly think it is a bad experience and I only
> > ever added it because I wanted the discussion to be over with and I
> > was trying to placate people by giving them a bad feature
> I don't understand the sensibility here: an error message which tells me
> "not hosted on PyPI, try 'pip...' instead" seems like a *good* UX to me:
> Having a tool which respects its default policy ("trust only PyPI")
> while giving me the information I need to off-road when needed is a good
> balance.

Given my experience dealing with pip’s users and the fallout of PEP 438, the very next question we’d get after implementing that UI will either be “If pip knows what I need to do, why can’t it just do it for me instead of making me type it out again” OR “Give me a flag so I can just automatically accept every externally hosted index”.

Both of these asks are completely logical from an end user who doesn’t understand why the situation is the way it is, but are also essentially “let me be insecure all the time implicitly” flags.

On the other hand, if we just remove it then we can explain that we used to support an insecure method of finding links, but that we no longer support it. The difference here is that there is no bread crumb of “here’s some information that pip obviously knows, because it’s telling it to you” to lead people to ask for something to opt into a “global insecure” flag. We have a clear answer that doesn’t leave room for argument: “We no longer get that information from PyPI so we cannot use it”.

I think it’s a bad API because I think it’s going to cause frustration for people, particularly because pip is making them do extra leg work to approve or type in a repository URL. In addition, the discovery mechanism will only be in new versions of pip; however, only about a third of our users upgrade quickly after a release (see: https://caremad.io/2015/04/a-year-of-pypi-downloads/), so the error case is going to be happening for the vast bulk of users anyways (the “Unknown” in that graph is older than 1.4).

I also think it’s a bad experience because you’re lowering the “uptime” of any particular installation set that includes an external repository, unless every repository has 100% uptime. This is because you’re adding new single points of failure into the system. PyPI has had a 99.94% uptime over the last year which corresponds with 5 1/2 hours of downtime. Let’s assume that’s a rough average for what we can expect. If someone adds a single additional repository then the uptime of the system as a whole becomes 99.88% (or X hours of downtime), a third repository brings it to 99.82% (X hours), a fourth brings it to 99.76% (X hours). I think this is a conservative estimate of what the effects of the downtime would be.
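
As a rough sketch of the arithmetic above: if an install only succeeds when every repository it needs is reachable, and if we assume (a simplification) that each repository fails independently and has the same uptime as PyPI, then the combined uptime is the per-repository uptime raised to the number of repositories. The function names and the 8760-hour year below are illustrative assumptions, not anything pip or PyPI provides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def combined_uptime(per_repo_uptime: float, repo_count: int) -> float:
    """Uptime of an install that needs every one of repo_count repositories.

    Assumes independent failures and identical uptime per repository.
    """
    return per_repo_uptime ** repo_count


def downtime_hours(uptime: float, hours: float = HOURS_PER_YEAR) -> float:
    """Expected hours of downtime over the given period for a given uptime."""
    return (1.0 - uptime) * hours


if __name__ == "__main__":
    # 0.9994 is the PyPI uptime figure quoted in the message.
    for n in range(1, 5):
        up = combined_uptime(0.9994, n)
        print(f"{n} repositories: {up:.2%} uptime, "
              f"{downtime_hours(up):.1f} hours of downtime per year")
```

Under these assumptions the percentages for two, three and four repositories round to the 99.88%, 99.82% and 99.76% quoted above; the hour figures depend on the period you measure over.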

On the other hand, here’s what I consider a good experience which is possible if my assumptions about what is acceptable for data sovereignty are correct:

Project “Foo” doesn’t want to host their projects in the US for $REASONS, so they go to https://pypi.python.org/ and register their project. When registering, they select to have their uploads hosted in the EU. Any time they upload their files to https://pypi.python.org, instead of storing them in a bucket in us-west-2, PyPI checks their preferences, sees they have selected the EU, and stores their files in eu-west-1 (Ireland).

User “Jill” wants to install Project “Foo” and she is using pip 1.5.6 from her Debian operating system. When she types in ``pip install Foo`` pip goes to https://pypi.python.org/simple/foo/ gets a list of files which have been hosted in the EU. Without any updates or changes required on her end, pip downloads these files and installs them.
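
The upload side of that scenario could look roughly like the sketch below. Everything in it is hypothetical: the preference values, the mapping, and the function name are made up for illustration and do not come from PyPI's actual code. The only details taken from the description above are the two region names and the idea that a project's stored preference decides where its files go.

```python
# Hypothetical sketch; none of these names exist in PyPI's real codebase.
DEFAULT_REGION = "us-west-2"

# Assumed mapping from a project's hosting preference to a storage region.
REGION_FOR_PREFERENCE = {
    "US": "us-west-2",
    "EU": "eu-west-1",  # Ireland
}


def storage_region(project_preferences: dict, project_name: str) -> str:
    """Pick the region an uploaded file should be stored in.

    project_preferences maps lowercased project names to a preference
    such as "EU". Projects without a preference use the default region.
    """
    preference = project_preferences.get(project_name.lower())
    return REGION_FOR_PREFERENCE.get(preference, DEFAULT_REGION)


if __name__ == "__main__":
    preferences = {"foo": "EU"}
    print(storage_region(preferences, "Foo"))
    print(storage_region(preferences, "Bar"))
```

The point of the design is that the choice lives entirely on the server: the file list pip fetches from /simple/foo/ simply links to wherever the files ended up, so an older pip needs no changes.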

Here’s the thing though, which I’ve been saying: I don’t know the laws and I don’t think it’s reasonable to expect me to learn the laws for all these other countries. There are open questions on how to actually implement this. For example, what exactly are we trying to achieve? If we’re trying to protect against the US government compelling the hosting company to do something, then you’re pretty much boned, because even if the files were hosted in the EU you still have the fact that it’d be a service controlled by a US Non Profit, run by volunteers that live in the US, developed by someone who lives in the US who is employed by someone who lives in the US. If we’re trying to comply with some sort of data locality laws like https://en.wikipedia.org/wiki/Data_Protection_Directive, does OSS even count as “personal data”? If it does, then does uploading it to https://pypi.python.org/, which is located in the US, but storing and hosting it from the EU satisfy the requirements? What about putting it behind Fastly (another US company): when a US user requests those files, can it route them and cache them in a US datacenter? Is it OK to have it linked from https://pypi.python.org/ (again, hosted in the US) or do we need a whole separate repository to handle these files?

I think we can make this a great experience, but it is its own discussion and it needs to include stakeholders who actually know what the requirements are. I need someone who can put forth some effort into making it a reality instead of expecting me to do it all. If nobody wants to put in any effort to make it happen, maybe it’s not actually that important to them?

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
