On 2 January 2015 at 18:57, Donald Stufft <donald@stufft.io> wrote:
I have concerns about the actual feasibility of doing such a thing, some of which are similar to my concerns with doing non-mandatory PEP 480.

* If uploading to a verifier service is optional, then a significant portion of authors simply won’t do it, and if you’re installing 100 things, and 99 of them are verified and 1 of them is not, then there is an attack vector that I can use to compromise you undetected (since the author didn’t upload their verification data anywhere else).

Independently of the technical details of the "enhanced validation" support, I now agree pip would at least need to acquire a "fully validated downloads only" mode, where it refused to install anything that wasn't trusted at the higher integrity level.
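A minimal sketch of why such a mode has to be all-or-nothing. It assumes a hypothetical `Requirement` record whose `verified` flag stands in for "a validation service vouched for this release". None of these names are pip's actual API. In the default mode everything installs. In the strict mode, a single unverified package among 100 is enough to refuse the whole install.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical resolved requirement; not pip's real data model."""
    project: str
    version: str
    verified: bool  # assumption: True if a validation service vouched for this release


class EnhancedValidationError(Exception):
    """Raised when strict mode meets a release that was never verified."""


def plan_install(requirements, enhanced_only=False):
    """Return the requirements to install, refusing unverified ones in strict mode."""
    unverified = [req for req in requirements if not req.verified]
    if enhanced_only and unverified:
        # One unverified package is enough to compromise the whole environment,
        # so strict mode refuses the entire install rather than skipping it.
        names = ", ".join(f"{req.project}=={req.version}" for req in unverified)
        raise EnhancedValidationError(f"refusing unverified packages: {names}")
    return list(requirements)
```

With 99 verified requirements and 1 unverified one, `plan_install(reqs)` returns all 100. `plan_install(reqs, enhanced_only=True)` raises instead.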

However, it's worth keeping in mind that the truly security conscious people aren't going to ever trust any package that hasn't been through at least some level of preliminary security review (whether their own, or that of an organisation they trust).
 
* It’s not actually less work in general; it just pushes the work from the PyPI administrators to the community. This can work well if the community is willing to step up! However, we originally tried to solve PyPI’s availability/speed problems by pushing the work to the community via the original mirror system, and (not to downplay the people who did step up) the response was not particularly great: we got a few mirrors at first, and as the shiny factor wore off, people’s mirrors shut down or stopped working or what have you.

From a practical perspective, I think the work being pushed to the developer community is roughly equivalent. The only question is whether the secret that developers are being asked to manage is a separate authentication token for PyPI uploads (which is then used to mathematically secure uploads in such a way that PyPI itself can't change the release metadata for a project), or an account on a separate validation service, either run by the developer themselves, or by a trusted third party like the OpenStack Foundation, Fedora, Mozilla, etc.

My contention is that "to support enhanced validation of your project, you, and anyone else authorised to make releases for your project, will need an account on a metadata validation server, and whenever you make a new release, you will need to register it with the metadata validation server before publishing it on PyPI (otherwise folks using enhanced validation won't be able to install it)" is a relatively simple and easy to understand approach, while "to support enhanced validation of your project, you, and anyone else authorised to make releases for your project, will need to get a developer key signed by the PyPI administrators, and no, you can't just upload the key through the PyPI web UI, because if we let you do that, it wouldn't provide the desired security properties" is inherently confusing, and there's no explanation we can provide that will make it make sense to anyone that hasn't spent years thoroughly steeped in this stuff.
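A minimal sketch of the "register before publishing" workflow. It assumes a hypothetical in-memory `ValidationServer` that records one SHA-256 digest per release. Account authentication (passwords, two-factor login, etc.) is left to the surrounding service. The class and method names are illustrative only.

```python
import hashlib


class ValidationServer:
    """Hypothetical metadata validation service recording release digests."""

    def __init__(self):
        self._maintainers = {}  # project -> set of accounts allowed to register releases
        self._releases = {}     # (project, version) -> sha256 hex digest

    def add_maintainer(self, project, account):
        self._maintainers.setdefault(project, set()).add(account)

    def register_release(self, account, project, version, artifact):
        # Assumption: the caller's account has already been authenticated by the
        # service; here we only check that it is authorised for this project.
        if account not in self._maintainers.get(project, set()):
            raise PermissionError(f"{account} may not register releases of {project}")
        self._releases[(project, version)] = hashlib.sha256(artifact).hexdigest()

    def expected_digest(self, project, version):
        # None means nobody registered this release.
        return self._releases.get((project, version))
```

Under this model, a release that was never registered has no expected digest. That is exactly the case where enhanced validation clients would refuse to install it.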

Yes, the mathematics technically lets us provide the desired security guarantees within the scope of a single service, but doing it that way is confusing to users in a way that I now believe can be avoided through clear structural separation of the "content distribution" functionality of the main PyPI service and the "content validation" functionality of separate metadata validation services. Having the validation services run by someone *other than the PSF* can also be explained in a way that makes intuitive sense even when someone doesn't understand the technical details, since it's clear that folks with privileged access to a distribution service run by the PSF wouldn't necessarily have privileged access to a validation service run by the OpenStack Foundation (etc) and vice-versa.

The "independent online services" approach also provides additional flexibility that isn't offered by a purely mathematical solution: an online metadata service potentially *can* regenerate its metadata in a new format if its operators can be persuaded it's necessary. Having that capability represents a potential denial of service vulnerability specifically against enhanced validation mode (if a validation server is compromised), but there are already much easier ways to execute such attacks.
 
* A number of the attacks that TUF protects against do not rely on the attacker creating malicious software packages, things like only showing known insecure versions of a project so that they can then attack people through a known exploit. It’s not *wrong* to not protect against these (most systems don’t), but we’d want to explicitly decide that we’re not going to.

External metadata validation servers can still protect against downgrade attacks - "release is unexpectedly missing" is a metadata discrepancy, just like "release contents are unexpectedly different".
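A minimal sketch of that comparison. It assumes PyPI and a hypothetical validation server can each produce a `{version: sha256}` map for a project. The function name and data shapes are assumptions, not an existing API.

```python
def metadata_discrepancies(pypi_releases, validator_releases):
    """Compare {version: sha256} maps from PyPI and an independent validator."""
    problems = []
    for version, digest in sorted(validator_releases.items()):
        if version not in pypi_releases:
            # A hidden release (e.g. a security fix) enables freeze/downgrade attacks.
            problems.append(f"{version}: missing from PyPI")
        elif pypi_releases[version] != digest:
            # Same version, different contents: a tampered or substituted release.
            problems.append(f"{version}: contents differ")
    for version in sorted(set(pypi_releases) - set(validator_releases)):
        # Published on PyPI but never registered with the validator.
        problems.append(f"{version}: not registered with validator")
    return problems
```

For example, if the validator knows about 1.0 and 1.1 but PyPI only lists 1.0, the result is `["1.1: missing from PyPI"]`.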

Validation servers can also provide additional functionality, like mapping from CVEs to affected versions, that can't be sensibly offered through the same service that is responsible for publishing the software in the first place.
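A sketch of how a client might consume such a mapping. It assumes a hypothetical advisory table of the form `{advisory_id: (project, affected_versions)}` published by a validation server. Real entries would carry CVE identifiers. The structure and function name here are illustrative only.

```python
def known_vulnerabilities(advisories, project, version):
    """Return sorted advisory IDs whose affected versions include project==version."""
    return sorted(
        advisory_id
        for advisory_id, (affected_project, affected_versions) in advisories.items()
        if affected_project == project and version in affected_versions
    )
```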
 
I’d note that PEP 480 and your proposal aren’t really mutually exclusive, so there’s not really any harm in *trying* yours and, if it fails, falling back to something like PEP 480, other than end user confusion if that gets shut down and the cost of actually developing/setting up that solution.

Overall I’m +1 on things that enable better detection of a compromise but I’m probably -0.5 or so on your specific proposal as I think that expecting developers to upload verification data to “verification servers” is just pushing work onto other people just so we don’t have to do it.

Getting them to manage additional keys, get those keys signed and registered appropriately, and then supply them is going to be a similar amount of work, and the purpose is far more cryptic and confusing. My proposal is basically that instead of asking developers to manage signing keys, we should instead ask them to manage an account on a validation server (or servers).

Building on top of service account management also means developers can potentially leverage existing *account* security tools, rather than needing to come up with custom key management solutions for PyPI publishing keys.

Technically such a validation server could be run on the PSF infrastructure, but that opens up lots of opportunities for common attacks that provide privileged access to both PyPI and the validation servers. By moving the operation of the latter out to trusted third party organisations, we'd prevent an indirect attack through the PSF infrastructure from compromising both systems at the same time.
 
I also think your two questions are not exactly right, because all that means is that it becomes harder to attack *everyone* via a PyPI compromise, however it’s still trivial to attack specific people if you’ve compromised PyPI or the CDN since you can selectively serve maliciously signed packages depending on who is requesting them. To this end I don’t think a solution that pip doesn’t implement is actually going to prevent anything but very dumb attacks by an attacker who has already compromised the PyPI machines.

Yes, while I was thinking we may be able to get away without pip providing enhanced validation support directly, I now agree we'll need to provide it. However, the UX of that shouldn't depend on the technical details of how enhanced validation mode is actually implemented - from an end user perspective, the key things they need to know are:

* for many cases, the default level of validation offered by pip (given PEP 458) is likely to be good enough, and will provide access to all the packages on PyPI
* for some cases, the default level of validation *isn't* good enough, and for those, pip offers an "enhanced validation" mode
* if you turn on enhanced validation, there will be a lot of software that *won't* install
* if you want to use enhanced validation, but some projects you'd like to use don't support it, then you'll either need to offer those projects assistance with supporting enhanced validation mode, consider using different dependencies, or else reconsider whether or not you really need enhanced validation for your current use case
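The check has to live in pip itself because a compromised PyPI or CDN can serve tampered files to specific requesters only, so only the requesting client can notice. A minimal sketch follows. It assumes the expected digest has already reached the client from an independent validation server over a channel pip trusts, for instance one whose trust root is registered with pip. `verify_download` and `DigestMismatch` are hypothetical names.

```python
import hashlib


class DigestMismatch(Exception):
    """Raised when downloaded bytes don't match the validator's record."""


def verify_download(artifact, expected_sha256):
    """Check downloaded bytes on the requesting client against the validator's digest."""
    if expected_sha256 is None:
        # Unknown to the validator: enhanced validation refuses to install it.
        raise DigestMismatch("release is not registered with the validation server")
    actual = hashlib.sha256(artifact).hexdigest()
    if actual != expected_sha256:
        # Detects tampering even if it was served only to this particular client.
        raise DigestMismatch(f"expected {expected_sha256}, got {actual}")
    return artifact
```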
 
I think another issue here is that we’re effectively doing something similar to TLS, except instead of domain names we have project names, and although there are *a lot* of people who really hate the CA system, nobody has yet come up with an effective means of actually replacing it without regressing into worse security. The saving grace here is that we operate at a much smaller scale (one “DNS” root, one trust root, ~53k unique names vs… more than I feel like counting), so it’s possible that solutions which don’t scale at TLS scale might scale at PyPI scale.

It's worth noting that my validation server idea is still very much a CA-style model - it's just that instead of registering developer keys with PyPI (as in PEP 480), we'd be registering the trust roots of metadata validation servers with pip.

All my idea really does is take the key management problem for enhanced validation away from developers, replacing it with an account management problem that they're likely to already be thoroughly familiar with, even if they don't have any experience in secure software distribution.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia