[Distutils] Surviving a Compromise of PyPI - PEP 458 and 480

Fri Jan 2 16:51:21 CET 2015

On 2 January 2015 at 18:57, Donald Stufft <donald at stufft.io> wrote:

> I have concerns about the actual feasibility of doing such a thing, some
> of which are similar to my concerns with doing non-mandatory PEP 480.
>
> * If uploading to a verifier service is optional then a significant
> portion of authors simply won’t do it and if you are installing 100 things, and
> 99 of them are verified and 1 of them is not then there is an attack
> vector that I can use to compromise you undetected (since the author didn’t
> upload their verification somewhere else).
>

Independently of the technical details of the "enhanced validation"
support, I now agree pip would at least need to acquire a "fully validated
downloads only" mode, where it refused to install anything that wasn't
trusted at the higher integrity level.
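
As a rough illustration of what such a mode implies (a sketch only, not
anything pip implements: the `Requirement` type, the `enhanced_validated`
flag and the `strict` parameter are all hypothetical names), the "99
verified, 1 unverified" case becomes a hard failure rather than a silent
gap:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    # Hypothetical flag: True if the release metadata was confirmed at the
    # higher integrity level, False if only the default checks passed.
    enhanced_validated: bool


def select_installable(requirements, strict):
    """Return the names to install.

    In strict ("fully validated downloads only") mode, refuse the whole
    install if even one requirement lacks enhanced validation.
    """
    unvalidated = [r.name for r in requirements if not r.enhanced_validated]
    if strict and unvalidated:
        raise RuntimeError(
            "refusing to install unvalidated packages: " + ", ".join(unvalidated)
        )
    return [r.name for r in requirements]
```

With the default mode (`strict=False`) all 100 packages install; with
`strict=True` the single unvalidated package aborts the whole operation.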

However, it's worth keeping in mind that the truly security conscious
people aren't going to ever trust any package that hasn't been through at
least some level of preliminary security review (whether their own, or that
of an organisation they trust).

> * It’s not actually less work in general, it just pushes the work from the
> PyPI administrators to the community. This can work well if the community
> is willing to step up! However, PyPI’s availability/speed problems were
> originally attempted to be solved by pushing the work to the community via
> the original mirror system and (not to downplay the people who did step up)
> the response was not particularly great and the mirrors got a few at first
> and as the shiny factor wore off people’s mirrors shutdown or stopped
> working or what have you.
>

From a practical perspective, I think the work being pushed to the
developer community is roughly equivalent. The only question is whether the
secret developers are being asked to manage is a separate authentication
token for PyPI uploads (which is then used to mathematically secure uploads
in such a way that PyPI itself can't change the release metadata for a
project), or an account on a separate validation service, either run by the
developer themselves, or by a trusted third party like the OpenStack
Foundation, Fedora, Mozilla, etc.

My contention is that "to support enhanced validation of your project, you,
and anyone else authorised to make releases for your project, will need an
account on a metadata validation server, and whenever you make a new
release, you will need to register it with the metadata validation server
before publishing it on PyPI (otherwise folks using enhanced validation
won't be able to install it)" is a relatively simple and easy to understand
approach, while "to support enhanced validation of your project, you, and
anyone else authorised to make releases for your project, will need to get
a developer key signed by the PyPI administrators, and no, you can't just
upload the key through the PyPI web UI, because if we let you do that, it
wouldn't provide the desired security properties" is inherently confusing,
and there's no explanation we can provide that will make it make sense to
anyone that hasn't spent years thoroughly steeped in this stuff.

Yes, the mathematics technically lets us provide the desired security
guarantees within the scope of a single service, but doing it that way is
confusing to users in a way that I now believe can be avoided through clear
structural separation of the "content distribution" functionality of the
main PyPI service and the "content validation" functionality of separate
metadata validation services. Having the validation services run by someone
*other than the PSF* can also be explained in a way that makes intuitive
sense even when someone doesn't understand the technical details, since
it's clear that folks with privileged access to a distribution service run
by the PSF wouldn't necessarily have privileged access to a validation
service run by the OpenStack Foundation (etc) and vice-versa.

The "independent online services" approach also provides additional
flexibility that isn't offered by a purely mathematical solution - an
online metadata service potentially *can* regenerate its metadata in a new
format if its operators can be persuaded it's necessary. Having that capability
represents a potential denial of service vulnerability specifically against
enhanced validation mode (if a validation server is compromised), but there
are already much easier ways to execute those.

> * A number of the attacks that TUF protects against do not rely on the
> attacker creating malicious software packages, things like only showing known
> insecure versions of a project so that they can then attack people through
> a known exploit. It’s not *wrong* to not protect against these (most
> systems don’t) but we’d want to explicitly decide that we’re not going to.
>

External metadata validation servers can still protect against downgrade
attacks - "release is unexpectedly missing" is a metadata discrepancy, just
like "release contents are unexpectedly different".
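
To make that concrete, here is a minimal sketch of the comparison a client
could run. It assumes (purely for illustration) that both the index and the
validation server can be reduced to a mapping from version string to a
content hash; the function name and the three labels are hypothetical:

```python
def find_discrepancies(index_releases, validator_releases):
    """Compare what the index serves against what the validator recorded.

    Both arguments map a version string to a hex digest of the release.
    Returns a sorted list of (version, problem) tuples.
    """
    problems = []
    for version in sorted(set(index_releases) | set(validator_releases)):
        served = index_releases.get(version)
        expected = validator_releases.get(version)
        if served is None:
            # Validator knows the release but the index hides it:
            # the signature of a downgrade attack.
            problems.append((version, "missing"))
        elif expected is None:
            # Published on the index but never registered for validation.
            problems.append((version, "unregistered"))
        elif served != expected:
            # Same version, different contents.
            problems.append((version, "mismatch"))
    return problems
```

A withheld release and a tampered release both surface through the same
check, which is the point: neither requires the client to trust the index.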

Validation servers can also provide additional functionality, like mapping
from CVEs to affected versions, that can't be sensibly offered through the
same service that is responsible for publishing the software in the first
place.

> I’d note that PEP 480 and your proposal aren’t really mutually exclusive
> so there’s not really harm in *trying* yours and if it fails falling back
> to something like PEP 480 other than end user confusion if that gets shut
> down and the cost of actually developing/setting up that solution.
>
> Overall I’m +1 on things that enable better detection of a compromise but
> I’m probably -0.5 or so on your specific proposal as I think that expecting
> developers to upload verification data to “verification servers” is just
> pushing work onto other people just so we don’t have to do it.
>

Getting them to manage additional keys, get those keys signed and
registered appropriately, and then supply them is going to be a similar
amount of work, and the purpose is far more cryptic and confusing. My
proposal is basically that instead of asking developers to manage signing
keys, we should ask them to manage an account on a validation server (or
servers).

Building on top of service account management also means developers can
potentially leverage existing *account* security tools, rather than needing
to come up with custom key management solutions for PyPI publishing keys.

Technically such a validation server could be run on the PSF
infrastructure, but that opens up lots of opportunities for common attacks
that provide privileged access to both PyPI and the validation servers - by
moving the operation of the latter out to trusted third party
organisations, we'd be keeping indirect attacks through the PSF
infrastructure side from compromising both systems at the same time.

> I also think your two questions are not exactly right, because all that
> means is that it becomes harder to attack *everyone* via a PyPI compromise,
> however it’s still trivial to attack specific people if you’ve compromised
> PyPI or the CDN since you can selectively serve maliciously signed packages
> depending on who is requesting them. To this end I don’t think a solution
> that pip doesn’t implement is actually going to prevent anything but very
> dumb attacks by an attacker who has already compromised the PyPI machines.
>

Yes, while I had been thinking we might be able to get away without pip providing
enhanced validation support directly, I now agree we'll need to provide it.
However, the UX of that shouldn't depend on the technical details of how
enhanced validation mode is actually implemented - from an end user
perspective, the key things they need to know are:

* for many cases, the default level of validation offered by pip (given PEP
458) is likely to be good enough, and will provide access to all the
packages on PyPI
* for some cases, the default level of validation *isn't* good enough, and
for those, pip offers an "enhanced validation" mode
* if you turn on enhanced validation, there will be a lot of software that
*won't* install
* if you want to use enhanced validation, but some projects you'd like to
use don't support it, then you'll either need to offer those projects
assistance with supporting enhanced validation mode, consider using
different dependencies, or else reconsider whether or not you really need
enhanced validation for your current use case

> I think another issue here is that we’re effectively doing something
> similar to TLS except instead of domain names we have project names and
> that although there are *a lot* of people who really hate the CA system
> nobody has yet come up with an effective means of actually replacing it
> without regressing into worse security. The saving grace here is that we
> operate at a much smaller scale (one “DNS” root, one trust root, ~53k
> unique names vs… more than I feel like counting) so it’s possible that
> solutions which don’t scale at TLS scale might scale at PyPI scale.
>

It's worth noting that my validation server idea is still very much a
CA-style model - it's just that instead of registering developer keys with
PyPI (as in PEP 480), we'd be registering the trust roots of metadata
validation servers with pip.

All my idea really does is take the key management problem for enhanced
validation away from developers, replacing it with an account management
problem that they're likely to already be thoroughly familiar with, even if
they don't have any experience in secure software distribution.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia