[Distutils] Surviving a Compromise of PyPI - PEP 458 and 480

Wed Dec 31 20:33:23 CET 2014

> On Dec 31, 2014, at 2:05 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> 
> On 31 December 2014 at 18:42, Donald Stufft <donald at stufft.io> wrote:
>> Just to speak to these two points. The purpose behind having a developer
>> sign some files is that you can verify that those files were signed by
>> the person holding the private key belonging to that developer.
> [...]
> 
> Thanks for the explanation.
> 
>> Ideally you would not use the
>> same password as you use for logging into PyPI because you send that password to
>> PyPI anytime you login which would mean that PyPI would more or less know your
>> private key.
> 
> My problem with this logic is that there's another attack vector that
> this ignores - what if someone gets access to my PC, which has all of
> these passwords in a "saved password" store that I use because it's a
> pain to manage so many passwords (I don't, but you get the point ;-))?
> I work in a number of secure environments where multiple complex
> passwords are mandated - and typically password management becomes
> sufficiently hard that people start to use shortcuts, defeating the
> object of the whole exercise (heck, end users probably just use
> "Password01" everywhere, "because it's too hard to remember all those
> passwords"...)
> 
> That's not to say that bad security practices justify anything, but on
> the other hand human factors do imply that it's not automatically
> guaranteed that two passwords are more secure than one. Single sign-on
> is a goal for a lot of people for a reason...

Basically, if your private key (whether it’s a file or a password) gets
compromised, then whoever compromised it can masquerade as you without anyone
being able to detect the difference. At that point the private key will need
to be revoked (as in, programmatically announced as compromised and no longer
valid).

There’s basically no way to have a signing solution that doesn’t involve some
secret bit of data that is known only to you and is never sent to PyPI. As with
most things on PyPI we cannot actually mandate that a user practices good
security with their private key. They could, for instance, post it on Twitter
and there isn’t much that we can do about it. The only way to avoid requiring
such a secret is to more or less give up on the idea of surviving a compromise
of the PyPI infrastructure. That isn’t itself a *wrong* answer, since signing
does impose an additional cost on the author and it’s not entirely clear that
the cost is worth it.

One of the benefits of having the author sign is that you get a chain of what
are called offline keys. Essentially all trust comes from pip trusting a set
of “root” keys which are not stored on a server in the PyPI infrastructure and
are instead stored “offline” in some fashion, such as on a USB drive sitting in
a safe deposit box, or in an HSM. (For the uninitiated, an HSM is a hardware
device that stores keys and lets you ask it to perform operations with them,
but does not allow you to get the key itself back out. Some HSMs even include
things like an acid pack, where attempting to physically open the HSM will
break the pack and destroy the internal memory that holds the key.) Then from
the root key(s) you can go from key to key in a chain until you get to the
author key, and no key in that chain is stored “online”.

This chain of offline keys means that if someone takes over the machines
running PyPI they can’t trick people into installing something because the keys
that are required to do that don’t exist on PyPI.
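
To make the chain idea concrete, here is a toy sketch. It does no real
cryptography and is not how TUF or either PEP represents keys; the Key class,
the role names and the function name are all made up for illustration. It only
models the reasoning above: a takeover of the PyPI machines can forge a release
if any private key in the chain lives on those machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    name: str
    online: bool  # True if the private half is stored on PyPI's servers


def forgeable_after_server_takeover(chain):
    """Return True if someone controlling the PyPI machines could forge a
    release that validates through this chain (root first, signer last).

    Holding any one private key in the chain is enough to re-sign
    everything below it, so a single online key breaks the guarantee.
    """
    return any(key.online for key in chain)


# Hypothetical chains; the role names are illustrative assumptions.
offline_chain = [
    Key("root", online=False),
    Key("delegation", online=False),
    Key("author", online=False),
]
pypi_signed_chain = [
    Key("root", online=False),
    Key("pypi-signing", online=True),
]

print(forgeable_after_server_takeover(offline_chain))      # False
print(forgeable_after_server_takeover(pypi_signed_chain))  # True
```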

Like most things in security however, there is no singular right answer and
everything is a tradeoff. In this particular case the tradeoff we need to make
is between the ability to survive a compromise of the PyPI machines themselves
and the UX for end users. For example, if we make signing mandatory and the UX
is bad then we're going to hamper the ability of authors to publish new
releases. If that UX is bad enough we can get into a situation where authors
choose not to release things as often as they would otherwise prefer to because
of the pain associated with releasing. There are also concerns that making
signing mandatory will force people who have no interest in properly securing
a private key to create one anyway, and they might just use the same password
as they use for PyPI or something like "hunter1" or whatever. In that case,
for those people and the users of their projects, we've not added much
additional security but we've increased complexity and the chances of
something breaking. If we make signing optional, however, then the benefit
might not be worth exposing signing to end users. If, for example, an end user
downloads and installs 100 different files from PyPI, and the authors of 99 of
them opted in to signing while the author of one didn't, then a compromise of
PyPI can compromise that end user through just that one project.
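
The 99-out-of-100 case can be spelled out as a tiny sketch. The project names
and the exposed_projects helper are hypothetical; the point is only that under
optional signing the end user is exposed as soon as the set of unsigned
projects is non-empty.

```python
def exposed_projects(installed):
    """Return the projects a PyPI compromise could tamper with.

    `installed` maps a project name to True if its author signs releases.
    """
    return sorted(name for name, signed in installed.items() if not signed)


# 99 signed projects plus one whose author did not opt in (made-up names).
installed = {f"project-{i:03d}": True for i in range(99)}
installed["unsigned-project"] = False

exposed = exposed_projects(installed)
print(len(installed), exposed)  # 100 ['unsigned-project']
```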

I don't believe there is any downside to moving away from relying solely on TLS
and using a TUF-based scheme where PyPI itself holds the signing keys. That's
a net win since you're exchanging having the PyPI infrastructure manage TLS
keys for having the PyPI infrastructure manage TUF keys, with the added benefit
that TUF "transfers" in ways that TLS simply can't (such as through the CDN or
through mirrors or what have you).
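
As a rough sketch of what "transfers" means here: once the client holds
metadata it trusts, it checks the file against that metadata rather than
trusting the channel the file arrived over. The code below is an assumption
for illustration, not TUF's actual metadata format or API; the signature check
on the metadata itself is omitted, and verify_download and the filename are
made up.

```python
import hashlib


def verify_download(name, data, trusted_hashes):
    """Check downloaded bytes against a hash taken from trusted metadata.

    `trusted_hashes` stands in for metadata whose signature has already
    been verified (that step is not shown). Which mirror or CDN served
    `data` does not matter for this check.
    """
    expected = trusted_hashes.get(name)
    if expected is None:
        return False
    return hashlib.sha256(data).hexdigest() == expected


release = b"example release contents"
trusted_hashes = {"example-1.0.tar.gz": hashlib.sha256(release).hexdigest()}

# The same bytes pass regardless of where they came from; altered bytes fail.
from_mirror = release
tampered = release + b" plus something extra"
print(verify_download("example-1.0.tar.gz", from_mirror, trusted_hashes))  # True
print(verify_download("example-1.0.tar.gz", tampered, trusted_hashes))     # False
```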

There *may* be enough downside to having authors sign that it doesn't make
sense to expose that. Part of the discussion around PEP 480 should be hammering
out what the UX looks like for the authors, deciding if that UX is good enough
to make it mandatory or to make it strongly encouraged through pip warnings
and PyPI warnings, and ultimately deciding if the tradeoff of the additional
burden on authors is worth it. If it's not worth it to the community as a whole
then we shouldn't accept PEP 480 and should instead focus on ensuring that
we reduce the ability to compromise PyPI itself (which is something we should
do anyways of course). In this regard having the opinion of someone who isn't
an expert is *extremely* helpful, because someone like me already knows how to
manage their keys and already does it. For people like me, the question of
whether requiring authors to manage some secret bit of data is asking too much
has an easy answer: no, it's not too much to ask me to do that, because I'm
already doing it (and a lot of developers are too, via their SSH private keys
for instance).

(How's that for a tl;dr?)

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA