[Distutils] Surviving a Compromise of PyPI - PEP 458 and 480

Fri Jan 2 13:45:42 CET 2015

On 2 January 2015 at 11:21, Donald Stufft <donald at stufft.io> wrote:
> To be clear, there is zero delay in being able to publish a new project, the
> delay is between moving from a new project being validated by an online key
> to an offline key.

OK, got it.

Although on the terminology front, I don't really understand what an
"online key" and an "offline key" are. My mind translates them as
"worse" and "better" from context, but that's about all. I'm not
asking for an explanation (I can look that up) but as terms being
encountered by the non-specialist, they contribute to the difficulty
in reading the proposals. If there's any better way of naming these
two types of keys, that would be accessible to the non-specialist,
that would be a great help.

(Actually, the other implication I read into "offline key" is
"something that I, as the owner, have to take care of and manage" -
and that scares me because I'm rubbish at managing keys or anything
like that, basically anything more complex than a password - at least
until we get to things like RSA keys and tokens that I use for "work"
and am paid to manage as opposed to "hobby" stuff).

> The only real difference between validation levels is that
> until it's been signed by an offline key then people installing that project
> are vulnerable to someone compromising PyPI. This is because until the
> delegation of project X to a specific developer has been signed the "chain of
> trust" contains a key that is sitting on the hard drive of PyPI.
>
> However, once a delegation of Project X _has_ been signed, changing that
> delegation would require waiting until the next time the delegations were signed
> by the offline keys. This is because once a project is signed by an offline
> key then all further changes to the delegation require offline signing.

"Delegation" is another term that doesn't make immediate sense. I read
it as "Confirm ownership", sort of. Again, it's not that I can't work
out what the term means, but I don't get an immediate sense of the
implications. Here, for example, it's not immediately clear whether
delegation changes would be common or rare, or whether having them
happen quickly would be important. (For example, if you're not
available for a pip release, and we never bothered sharing the keys
because it's easier just for you to have them, would we need a
delegation change to do an emergency release?)

Again, this isn't something that it's important to clarify for me here
and now, but I would like to see the PEP updated to clarify this sort
of issue in terms that are accessible to the layman.

> In addition, this does not mean (I believe! we should verify this) that the
> owner of a project cannot further delegate to other people without delay, since
> they'll be able to sign that delegation with their own keys and won't require
> intervention from PyPI.

See above - this implies to me that if the "owner" (in terms of keys rather
than project structure) is unavailable, other project members may not
be able to upload (rather than as now, when they can upload with the
project's PyPI account password and/or standard password recovery
processes to the project email address).

> So really it looks like this (abbreviated, not exactly, etc):
>
> root (offline)
> |- PyPI Admins (offline)
>    |- "Unclaimed" (online)
>       |- New Project that hasn't yet been signed for by PyPI Admins
>          (offline, owned by author)
>    |- Existing Project that has already been signed for by PyPI Admins
>       (offline, owned by author)
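
As a rough illustration of the tree quoted above, here is a minimal
Python sketch. It is not how TUF or PyPI implements anything; the
`Role` class, its `online` flag and the method names are all made up
for this example. It only shows the point made earlier in the thread:
a project is exposed to a PyPI compromise while any key in its chain
of trust is an online one, and the periodic signing step removes that
exposure by re-parenting the project under the offline PyPI Admins key.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Role:
    """Hypothetical stand-in for one node in the delegation tree."""
    name: str
    online: bool  # assumption: True means the key sits on the PyPI server
    parent: Optional["Role"] = None

    def chain(self) -> List["Role"]:
        """Return the roles from the root down to this one."""
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return list(reversed(path))

    def exposed_to_pypi_compromise(self) -> bool:
        """True if any key in the chain of trust is an online key."""
        return any(role.online for role in self.chain())


# The tree from the quoted message.
root = Role("root", online=False)
admins = Role("PyPI Admins", online=False, parent=root)
unclaimed = Role("Unclaimed", online=True, parent=admins)
new_project = Role("New Project", online=False, parent=unclaimed)
existing_project = Role("Existing Project", online=False, parent=admins)

exposed_before = new_project.exposed_to_pypi_compromise()
print(exposed_before)                                  # True
print(existing_project.exposed_to_pypi_compromise())   # False

# Periodic offline signing: the admins sign for the project directly,
# so the online "Unclaimed" key drops out of its chain of trust.
new_project.parent = admins
exposed_after = new_project.exposed_to_pypi_compromise()
print(exposed_after)                                   # False
```

Note that the project's own key is offline in both cases; what changes
is only which key vouches for it.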

I'm not keen on the term "unclaimed". All projects on PyPI right now
are "unclaimed" by that usage, which is emphatically not what
"unclaimed" means intuitively to me. Surely "pip" is "claimed" by the
PyPA? Maybe "unverified" works as a term (as in, verifying your
account when signing up to a forum). I get the idea that unclaimed
implies there's a risk, and sure there is, but this smacks of using a
loaded term to rub people's noses in the fact that what they've been
happily using for years "isn't safe". This happens a lot with security
debates, and IMO actively discourages people from buying into the
changes.

It would be useful to have *some* document (or part thereof - maybe an
overview section in the PEP) structured as an objective cost/benefit
proposal:

1. The current PyPI infrastructure has the following risks. We assess
the chance that they might occur as X.
2. The impact on the reader, as an individual, of a compromise, would
be the following.
3. The cost to the reader, under the proposed solution, of avoiding
the risks, is the following.

There are probably at least two classes of reader involved - project
authors and project users. If either class has to bear some cost on
behalf of the other, then that should be called out.

I believe that I (and any other readers of the proposals) should be
able to sensibly assess the cost/benefit tradeoffs on my own, given
the above information. My experience and judgement may not be typical,
so my opinion should be taken in the context of others, but that
doesn't mean I'm wrong, or that my views are inappropriate for me. For
example, in 20 years of extensively using software off the internet, I
have never once downloaded a piece of software that wasn't what it
claimed to be, and what I expected it to be. So the risk of compromised
PyPI packages seems amazingly low to me[1].

> The periodic signing process by the PyPI admins just moves a new project from
> being signed for by the "Unclaimed" online key to being signed for by our
> offline keys. This process is basically invisible to everyone involved.

It's as visible to end users as the significance of describing
something as "unclaimed". If nobody cared a project was "unclaimed"
then it would be invisible. Otherwise, less so. Hence my preference
for a less emotive term.

> Does that make sense?

Yes it does - many thanks for the explanation.

Paul

[1] That's just an example, and it would be off-topic to debate the
various other things that overall contribute to why and to what level
I'm currently comfortable using PyPI. And I'm not running a business
that would fail if PyPI were compromised. So please just take this as
a small data point, nothing more.