On 2 January 2015 at 16:38, Donald Stufft <donald@stufft.io> wrote:

On Jan 2, 2015, at 1:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

That's the part I meant - the signing of developer keys to delegate trust to them without needing to trust the integrity of the online PyPI service.

Hence the idea of instead keeping PyPI as an entirely online service (without any offline delegation of authority), and suggesting that developers keep their *own* separately signed metadata, which can then be compared against the PyPI published metadata (both by the developers themselves and by third parties). Discrepancies would become a trigger for further investigation, which may include suspending the PyPI service if the discrepancy is reported by an individual or organisation that the PyPI administrators trust.

I’m confused about what you mean by “without needing to trust the integrity of the online PyPI service”.

Developer keys get signed by offline keys controlled by, I’m guessing, either myself or Richard or both. The only time we’re depending on the integrity of the machine that runs PyPI, and not on an offline key possessed by someone, is during the window of time between when a new project has been created (the project itself, not a release of a project) and the next time the delegations get signed by the offline keys.

Yes, as I said, that's the part I mean. To avoid trusting the integrity of the online PyPI service, while still using PyPI as a trust root for the purpose of software installation, we would need to define a system whereby:

1. The PyPI administrators have a set of offline keys
2. Developers are able to supply keys to the PyPI administrators for trust delegation
3. This system has sufficiently low barriers to entry that developers are actually willing to use it
4. This system is compatible with a PyPI run build service

We already have a hard security problem to solve (running PyPI), so adding a *second* hard security problem (running what would in effect be a CA) doesn't seem like a good approach to risk mitigation to me.

My proposal is that we instead avoid the hard problem of running a CA entirely by advising *developers* to monitor PyPI's integrity by ensuring that what PyPI is publishing matches what they released. That is, we split the end-to-end data integrity validation problem in two and solve each part separately:

* use PEP 458 to cover the PyPI -> end user link, with the end users treating PyPI as a trusted authority. End users will be able to detect tampering with the link between them and PyPI, but if the online PyPI service gets compromised, *end users won't detect it*.
* use a separate metadata validation process to check that PyPI is publishing the right thing, covering both the developer -> PyPI link *and* the integrity of the PyPI service itself.

The metadata validation potentially wouldn't even need to use TUF - developers could simply upload the expected hash of the artifacts they published, and the metadata validation service would check that the signed artifacts from PyPI match those hashes. The core of the idea is simply that there be a separate service (or services) which PyPI can't update, but developers uploading packages *can*.
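To give a rough sketch of what that hash-comparison check might look like (the function names and data layout here are purely illustrative, not part of PEP 458 or 480), the core of the validation service could be as simple as:

```python
import hashlib

def artifact_sha256(data: bytes) -> str:
    """Hash an artifact the same way on both sides of the comparison."""
    return hashlib.sha256(data).hexdigest()

def validate_release(developer_hashes: dict, pypi_hashes: dict) -> list:
    """Compare the developer's published hashes against what PyPI serves.

    Both mappings are {filename: sha256 hex digest}. Returns a list of
    discrepancies; an empty list means PyPI matches the developer's record.
    """
    problems = []
    for filename, expected in developer_hashes.items():
        published = pypi_hashes.get(filename)
        if published is None:
            problems.append(f"{filename}: missing from PyPI")
        elif published != expected:
            problems.append(f"{filename}: hash mismatch")
    return problems

# A matching pair of records validates cleanly; any divergence is the
# "trigger for further investigation" described above.
dev = {"example-1.0.tar.gz": artifact_sha256(b"release contents")}
assert validate_release(dev, dict(dev)) == []
assert validate_release(dev, {}) == ["example-1.0.tar.gz: missing from PyPI"]
```

The important property isn't the code, it's the deployment: the developer_hashes side lives on a service PyPI can't write to, so a compromise of the online PyPI machines can't silently rewrite both sides of the comparison.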

By focusing on detection & recovery, rather than prevention, we can drastically reduce the complexity of the problem to be solved, while still mitigating the major risks we care about. The potential attacks that worry me are the ones that result in silent substitution of artifacts - when it comes to denial of service attacks, there's little reason to mess about with inducing metadata validation failures when there are already far simpler options available.

Redistributors may decide to take advantage of the developer metadata validation support to do our own verification of source downloads, but I don't believe upstream needs to worry about that too much - if developers have the means to verify automatically that what PyPI is currently publishing matches what they released, then the redistributor side of things should take care of itself.

Another way of viewing the problem is that instead of thinking of the scope of PEP 480 as PyPI delegating trust to developers, we can instead think of it as developers delegating trust to PyPI. PyPI then becomes a choke point in a network graph, rather than the root of a tree. My core idea stays the same regardless of how you look at it though: we *don't* try to solve the problem of letting end users establish in a single step that what they downloaded matches what the developer published. Instead, we aim to provide answers to the questions:

* Did I just download what PyPI is currently publishing?
* Is PyPI currently publishing what the developer of <project> released?

There's no fundamental requirement that those two questions be answered by the *same* security system - we have the option of splitting them, and I'm starting to think that the overall UX will be better if we do.


Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia