On Jan 2, 2015, at 3:21 AM, Nick Coghlan firstname.lastname@example.org wrote:
On 2 January 2015 at 16:38, Donald Stufft <email@example.com> wrote:
On Jan 2, 2015, at 1:33 AM, Nick Coghlan <email@example.com> wrote:
That's the part I meant - the signing of developer keys to delegate trust to them without needing to trust the integrity of the online PyPI service.
Hence the idea of instead keeping PyPI as an entirely online service (without any offline delegation of authority), and suggesting that developers keep their own separately signed metadata, which can then be compared against the PyPI published metadata (both by the developers themselves and by third parties). Discrepancies then become a trigger for further investigation, which may include suspending the PyPI service if the discrepancy is reported by an individual or organisation that the PyPI administrators trust.
I’m confused what you mean by “without needing to trust the integrity of the online PyPI service”.
Developer keys get signed by offline keys controlled by, I’m guessing, either myself or Richard or both. The only time we’re depending on the integrity of the machine that runs PyPI, and not on an offline key possessed by someone, is during the window between when a new project is created (the project itself, not a release of a project) and the next time the delegations get signed by the offline keys.
Yes, as I said, that's the part I mean. To avoid trusting the integrity of the online PyPI service, while still using PyPI as a trust root for the purpose of software installation, we would need to define a system whereby:
We already have a hard security problem to solve (running PyPI), so adding a second hard security problem (running what would in effect be a CA) doesn't seem like a good approach to risk mitigation to me.
My proposal is that we instead avoid the hard problem of running a CA entirely by advising developers to monitor PyPI's integrity by ensuring that what PyPI is publishing matches what they released. That is, we split the end-to-end data integrity validation problem in two and solve each part separately:
The metadata validation potentially wouldn't even need to use TUF - developers could simply upload the expected hash of the artifacts they published, and the metadata validation service would check that the signed artifacts from PyPI match those hashes. The core of the idea is simply that there be a separate service (or services) which PyPI can't update, but developers uploading packages can.
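As a rough illustration of the hash-comparison idea above, here is a minimal sketch of what such a check might look like. It is only an assumption about the shape of the data: the function names, the filename-to-digest mapping, and the choice of SHA-256 are all hypothetical and not taken from PEP 480, TUF, or any PyPI API.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_discrepancies(expected: Dict[str, str],
                       served: Dict[str, bytes]) -> Dict[str, str]:
    """Compare developer-published hashes against what the index serves.

    expected: filename -> hex digest uploaded by the developer
              (hypothetical layout for the validation service)
    served:   filename -> artifact bytes as downloaded from the index
    Returns a mapping of filename -> reason for every file that fails.
    """
    problems = {}
    for name, digest in expected.items():
        if name not in served:
            problems[name] = "missing from index"
        elif sha256_hex(served[name]) != digest:
            problems[name] = "hash mismatch"
    # Files the index serves that the developer never declared are
    # also suspicious, since they could be silently substituted artifacts.
    for name in served:
        if name not in expected:
            problems[name] = "not declared by developer"
    return problems
```

An empty result would mean the index matches what the developer released; any entry would be the kind of discrepancy that triggers further investigation.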
By focusing on detection & recovery, rather than prevention, we can drastically reduce the complexity of the problem to be solved, while still mitigating the major risks we care about. The potential attacks that worry me are the ones that result in silent substitution of artifacts - when it comes to denial of service attacks, there's little reason to mess about with inducing metadata validation failures when there are already far simpler options available.
Redistributors may decide to take advantage of the developer metadata validation support to do our own verification of source downloads, but I don't believe upstream needs to worry about that too much - if developers have the means to verify automatically that what PyPI is currently publishing matches what they released, then the redistributor side of things should take care of itself.
Another way of viewing the problem is that instead of thinking of the scope of PEP 480 as PyPI delegating trust to developers, we can instead think of it as developers delegating trust to PyPI. PyPI then becomes a choke point in a network graph, rather than the root of a tree. My core idea stays the same regardless of how you look at it though: we don't try to solve the problem of letting end users establish in a single step that what they downloaded matches what the developer published. Instead, we aim to provide answers to the questions:
There's no fundamental requirement that those two questions be answered by the same security system - we have the option of splitting them, and I'm starting to think that the overall UX will be better if we do.
-- Nick Coghlan | email@example.com | Brisbane, Australia
Oh I see. I was just misreading what you meant by “without trusting the integrity of the online PyPI service”. I thought you meant it in a post PEP 480 world, but you meant it in a pre (or without) PEP 480 world.
So onto the actual thing that you’ve proposed!
I have concerns about the actual feasibility of doing such a thing, some of which are similar to my concerns with doing non-mandatory PEP 480.
I’d note that PEP 480 and your proposal aren’t really mutually exclusive, so there’s not much harm in trying yours and, if it fails, falling back to something like PEP 480. The only costs are end user confusion if your solution gets shut down, and the effort of actually developing and setting it up.
Overall I’m +1 on things that enable better detection of a compromise, but I’m probably -0.5 or so on your specific proposal, as I think that expecting developers to upload verification data to “verification servers” is pushing work onto other people just so we don’t have to do it.
I also think your two questions are not exactly right, because all that means is that it becomes harder to attack everyone via a PyPI compromise, however it’s still trivial to attack specific people if you’ve compromised PyPI or the CDN since you can selectively serve maliciously signed packages depending on who is requesting them. To this end I don’t think a solution that pip doesn’t implement is actually going to prevent anything but very dumb attacks by an attacker who has already compromised the PyPI machines.
I think another issue here is that we’re effectively doing something similar to TLS, except that instead of domain names we have project names. Although a lot of people really hate the CA system, nobody has yet come up with an effective means of actually replacing it without regressing into worse security. The saving grace here is that we operate at a much smaller scale (one “DNS” root, one trust root, ~53k unique names vs… more than I feel like counting), so it’s possible that solutions which don’t scale at TLS scale might scale at PyPI scale.
Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA