[Distutils] Surviving a Compromise of PyPI - PEP 458 and 480

Vladimir Diaz vladimir.v.diaz at gmail.com
Thu Jan 1 17:25:23 CET 2015


On Wed, Dec 31, 2014 at 2:51 PM, Donald Stufft <donald at stufft.io> wrote:

>
> On Dec 31, 2014, at 11:08 AM, Vladimir Diaz <vladimir.v.diaz at gmail.com>
> wrote:
>
>
>
>
>
> On Wed, Dec 31, 2014 at 2:26 AM, Donald Stufft <donald at stufft.io> wrote:
>
>>
>> On Dec 10, 2014, at 10:16 PM, Vladimir Diaz <vladimir.v.diaz at gmail.com>
>> wrote:
>>
>> Hello everyone,
>>
>> I am a research programmer at the NYU School of Engineering.  My
>> colleagues (Trishank Kuppusamy and Justin Cappos) and I are requesting
>> community feedback on our proposal, "Surviving a Compromise of PyPI."  The
>> two-stage proposal can be reviewed online at:
>>
>> PEP 458
>> http://legacy.python.org/dev/peps/pep-0458/
>>
>> PEP 480
>> http://legacy.python.org/dev/peps/pep-0480/
>>
>>
>> Summary of the Proposal:
>>
>> "Surviving a Compromise of PyPI" proposes how the Python Package Index
>> (PyPI) can be amended to better protect end users from altered or malicious
>> packages, and to minimize the extent of PyPI compromises against affected
>> users.  The proposed integration allows package managers such as pip to be
>> more secure against various types of security attacks on PyPI and defend
>> end users from attackers responding to package requests. Specifically,
>> these PEPs describe how PyPI processes should be adapted to generate and
>> incorporate repository metadata, which are signed text files that describe
>> the packages and metadata available on PyPI.  Package managers request
>> (along with the packages) the metadata on PyPI to verify the authenticity
>> of packages before they are installed.  The changes to PyPI and tools will
>> be minimal by leveraging a library, The Update Framework
>> <https://github.com/theupdateframework/tuf>, that generates and
>> transparently validates the relevant metadata.
>>
>> The first stage of the proposal (PEP 458
>> <http://legacy.python.org/dev/peps/pep-0458/>) uses a basic security
>> model that supports verification of PyPI packages signed with cryptographic
>> keys stored on PyPI, requires no action from developers and end users, and
>> protects against malicious CDNs and public mirrors. To support continuous
>> delivery of uploaded packages, PyPI administrators sign for uploaded
>> packages with an online key stored on PyPI infrastructure. This level of
>> security prevents packages from being accidentally or deliberately tampered
>> with by a mirror or a CDN because the mirror or CDN will not have any of
>> the keys required to sign for projects.
>>
>> The second stage of the proposal (PEP 480
>> <http://legacy.python.org/dev/peps/pep-0480/>) is an extension to the
>> basic security model (discussed in PEP 458) that supports end-to-end
>> verification of signed packages. End-to-end signing allows both PyPI and
>> developers to sign for the packages that are downloaded by end users.  If
>> the PyPI infrastructure were to be compromised, attackers would be unable
>> to serve malicious versions of these packages without access to the
>> project's developer key.  As in PEP 458, no additional action is required
>> by end users.  However, PyPI administrators will need to periodically
>> (perhaps every few months) sign metadata with an offline key.  PEP 480 also
>> proposes an easy-to-use key management solution for developers, how to
>> interface with a potential build farm on PyPI infrastructure, and discusses
>> the security benefits of end-to-end signing.  The second stage of the
>> proposal simultaneously supports real-time project registration and
>> developer signatures, and when configured to maximize security on PyPI,
>> less than 1% of end users will be at risk even if an attacker controls PyPI
>> and goes undetected for a month.
>>
>> We thank Nick Coghlan and Donald Stufft for their valuable contributions,
>> and Giovanni Bajo and Anatoly Techtonik for their feedback.
>>
>>
>> I’ve just finished (re)reading the white paper, PEP 458, PEP 480, and
>> some of the supporting documentation on the TUF website.
>>
>
> Thanks!
>
>
>>
>> I’m confused about what exactly is contained within the TUF metadata and
>> who signs what in a PEP 480 world.
>>
>
> The following illustration shows what is contained within TUF metadata
> (JSON files):
>
> https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/figure4.pdf
> Note: In this illustration, the "snapshot" and "targets" roles are renamed
> "release" and "projects", respectively.
>
> If you're interested in what exactly is contained in these JSON files,
> here is example metadata:
>
> https://github.com/theupdateframework/tuf/tree/develop/examples/repository/metadata
>
> In a PEP 480 world, project developers sign a single JSON file. For
> example, developer(s) for the "Requests" project sign their assigned JSON
> file named "/targets/claimed/Requests.json".  Specifically, a signature is
> generated of the "signed" entry
> <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L9-L49>
>  of the dictionary.  Once the signature is generated, it is added to the
> "signatures" entry
> <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L2-L7>
>  of the JSON file.
>
> In figure 1 of PEP 480, PyPI signs for all metadata except the roles listed
> under the "roles signed by developer keys" label:
> https://github.com/vladimir-v-diaz/pep-maximum-security-model/blob/master/pep-0480/figure1.png
>
>
> Ok, so authors never actually directly sign the package files themselves,
> they sign for a document that contains a hash of the files they upload?
>

Yes, the proposal does *not* generate a signature of just the package
file.  The hash and size of the package file are included in the document /
metadata that is signed.  The metadata also includes the relative file path
of the package, identifies the hashing algorithm used, when the metadata is
set to expire, the metadata's version number, etc.
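
As a minimal sketch of that idea (the field names below are only loosely
modelled on the TUF example metadata, and describe_target / verify_target
are hypothetical helpers, not part of TUF or pip), the signed document might
record a package like this, and a client might check a download against it.
Generating the signature over the document is omitted here:

```python
import hashlib


def describe_target(path, data, algorithm="sha256"):
    """Return a metadata entry recording the size and hash of a package file."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {path: {"length": len(data), "hashes": {algorithm: digest}}}


def verify_target(path, data, targets):
    """Check downloaded bytes against the size and hashes listed in metadata."""
    entry = targets.get(path)
    if entry is None:
        return False  # the metadata does not list this file at all
    if len(data) != entry["length"]:
        return False  # wrong size, no need to hash
    return all(
        hashlib.new(algorithm, data).hexdigest() == expected
        for algorithm, expected in entry["hashes"].items()
    )


# Stand-in bytes for a real distribution archive (illustrative only).
package = b"pretend these are the bytes of FooBar-1.0.tar.gz"

# The document the developer would sign; values here are made up.
signed = {
    "_type": "Targets",
    "version": 1,
    "expires": "2030-01-01 00:00:00 UTC",
    "targets": describe_target("packages/FooBar-1.0.tar.gz", package),
}
```

The point is that the signature covers the whole document, so the hash,
size, path, version, and expiration are all protected together.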


>
>
>
>> Currently when you do something like ``pip install FooBar``, pip fetches
>> /simple/FooBar/ to look for potential installation candidates, and when it
>> finds one it downloads it and installs it. This is all “signed” by online
>> keys via TLS.
>>
>> 1. In a TUF world, would pip still fetch /simple/FooBar/ to discover
>> things to install or would it fetch some TUF metadata to find things to
>> install?
>>
>
> In the integration/demo we did with pip, we treated each /simple/ html
> file as a target (listed the hash and file size of these html index pages
> in TUF metadata).  That is, pip still fetched /simple/FooBar/ to discover
> distributions to install, but we verified the html files *and*
> distributions against TUF metadata.  In PEP 458, we state that "/simple" is
> also listed in TUF metadata:
> http://legacy.python.org/dev/peps/pep-0458/#pypi-and-tuf-metadata (last
> paragraph just before the diagram).
>
> Another option is to avoid crawling/listing the simple index pages and
> just search TUF metadata for distributions, but this approach will require
> design changes to pip.  We went with the approach (treat the index pages as
> targets) that required minimal changes to pip.
>
>
> So I’m personally perfectly happy to make more than minimal changes to pip
> as I want to get this right rather than just bolt something onto the side.
>

I like this route very much, assuming we have the liberty to improve the
"API" and maintain a clean design / integration; flexibility is a good
thing in this regard.  At the time, we were also unsure what we could
change.  The integration demo effectively kept a list of available
distributions in two locations, the simple html pages (e.g.,
https://pypi.python.org/simple/requests/, which pip crawled & could be
modified by developers through the API) and the JSON metadata.  This
approach can certainly be simplified / improved IMO.
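
To illustrate the duplication, here is a rough sketch (standard library
only; LinkCollector and index_disagreements are hypothetical names, and the
demo did not necessarily work this way) that compares the filenames linked
from a simple index page with the target paths listed in metadata:

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the filename of every <a href="..."> on an index page."""

    def __init__(self):
        super().__init__()
        self.filenames = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Drop any "#md5=..." fragment and keep only the file name.
            self.filenames.add(posixpath.basename(urlparse(href).path))


def index_disagreements(index_html, target_paths):
    """Return filenames that appear in only one of the two listings."""
    collector = LinkCollector()
    collector.feed(index_html)
    listed_in_metadata = {posixpath.basename(path) for path in target_paths}
    return collector.filenames ^ listed_in_metadata
```

An empty result means the two listings agree. Keeping a single
authoritative list would make a check like this unnecessary.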


>
>
>
>> 2. If it’s fetching /simple/FooBar/ is that secured by TUF?
>>
>
> Yes, see my response to (1).
>
> 3. If it’s secured by TUF who signs the TUF metadata that talks about
>> /simple/FooBar/ in PEP 480 the author or PyPI?
>>
>
> PEP 480 authors sign for both their project's index page and
> distribution(s) (as indicated in the JSON file):
>
> "A claimed or recently-claimed project will need to upload in its
> transaction to PyPI not just targets (a simple index as well as
> distributions) but also TUF metadata. The project MAY do so by uploading a
> ZIP file containing two directories, /metadata/ (containing delegated
> targets metadata files) and /targets/ (containing targets such as the
> project simple index and distributions that are signed by the delegated
> targets metadata)."
>
> See the second paragraph of
> http://legacy.python.org/dev/peps/pep-0480/#snapshot-process.
>
>
> So here is my problem. I’m completely on board with the developer signing
> for the distribution files. I think that makes total sense. However I worry
> that requiring the developer to sign for what is essentially the
> “installer” API (aka how pip discovers things to install) is going to put
> us in a situation where we cannot evolve the API easily. If we modified
> this PEP so that an online key signed for /simple/ what security properties
> would we lose?
>

> It *appears* to me that the problem then would be that an attacker who
> compromises PyPI can present whatever information they want to pip as to
> what is available to download and install. This would mean freeze attacks
> and mix and match attacks. It would also mean that they could, in a future
> world where pip can use metadata on PyPI to do dependency resolution, tell
> pip that it needs to download a valid but malicious project as a dependency
> of a popular project like virtualenv.
>

I think this is a valid problem, as you have noted. It is probably better to
avoid a simple API (signed with an online key) that can be easily modified
in the event of a compromise.

By "evolve" the API, do you mean modifying it, for example to disallow
developers from re-uploading distributions (and thus changing the hashes /
file sizes listed in metadata), to hide available distributions, etc.? In
other words, to contradict what is stated in metadata in an "on the fly"
manner?


> However I don’t think they’d be able to actually cause pip to install a
> malicious copy of a good project and I believe that we can protect against
> an attacker who possesses that key from tricking pip into installing a
> malicious but valid project as a fake dependency by having pip only use the
> theoretical future PyPI metadata that lists dependencies as an optimization
> hint for what it should download and then once it’s actually downloaded a
> project like virtualenv (which has been validated to be from the real
> author) peek inside that file and ensure that the metadata inside that
> matches what PyPI told pip.
>

Assuming I have understood you correctly (I had to make certain
assumptions), I think this is a good observation and assessment.

My understanding:

1. Developers sign and upload PEP 480 metadata, which list the project's
distributions.
2. Developers do *not* sign the API part (/simple/ in this case).
3. A future / new dependency file format is used to handle dependency
resolution.  Developers also implicitly sign the new dependency file that
is included in distribution(s) uploaded to PyPI.
4. Client code (python or pip?) verifies dependencies, regardless of what
pip fetches from PyPI.

Now, the "optimization hint" part is to avoid having to open the
distribution archive to extract the dependency file, and then continue
downloading remaining dependencies?  In this manner, pip can just use the
project's "online" dependency information, that is also stored on PyPI, to
handle dependency resolution more quickly.  Once all needed distributions
have been downloaded, pip can consult the PEP 480 metadata *and* the
dependency file to ensure that no shenanigans have occurred.
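
A rough sketch of that final check, assuming both the online hint and the
dependency file inside the verified distribution can be reduced to plain
lists of project names (compare_dependency_hint and hint_matches are
hypothetical names, and no such dependency file format exists yet):

```python
def compare_dependency_hint(hinted, embedded):
    """Compare PyPI's dependency hint with the verified, embedded list.

    hinted:   names PyPI told pip to download (untrusted optimization hint)
    embedded: names read from the dependency file inside the distribution
              whose hash was already checked against developer-signed metadata
    """
    hinted_names, embedded_names = set(hinted), set(embedded)
    return {
        # Advertised by PyPI but never requested by the author.
        "unexpected": sorted(hinted_names - embedded_names),
        # Requested by the author but omitted from the hint.
        "missing": sorted(embedded_names - hinted_names),
    }


def hint_matches(hinted, embedded):
    """Return True only when the hint and the embedded list agree exactly."""
    diff = compare_dependency_hint(hinted, embedded)
    return not diff["unexpected"] and not diff["missing"]
```

Anything reported as "unexpected" would be a candidate fake dependency, so
the client would refuse to install it.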

Is my understanding correct?


> Is my assessment correct? Is keeping the “API” under control of PyPI a
> reasonable thing to do while keeping the actual distribution files
> themselves under control of the distribution authors? The reason this
> worries me is that, unlike a Linux distribution or an application like
> Firefox, we don’t have much of a relationship with the people who are
> uploading things to PyPI. So if we need to evolve the API we are not going
> to be able to compel our authors to go back and re-generate new signed
> metadata.
>
> An additional thing I see, it appears that all of the metadata in TUF has
> an expiration. While I think this makes complete sense for things signed by
> online keys and for things signed by keys held by the PyPI administrators
> and/or the PSF board, I don’t think this is something we can reasonably do
> for things signed by authors themselves. An author might publish something
> and then disappear and never come back, and forcing them to re-sign at some
> point in the future isn’t something we’re reasonably able to do. Is there a
> plan for how to handle that?
>

Yes, we can add support for this behavior (i.e., the signed JSON can state
{expiration: None} and pip / client can ignore expirations in this case).
Another technique we may consider is moving these projects to an
"abandoned" role (signed with an offline key).  Although the PEPs do not
currently mention it, we can discuss how to handle abandoned projects here
if you wish, before deciding how to proceed.  Justin and Trishank (CC'd)
can also give feedback on this "abandoned" role.
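
For the first option, a minimal sketch of how a client might treat a missing
expiration (the "expiration" key and the is_expired helper are illustrative
assumptions, not current TUF or pip behavior):

```python
from datetime import datetime, timezone


def is_expired(signed, now=None):
    """Return True if the signed metadata has passed its expiration.

    An explicit None means "never expires". A missing key raises KeyError
    on purpose, so that malformed metadata is not silently accepted.
    """
    expiration = signed["expiration"]
    if expiration is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    # Expects an ISO 8601 timestamp with a UTC offset, for example
    # "2015-06-01T00:00:00+00:00" (format chosen for this sketch only).
    return now >= datetime.fromisoformat(expiration)
```

Only metadata signed by developers would be allowed to use None here;
metadata signed by PyPI's own keys would keep a mandatory expiration.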



>
>
> Let me know exactly what needs to change in the PEPs to make everything
> explained above clearer.  For example, in PEP 458 we provide a
> link/reference
> <https://www.python.org/dev/peps/pep-0458/#what-additional-repository-files-are-required-on-pypi>
>  (last paragraph of this subsection) to the Metadata document
> <https://github.com/theupdateframework/tuf/blob/develop/METADATA.md> indicating
> the content of the JSON files, but should the illustration
> <https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/figure4.pdf> I've
> included in this reply also be added?
>
>
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>