Hello everyone,

I am a research programmer at the NYU School of Engineering. My colleagues (Trishank Kuppusamy and Justin Cappos) and I are requesting community feedback on our proposal, "Surviving a Compromise of PyPI." The two-stage proposal can be reviewed online at:

PEP 458: http://legacy.python.org/dev/peps/pep-0458/
PEP 480: http://legacy.python.org/dev/peps/pep-0480/

Summary of the Proposal:

"Surviving a Compromise of PyPI" proposes how the Python Package Index (PyPI) can be amended to better protect end users from altered or malicious packages, and to minimize the extent to which a compromise of PyPI affects users. The proposed integration makes package managers such as pip more secure against various types of attacks on PyPI and defends end users against attackers who respond to package requests. Specifically, these PEPs describe how PyPI processes should be adapted to generate and incorporate repository metadata: signed text files that describe the packages and metadata available on PyPI. Package managers request the metadata from PyPI (along with the packages) to verify the authenticity of packages before they are installed. The changes to PyPI and tools will be minimal, because they leverage a library, The Update Framework <https://github.com/theupdateframework/tuf>, that generates and transparently validates the relevant metadata.

The first stage of the proposal (PEP 458 <http://legacy.python.org/dev/peps/pep-0458/>) uses a basic security model that supports verification of PyPI packages signed with cryptographic keys stored on PyPI, requires no action from developers or end users, and protects against malicious CDNs and public mirrors. To support continuous delivery of uploaded packages, PyPI administrators sign for uploaded packages with an online key stored on PyPI infrastructure. This level of security prevents packages from being accidentally or deliberately tampered with by a mirror or a CDN, because neither will have any of the keys required to sign for projects.

The second stage of the proposal (PEP 480 <http://legacy.python.org/dev/peps/pep-0480/>) is an extension to the basic security model (discussed in PEP 458) that supports end-to-end verification of signed packages. End-to-end signing allows both PyPI and developers to sign for the packages that are downloaded by end users. If the PyPI infrastructure were to be compromised, attackers would be unable to serve malicious versions of these packages without access to the project's developer key. As in PEP 458, no additional action is required by end users. However, PyPI administrators will need to periodically (perhaps every few months) sign metadata with an offline key. PEP 480 also proposes an easy-to-use key management solution for developers and a way to interface with a potential build farm on PyPI infrastructure, and discusses the security benefits of end-to-end signing. The second stage of the proposal simultaneously supports real-time project registration and developer signatures, and when configured to maximize security on PyPI, less than 1% of end users would be at risk even if an attacker controlled PyPI and went undetected for a month.

We thank Nick Coghlan and Donald Stufft for their valuable contributions, and Giovanni Bajo and Anatoly Techtonik for their feedback.

Thanks,
PEP 458 & 480 authors
::ping::

Has anyone in the community gotten a chance to review PEP 458 and/or PEP 480? Community feedback would be appreciated.

Thanks,
Vlad
On 23 December 2014 at 01:46, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
::ping::
Has anyone in the community gotten a chance to review PEP 458 and/or PEP 480? Community feedback would be appreciated.
Sorry about that Vladimir - I think the main PyPA devs have been focused on getting PEP 440 implemented, and the associated setuptools, pip and virtualenv updates out the door, and now we're into the end of year holiday period for many of us.

From my perspective, the split into two PEPs meant most of the areas I have doubts about have been moved to the end-to-end security model in PEP 480, leaving PEP 458 to cover the simpler task of securing the link from PyPI to the end user in such a way that public mirrors of packages can be trusted to accurately reflect the content published by PyPI.

The new appendix C raises some good questions regarding how we would like to deal with externally hosted packages. I personally agree with the PEP's recommendation to require TUF metadata generated with a developer key in that case, with a slight preference for publishing that metadata (including the corresponding security delegation) through PyPI. However, I'm open to practicality arguments suggesting that one of the other options may be more feasible for maintainers of externally hosted repositories.

The root key management question is the other one that will be interesting, given the distributed nature of both PSF Infrastructure maintenance and pip/PyPI development. A partial root key compromise would effectively become a CVE against pip and CPython (and hence flow on to Linux distributions and potentially other redistributors), with the severity depending on whether or not the signing threshold has been reached.

I suspect before we sign off on that last part, we're going to need to get quite specific about exactly *who* will hold signing authority over those root keys (just as the CPython release PEPs specifically name the release managers for the source tarballs and the platform-specific binaries, who then have signing authority for their respective artifacts).

Regards,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Dec 22, 2014 at 11:30 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 23 December 2014 at 01:46, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
::ping::
Has anyone in the community gotten a chance to review PEP 458 and/or PEP 480? Community feedback would be appreciated.
Sorry about that Vladimir - I think the main PyPA devs have been focused on getting PEP 440 implemented, and the associated setuptools, pip and virtualenv updates out the door, and now we're into the end of year holiday period for many of us.
Considering that the previous drafts of PEP 458 generated quite a bit of feedback and questions, I was surprised by the lack of responses this time around (almost relieved :). Receiving feedback from the main PyPA developers after the holiday period is definitely not a problem.
From my perspective, the split into two PEPs meant most of the areas I have doubts about have been moved to the end-to-end security model in PEP 480, leaving PEP 458 to cover the simpler task of securing the link from PyPI to the end user in such a way that public mirrors of packages can be trusted to accurately reflect the content published by PyPI.
I think splitting the proposal into two PEPs was the right decision. We hope that working with Donald on the end-to-end security model (PEP 480), along with feedback from the community, will help to address any remaining questions. Excluding the end-to-end option from the revised version of PEP 458 also made room for an overview of the metadata and framework, which was requested by multiple members of the community.
The new appendix C raises some good questions regarding how we would like to deal with externally hosted packages. I personally agree with the PEP's recommendation to require TUF metadata generated with a developer key in that case, with a slight preference for publishing that metadata (including the corresponding security delegation) through PyPI. However, I'm open to practicality arguments suggesting that one of the other options may be more feasible for maintainers of externally hosted repositories.
The root key management question is the other one that will be interesting, given the distributed nature of both PSF Infrastructure maintenance and pip/PyPI development. A partial root key compromise would effectively become a CVE against pip and CPython (and hence flow on to Linux distributions and potentially other redistributors), with the severity depending on whether or not the signing threshold has been reached.
I suspect before we sign off on that last part, we're going to need to get quite specific about exactly *who* will hold signing authority over those root keys (just as the CPython release PEPs specifically name the release managers for the source tarballs and the platform-specific binaries, who then have signing authority for their respective artifacts).
We can decide in the coming weeks who should be explicitly named, and the number of individuals that should hold signing authority over the root keys. If reaching a reasonable threshold of root keys remains a problem, we can also assist with meeting this threshold. I will discuss this matter with the rest of the authors to make sure we are all on the same page wrt root key management.
Thanks, Vlad
On Dec 22, 2014, at 1:15 PM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
On Mon, Dec 22, 2014 at 11:30 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Sorry about that Vladimir - I think the main PyPA devs have been focused on getting PEP 440 implemented, and the associated setuptools, pip and virtualenv updates out the door, and now we're into the end of year holiday period for many of us.
Considering that the previous drafts of PEP 458 generated quite a bit of feedback and questions, I was surprised by the lack of responses this time around (almost relieved :). Receiving feedback from the main PyPA developers after the holiday period is definitely not a problem.
This is why I haven't looked at it yet.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
An off-list question from Richard made me realise we should likely retitle the two PEPs slightly. I'd suggest the following names:

PEP 458: Surviving a compromise of the PyPI CDN
PEP 480: Surviving a compromise of PyPI

That encapsulates the difference between the threat model of the two PEPs in a way that the current titles don't quite convey (the reduced scope of PEP 458 in particular means that the current title is actually outright wrong - protecting against a compromise of PyPI itself is the scope that was moved to PEP 480).

The reduced scope of PEP 458 also still protects against the compromise of read-only mirrors, but I don't think we need to try to capture that directly in the title.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Dec 30, 2014, at 8:24 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
An off-list question from Richard made me realise we should likely retitle the two PEPs slightly. I'd suggest the following names:
PEP 458: Surviving a compromise of the PyPI CDN
This isn’t exactly right either, because it won’t survive a compromise of the CDN for *uploading*, but it might be close enough not to matter. Perhaps better would be something about not relying on TLS or something.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Thanks for the clarification, guys.

Donald, I'm not sure what you mean by "a compromise of the CDN for *uploading*".
On Dec 30, 2014, at 9:29 PM, Richard Jones <richard@python.org> wrote:
Thanks for the clarification, guys.
Donald, I'm not sure what you mean by "a compromise of the CDN for *uploading*".
PyPI trusts the CDN to give it the correct bits; without a signature from the author being verified, uploading just relies on TLS again. The other PEP should close that gap though, I believe.

Note: I have yet to read these PEPs so I'm just going by a casual glance of them.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 31 December 2014 at 12:32, Donald Stufft <donald@stufft.io> wrote:
PyPI trusts the CDN to give it the correct bits, without a signature from the author that is being verified uploading just relies on TLS again. The other PEP should close that gap though I believe.
I'm actually not sure what going through the CDN is buying us on the upload side of things in the first place, given the main pay-off provided by a CDN is geographically distributed caching of unchanging data for faster downloads. So it seems to me that that particular vulnerability could potentially be fixed more simply by bypassing the CDN entirely for the upload case. That's simplicity in the *conceptual* sense, though - there may be architectural issues in the current implementation of PyPI and the related tools that make it harder in practice than it would be in theory.

Either way, I agree that any kind of upload compromise based attack is also out of scope for PEP 458 - that's now entirely about ensuring that the bits delivered to end users are the same bits PyPI published. Making sure that the bits *PyPI* publishes are the same ones that the *developer* published is the domain of PEP 480.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Now that I think about it, I'm almost certain that Donald and I have had the "hey, what about an upload.pypi.python.org" conversation in the past, as a way around issues involving the CDN :) Still a good idea, in my opinion.

Richard
Vladimir Diaz wrote:
Considering that the previous drafts of PEP 458 generated quite a bit of feedback and questions, I was surprised by the lack of responses this time around (almost relieved :).
That's due to a difference in audiences. This mailing list is very much skewed towards packaging folks, while the Packaging category on discuss.python.org gets more people tangentially interested in packaging versus heavily focused on it, so the type of people you will engage with will vary in the two areas.
On Dec 10, 2014, at 10:16 PM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
[...]
I've just finished (re)reading the white paper, PEP 458, PEP 480, and some of the supporting documentation on the TUF website. I'm confused about what exactly is contained within the TUF metadata and who signs what in a PEP 480 world.

Currently when you do something like ``pip install FooBar``, pip fetches /simple/FooBar/ to look for potential installation candidates, and when it finds one it downloads it and installs it. This is all "signed" by online keys via TLS.

1. In a TUF world, would pip still fetch /simple/FooBar/ to discover things to install, or would it fetch some TUF metadata to find things to install?
2. If it's fetching /simple/FooBar/, is that secured by TUF?
3. If it's secured by TUF, who signs the TUF metadata that talks about /simple/FooBar/ in PEP 480 - the author or PyPI?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Wed, Dec 31, 2014 at 2:26 AM, Donald Stufft <donald@stufft.io> wrote:
On Dec 10, 2014, at 10:16 PM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
Hello everyone,
I am a research programmer at the NYU School of Engineering. My colleagues (Trishank Kuppusamy and Justin Cappos) and I are requesting community feedback on our proposal, "Surviving a Compromise of PyPI." The two-stage proposal can be reviewed online at:
PEP 458 http://legacy.python.org/dev/peps/pep-0458/
PEP 480 http://legacy.python.org/dev/peps/pep-0480/
Summary of the Proposal:
"Surviving a Compromise of PyPI" proposes how the Python Package Index (PyPI) can be amended to better protect end users from altered or malicious packages, and to minimize the extent of PyPI compromises against affected users. The proposed integration allows package managers such as pip to be more secure against various types of security attacks on PyPI and defend end users from attackers responding to package requests. Specifically, these PEPs describe how PyPI processes should be adapted to generate and incorporate repository metadata, which are signed text files that describe the packages and metadata available on PyPI. Package managers request (along with the packages) the metadata on PyPI to verify the authenticity of packages before they are installed. The changes to PyPI and tools will be minimal by leveraging a library, The Update Framework <https://github.com/theupdateframework/tuf>, that generates and transparently validates the relevant metadata.
The first stage of the proposal (PEP 458 <http://legacy.python.org/dev/peps/pep-0458/>) uses a basic security model that supports verification of PyPI packages signed with cryptographic keys stored on PyPI, requires no action from developers and end users, and protects against malicious CDNs and public mirrors. To support continuous delivery of uploaded packages, PyPI administrators sign for uploaded packages with an online key stored on PyPI infrastructure. This level of security prevents packages from being accidentally or deliberately tampered with by a mirror or a CDN because the mirror or CDN will not have any of the keys required to sign for projects.
The second stage of the proposal (PEP 480 <http://legacy.python.org/dev/peps/pep-0480/>) is an extension to the basic security model (discussed in PEP 458) that supports end-to-end verification of signed packages. End-to-end signing allows both PyPI and developers to sign for the packages that are downloaded by end users. If the PyPI infrastructure were to be compromised, attackers would be unable to serve malicious versions of these packages without access to the project's developer key. As in PEP 458, no additional action is required by end users. However, PyPI administrators will need to periodically (perhaps every few months) sign metadata with an offline key. PEP 480 also proposes an easy-to-use key management solution for developers, how to interface with a potential build farm on PyPI infrastructure, and discusses the security benefits of end-to-end signing. The second stage of the proposal simultaneously supports real-time project registration and developer signatures, and when configured to maximize security on PyPI, less than 1% of end users will be at risk even if an attacker controls PyPI and goes undetected for a month.
We thank Nick Coghlan and Donald Stufft for their valuable contributions, and Giovanni Bajo and Anatoly Techtonik for their feedback.
I've just finished (re)reading the white paper, PEP 458, PEP 480, and some of the supporting documentation on the TUF website.
Thanks!
I’m confused about what exactly is contained within the TUF metadata and who signs what in a PEP 480 world.
The following illustration shows what is contained within TUF metadata (JSON files): https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/...

Note: In this illustration, the "snapshot" and "targets" roles are renamed "release" and "projects", respectively.

If you're interested in what exactly is contained in these JSON files, here is example metadata: https://github.com/theupdateframework/tuf/tree/develop/examples/repository/m...

In a PEP 480 world, project developers sign a single JSON file. For example, the developer(s) of the "Requests" project sign their assigned JSON file, named "/targets/claimed/Requests.json". Specifically, a signature is generated over the "signed" entry <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L9-L49> of the dictionary. Once the signature is generated, it is added to the "signatures" entry <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L2-L7> of the JSON file.

In figure 1 of PEP 480, PyPI signs all metadata except those listed under the "roles signed by developer keys" label: https://github.com/vladimir-v-diaz/pep-maximum-security-model/blob/master/pe...
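To make the shape of such a file concrete, here is a minimal sketch of a delegated targets file in Python-dict form (the keyids, signature, hashes, date, and target path below are abbreviated placeholders of mine, not values taken from PyPI or the TUF examples):

    # Sketch of a delegated targets file such as /targets/claimed/Requests.json.
    requests_metadata = {
        "signatures": [   # appended after signing the "signed" entry below
            {"keyid": "1a2b...", "method": "ed25519", "sig": "9f0c..."}
        ],
        "signed": {       # the portion covered by the developer's signature
            "_type": "Targets",
            "version": 1,
            "expires": "2015-06-30T00:00:00Z",
            "targets": {
                "packages/source/R/Requests/Requests-2.5.1.tar.gz": {
                    "length": 441595,
                    "hashes": {"sha256": "7d5b..."}
                }
            }
        }
    }

A client only trusts the entries under "targets" after checking the signature(s) over the "signed" dictionary against the keys delegated to the project.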
Currently when you do something like ``pip install FooBar``, pip fetches /simple/FooBar/ to look for potential installation candidates, and when it finds one it downloads it and installs it. This is all "signed" by online keys via TLS.
1. In a TUF world, would pip still fetch /simple/FooBar/ to discover things to install or would it fetch some TUF metadata to find things to install?
In the integration/demo we did with pip, we treated each /simple/ html file as a target (listed the hash and file size of these html index pages in TUF metadata). That is, pip still fetched /simple/FooBar/ to discover distributions to install, but we verified the html files *and* distributions against TUF metadata. In PEP 458, we state that "/simple" is also listed in TUF metadata: http://legacy.python.org/dev/peps/pep-0458/#pypi-and-tuf-metadata (last paragraph just before the diagram). Another option is to avoid crawling/listing the simple index pages and just search TUF metadata for distributions, but this approach will require design changes to pip. We went with the approach (treat the index pages as targets) that required minimal changes to pip.
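For intuition, whether the target is a /simple/ index page or a distribution, the client-side check reduces to comparing the fetched bytes against the length and hash recorded in metadata whose signatures have already been verified. A minimal sketch (the function and names are mine, not pip's or TUF's API):

    import hashlib

    def verify_target(fetched_bytes, trusted_length, trusted_sha256):
        # trusted_length and trusted_sha256 come from already-verified TUF
        # metadata; a mismatch means a mirror, CDN, or attacker served
        # something PyPI never signed for.
        if len(fetched_bytes) != trusted_length:
            raise ValueError("length mismatch - possible tampering")
        if hashlib.sha256(fetched_bytes).hexdigest() != trusted_sha256:
            raise ValueError("hash mismatch - possible tampering")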
2. If it’s fetching /simple/FooBar/ is that secured by TUF?
Yes, see my response to (1).

3. If it's secured by TUF, who signs the TUF metadata that talks about /simple/FooBar/ in PEP 480 - the author or PyPI?
In PEP 480, project authors sign for both their project's index page and distribution(s) (as indicated in the JSON file):

"A claimed or recently-claimed project will need to upload in its transaction to PyPI not just targets (a simple index as well as distributions) but also TUF metadata. The project MAY do so by uploading a ZIP file containing two directories, /metadata/ (containing delegated targets metadata files) and /targets/ (containing targets such as the project simple index and distributions that are signed by the delegated targets metadata)."

See the second paragraph of http://legacy.python.org/dev/peps/pep-0480/#snapshot-process (a sketch of such a ZIP layout follows at the end of this reply).

Let me know exactly what needs to change in the PEPs to make everything explained above clearer. For example, in PEP 458 we provide a link/reference <https://www.python.org/dev/peps/pep-0458/#what-additional-repository-files-are-required-on-pypi> (last paragraph of this subsection) to the Metadata document <https://github.com/theupdateframework/tuf/blob/develop/METADATA.md> indicating the content of the JSON files, but should the illustration <https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/figure4.pdf> I've included in this reply also be added?
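To make the upload layout concrete, here is one way such a ZIP could be assembled with the standard library (the project, version, and file names are illustrative; the PEP does not prescribe a particular tool):

    import zipfile

    # metadata/ holds the delegated targets metadata; targets/ holds the
    # simple index and distribution(s) that the metadata signs for.
    with zipfile.ZipFile("Requests-upload.zip", "w") as zf:
        zf.write("Requests.json", arcname="metadata/Requests.json")
        zf.write("index.html", arcname="targets/simple/Requests/index.html")
        zf.write("Requests-2.5.1.tar.gz",
                 arcname="targets/packages/source/R/Requests/Requests-2.5.1.tar.gz")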
On 31 December 2014 at 16:08, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
Let me know exactly what needs to change in the PEPs to make everything explained above clearer. For example, in PEP 458 we provide a link/reference (last paragraph of this subsection) to the Metadata document indicating the content of the JSON files, but should the illustration I've included in this reply also be added?
I don't know how generally useful this would be, and I can't even promise I've got any useful comments to make, but I find the proposals too full of concepts I don't really follow (as someone who isn't a PyPI admin or a security specialist) to be able to get much from them.

Is there anywhere a document that simply explains, from the point of view of a package author, what I would need to do that is different from right now, in order to benefit from the proposal? (I assume the benefits are "your users can be sure that they get the files you uploaded, without tampering", and that's sufficient explanation of the benefit side from my perspective.)

For example, you say "in PEP 480, project authors sign for both their project's index page and distribution(s)". Does that mean I need to add something to the command line when I do "setup.py upload"? Can I still set up an automated build process or will it now need manual entry of some sort of passphrase in order to work?

I appreciate that the target audience for these PEPs is really PyPI admins, at the moment, so maybe it's not the right time to look at them from a project author perspective - if so, then feel free to ignore these points for now :-)

Thanks,
Paul
On 1 January 2015 at 02:54, Paul Moore <p.f.moore@gmail.com> wrote:
I appreciate that the target audience for these PEPs is really PyPI admins, at the moment, so maybe it's not the right time to look at them from a project author perspective - if so, then feel free to ignore these points for now :-)
I think capturing the UX consequences is important, especially as that's actually the key driver behind the split into two PEPs.

In theory, the UX consequences for PyPI hosted projects under PEP 458 should be close to nil - pip should just silently update to using the additional content validation mechanism behind the scenes, and thus become capable of detecting additional forms of illicit tampering that can't be detected through the existing use of transport layer security. Projects using external hosting *will* see a UX change under PEP 458, and that's something that will need to be figured out before the PEP can be accepted.

PEP 480 is far more challenging from a UX perspective, as it brings in the notion of developer managed keys and the associated signed uploads. Key management *is* a pain, especially in distributed developer communities. Trust management in particular is *not* an easy problem to solve - hence why we still rely on the CA system for TLS (despite its many known attack vectors), and why we see the complicated situation around SecureBoot, where community Linux distributions are generally bootstrapped through a shim signed through Microsoft's developer program, since that allows them to "just work" on hardware that only has Microsoft's chain of trust checking keys pre-installed.

PEP 458 is almost certainly a solid enhancement to PyPI's overall security (assuming we can come up with an acceptable answer for external hosting). It's significantly less clear that PEP 480 is the right answer for delegating more signing authority to developer groups - for example, it may be better to come up with a federated *hosting* model, where external hosting is the answer if developers choose to use their own signing keys rather than PyPI's automated online keys. Things like the Rackspace developer program, or the emerging next generation of Docker-based Platform-as-a-Service offerings, make it easier to recommend such federated hosting models.

As a result, my perspective is that it's the UX design concept that will make or break PEP 480 - the security model of TUF looks great to me, what gives me pause is concern over the usability and maintainability of signed uploads for "developers in a hurry".

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
In the PEP 458 world, package authors are not required to do anything.

On Wed, Dec 31, 2014 at 11:54 AM, Paul Moore <p.f.moore@gmail.com> wrote:
I don't know how generally useful this would be, and I can't even promise I've got any useful comments to make, but I find the proposals too full of concepts I don't really follow (as someone who isn't a PyPI admin or a security specialist) to be able to get much from them. Is there anywhere a document that simply explains, from the point of view of a package author, what I would need to do that is different from right now, in order to benefit from the proposal? (I assume the benefits are "your users can be sure that they get the files you uploaded, without tampering", and that's sufficient explanation of the benefit side from my perspective).
For example, you say "in PEP 480, project authors sign for both their project's index page and distribution(s)". Does that mean I need to add something to the command line when I do "setup.py upload"? Can I still set up an automated build process or will it now need manual entry of some sort of passphrase in order to work?
I appreciate that the target audience for these PEPs is really PyPI admins, at the moment, so maybe it's not the right time to look at them from a project author perspective - if so, then feel free to ignore these points for now :-)
Nick has stated that a PEP specifically intended for the project author perspective will be drafted after the "PyPI admin" PEPs. For now, I think Nick wants to focus on the bits (metadata) signed and published by PyPI (PEP 458). In the PEP 458 model, package authors are not required to sign and upload metadata; they will upload distributions as is currently done.

However, community feedback on PEP 480 (which supports signatures provided by project authors) is still appreciated. Although we haven't finalized all of the details on how the third-party signing tools will change, we can still discuss the overall approach. For example, should signing a distribution require only a secondary passphrase, akin to miniLock <https://minilock.io/>? Should signing a distribution be optional? A manual process? We hope to reach a design that is supported by the majority of the Python community.

PEP 480 includes a section that discusses a potential approach to packages signed by package authors: https://www.python.org/dev/peps/pep-0480/#automated-signing-solution

Let us know what you think.
On 31 December 2014 at 17:43, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
PEP 480 includes a section that discusses a potential approach to packages signed by package authors: https://www.python.org/dev/peps/pep-0480/#automated-signing-solution
Let us know what you think.
Thanks for the pointer. I read the section you referred to (admittedly in isolation). The language is unfamiliar to me, so I'm afraid I didn't get much from it. For example, I don't know what miniLock is, so that analogy was no help. Also, the phrase "the sharing of private keys across multiple machines of each developer" didn't mean much other than to raise alarms for me that I might not be able to simply log onto a new machine (a VM, for example, or a work machine) and do a quick "git clone; hack; python setup.py upload" to release an emergency fix, as I'd need a private key with me (as opposed to a password I can remember), and I'd need to do something to "allow key sharing". That would be annoying.

The "Enter a secondary password" note struck me as odd. Why would I need a *second* password? And why wouldn't I just reuse the same password as I use for PyPI? After all, I'm trusting that password hasn't been compromised; why make it harder on myself by needing to remember two passwords?

Terminology-wise, I don't know what "adding a new identity" means. Is that authorising a second developer? Or could I need to have multiple "identities" myself? The first is fine, the second isn't (I'm me, why do I need to have 2 identities just to upload a distribution?).

I'm aware of (and sorry about) the fact that this is very much a "drive by" scan of one section of the proposal in isolation. I *hope* it's still useful feedback, even if it's neither thorough nor particularly thoughtful - I was sort of aiming for "something is better than nothing", and that's all :-)

Anyway, I'll leave further comment to people with a better understanding of the issue, although I'm happy to clarify if any of the above isn't clear.

Paul.
On 1 January 2015 at 04:04, Paul Moore <p.f.moore@gmail.com> wrote:
Anyway, I'll leave further comment to people with a better understanding of the issue, although I'm happy to clarify if any of the above isn't clear.
Expert blindness can be a serious problem when it comes to security design, so please keep the questions coming.

I've come to the realisation that having done things like blending aspects of the 802.11i WPA2 spec with HF automatic link establishment to come up with a custom authentication protocol means I'm no longer qualified to judge what counts as "common knowledge" in these areas, and the TUF folks leave me in the dust :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Dec 31, 2014, at 1:04 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Thanks for the pointer. I read the section you referred to (admittedly in isolation). The language is unfamiliar to me, so I'm afraid I didn't get much from it. For example, I don't know what miniLock is, so that analogy was no help. Also, the phrase "the sharing of private keys across multiple machines of each developer" didn't mean much other than to raise alarms for me that I might not be able to simply log onto a new machine (a VM, for example, or a work machine) and do a quick "git clone; hack; python setup.py upload" to release an emergency fix, as I'd need a private key with me (as opposed to a password I can remember), and I'd need to do something to "allow key sharing". That would be annoying.
The "Enter a secondary password" note struck me as odd. Why would I need a *second* password? And why wouldn't I just reuse the same password as I use for PyPI? After all, I'm trusting that password hasn't been compromised, why make it harder on myself by needing to remember two passwords?
Just to speak to these two points.

The purpose behind having a developer sign some files is that you can verify that those files were signed by the person holding the private key belonging to that developer. In more traditional systems like RSA that key is pretty long. For example, here's an RSA private key for TLS: https://github.com/python/psf-salt/blob/master/pillar/dev/secrets/tls/certs/.... Obviously managing a file like that is kind of a pain, and it prevents things like being able to spin up a new VM quickly to make a release.

Traditionally only the authors themselves have access to the private key, so there's no way to get a copy from PyPI or something. The reason for this is that possession of the private key is how you prove that you're really the author. One option is something called key escrow, which essentially means that you take your private key, encrypt it with a password, and upload the encrypted blob to PyPI. Then as long as your password is good and the encryption algorithm is good, it doesn't matter if PyPI gets compromised, because the encrypted data can't be decrypted without the password known only to the author. This particular solution makes a lot of security folks nervous, because if the password *is* bad or something does happen to break the encryption algorithm, it means that PyPI now has access to keys that it would be preferential that it didn't.

There are other signing algorithms, such as Ed25519, where the private keys can be much shorter than in RSA. Ed25519 in particular requires only 32 bytes of data for a private key. In addition to that, there are particular algorithms that can take an arbitrary number of characters (aka a user picked password) and derive from that 32 bytes that can be used as the Ed25519 private key. This reduces key management down to something as simple as "remember a second password" without needing to resort to something like key escrow.

Ideally you would not use the same password as you use for logging into PyPI, because you send that password to PyPI anytime you log in, which would mean that PyPI would more or less know your private key.
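To make that concrete, here is a minimal sketch of the password-to-key idea, using scrypt for key stretching (hashlib.scrypt is available in Python 3.6+) and PyNaCl for Ed25519; the parameters and salt scheme are illustrative, not a vetted recommendation:

    import hashlib
    from nacl.signing import SigningKey  # assumes PyNaCl is installed

    def signing_key_from_password(password, salt):
        # Stretch a user-chosen password into the 32-byte seed an Ed25519
        # key requires; the same password and salt always reproduce the
        # same key, so nothing has to be stored or escrowed.
        seed = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                              dklen=32)
        return SigningKey(seed)

    key = signing_key_from_password(b"correct horse battery staple",
                                    b"pypi:example-project")
    signed = key.sign(b"canonical form of the 'signed' metadata entry")

Deriving the key per project (via the salt) would also mean that a leaked key for one project does not expose the developer's other projects.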
Terminology-wise, I don't know what "adding a new identity" means. Is that authorising a second developer? Or could I need to have multiple "identities" myself? The first is fine; the second isn't (I'm me - why would I need two identities just to upload a distribution?).
I'm aware of (and sorry about) the fact that this is very much a "drive by" scan of one section of the proposal in isolation. I *hope* it's still useful feedback, even if it's neither thorough nor particularly thoughtful - I was sort of aiming for "something is better than nothing", and that's all :-)
Anyway, I'll leave further comment to people with a better understanding of the issue, although I'm happy to clarify if any of the above isn't clear.
Paul.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 31 December 2014 at 18:42, Donald Stufft <donald@stufft.io> wrote:
Just to speak to these two points. The purpose behind having a developer sign some files is that you can verify that those files were signed by the person holding the private key belonging to that developer. [...]
Thanks for the explanation.
Ideally you would not use the same password as you use for logging into PyPI because you send that password to PyPI anytime you login which would mean that PyPI would more or less know your private key.
My problem with this logic is that there's another attack vector that this ignores - what if someone gets access to my PC, which has all of these passwords in a "saved password" store that I use because it's a pain to manage so many passwords (I don't, but you get the point ;-))? I work in a number of secure environments where multiple complex passwords are mandated - and typically password management becomes sufficiently hard that people start to use shortcuts, defeating the object of the whole exercise (heck, end users probably just use "Password01" everywhere, "because it's too hard to remember all those passwords"...)

That's not to say that bad security practices justify anything, but on the other hand human factors do imply that it's not automatically guaranteed that two passwords are more secure than one. Single sign-on is a goal for a lot of people for a reason...

Paul
On Dec 31, 2014, at 2:05 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 31 December 2014 at 18:42, Donald Stufft <donald@stufft.io> wrote:
Just to speak to these two points. The purpose behind having a developer sign some files is that you can verify that those files were signed by the person holding the private key belonging to that developer. [...]
Thanks for the explanation.
Ideally you would not use the same password as you use for logging into PyPI because you send that password to PyPI anytime you login which would mean that PyPI would more or less know your private key.
My problem with this logic is that there's another attack vector that this ignores - what if someone gets access to my PC, which has all of these passwords in a "saved password" store that I use because it's a pain to manage so many passwords (I don't, but you get the point ;-))? I work in a number of secure environments where multiple complex passwords are mandated - and typically password management becomes sufficiently hard that people start to use shortcuts, defeating the object of the whole exercise (heck, end users probably just use "Password01" everywhere, "because it's too hard to remember all those passwords"...)
That's not to say that bad security practices justify anything, but on the other hand human factors do imply that it's not automatically guaranteed that two passwords are more secure than one. Single sign-on is a goal for a lot of people for a reason...
Basically, if your private key, whether it's a file or a password, gets compromised, then the person who compromises it can masquerade as you without anyone being able to detect the difference. At that point the private key will need to be revoked (as in, programmatically announced as compromised and no longer valid). There's basically no way to have a signing solution that doesn't involve some secret bit of data that is known only to you and is never sent to PyPI. As with most things on PyPI, we cannot actually mandate that a user practices good security with their private key. They could, for instance, post it on Twitter and there isn't much we can do about it. The only way to avoid that is to more or less give up on the idea of surviving a compromise of the PyPI infrastructure. That isn't itself a *wrong* answer, since signing does impose additional cost on the author and it's not entirely clear that the cost is worth it.

One of the benefits of having the author sign is that you get a chain of trust rooted in what is called an offline key. Essentially, all trust comes from pip trusting a set of "root" keys which are not stored on a server in the PyPI infrastructure and are instead stored "offline" in some fashion, such as on a USB drive sitting in a safe deposit box, or in an HSM. (For the uninitiated, an HSM is a hardware device that stores keys but does not allow you to get the key itself back out. You can use the key by asking the HSM to do operations with it, but you can't extract it. Some HSMs even include an acid pack inside, so that attempting to physically open the HSM breaks the pack and destroys the internal memory that holds the key.) Then from that root key (or keys) you can go from key to key in a chain until you get to the author key, and no key in that chain is stored "online". This chain of offline keys means that if someone takes over the machines running PyPI, they can't trick people into installing something, because the keys required to do that don't exist on PyPI.

Like most things in security, however, there is no singular right answer and everything is a tradeoff. In this particular case the tradeoff is between the ability to survive a compromise of the PyPI machines themselves and the UX for authors and end users. For example, if we make signing mandatory and the UX is bad, then we're going to hamper authors' ability to publish new downloads. If that UX is bad enough, we can get into a situation where authors choose not to release things as often as they would otherwise prefer to because of the pain associated with releasing. There is also the concern that making signing mandatory forces people who have no interest or desire in properly securing a private key to make one anyway, and they might just use the same password as they use for PyPI, or something like "hunter1", or whatever. In that case, for those people and the users of their projects, we've not added much additional security but we have increased complexity and the chances of something breaking. If we make signing optional, however, then the benefit might not be worth exposing signing to end users. If, for example, an end user downloads and installs 100 different files from PyPI, and the authors of 99 of them opted in to signing but one did not, then a compromise of PyPI can still compromise that end user through that one unsigned project.
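The chain walk pip would perform looks conceptually like this sketch (this is only the shape of the check, not TUF's actual API; keys are Ed25519 via PyNaCl, and the delegation encoding is hypothetical):

    import nacl.signing

    def walk_chain(root_key: nacl.signing.VerifyKey, links):
        # links: list of (payload, signature, next_key_bytes), where each
        # payload names next_key_bytes and is signed by the previous key.
        # Trust starts at an offline root key and never touches PyPI.
        current = root_key
        for payload, signature, next_key_bytes in links:
            current.verify(payload, signature)   # raises BadSignatureError if tampered
            if next_key_bytes not in payload:    # payload must really delegate to this key
                raise ValueError("payload does not delegate to the claimed key")
            current = nacl.signing.VerifyKey(next_key_bytes)
        return current  # the author key, now trusted via offline keys only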
I don't believe there is any downside to moving away from relying solely on TLS and using a TUF-based scheme where PyPI itself holds the signing keys. That's a net win, since you're trading having the PyPI infrastructure manage TLS keys for having it manage TUF keys, with the added benefit that TUF "transfers" in ways that TLS simply can't (such as through the CDN or through mirrors or what have you).

There *may* be enough downside to having authors sign that it doesn't make sense to expose that. Part of the discussion around PEP 480 should be hammering out what the UX looks like for authors, deciding if that UX is good enough to make signing mandatory or to make it strongly encouraged through pip warnings and PyPI warnings, and ultimately deciding if the tradeoff of the additional burden on authors is worth it. If it's not worth it to the community as a whole, then we shouldn't accept PEP 480 and should instead focus on ensuring that we reduce the ability to compromise PyPI itself (which is something we should do anyway, of course).

In this regard, having the opinion of someone who isn't an expert is *extremely* helpful, because someone like me already knows how to manage their keys and already does it. So for people like me, the question of whether asking authors to manage some secret bit of data is asking too much has an easy answer: no, it's not too much to ask me to do that, because I'm already doing it (and a lot of developers are too, via their SSH private keys for instance).

(How's that for a tl;dr?)

--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Dec 31, 2014, at 11:08 AM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
On Wed, Dec 31, 2014 at 2:26 AM, Donald Stufft <donald@stufft.io <mailto:donald@stufft.io>> wrote:
On Dec 10, 2014, at 10:16 PM, Vladimir Diaz <vladimir.v.diaz@gmail.com <mailto:vladimir.v.diaz@gmail.com>> wrote:
[...]
I’ve just finished (re)reading the white paper, PEP 458, PEP 480, and some of the supporting documentation on the TUF website.
Thanks!
I’m confused about what exactly is contained within the TUF metadata and who signs what in a PEP 480 world.
The following illustration shows what is contained within TUF metadata (JSON files): https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/figure4.pdf
Note: In this illustration, the "snapshot" and "targets" roles are renamed "release" and "projects", respectively.
If you're interested in what exactly is contained in these JSON files, here is example metadata: https://github.com/theupdateframework/tuf/tree/develop/examples/repository/metadata
In a PEP 480 world, project developers sign a single JSON file. For example, the developer(s) of the "Requests" project sign their assigned JSON file, named "/targets/claimed/Requests.json". Specifically, a signature is generated over the "signed" entry <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L9-L49> of the dictionary. Once the signature is generated, it is added to the "signatures" entry <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L2-L7> of the JSON file.
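For the curious, the mechanics look roughly like the following sketch (assuming PyNaCl for Ed25519; TUF proper signs a canonical JSON encoding, which sorted keys only approximate here):

    import json
    import nacl.signing  # PyNaCl, assumed available

    def sign_project_metadata(metadata, key, keyid):
        # Serialize only the "signed" entry deterministically, sign those
        # bytes, and append the result to the "signatures" list.
        payload = json.dumps(metadata["signed"], sort_keys=True,
                             separators=(",", ":")).encode("utf-8")
        metadata["signatures"].append({
            "keyid": keyid,
            "sig": key.sign(payload).signature.hex(),
        })
        return metadata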
In figure 1 of PEP 480, PyPI signs for all metadata except those listed under the "roles signed by developer keys" label: https://github.com/vladimir-v-diaz/pep-maximum-security-model/blob/master/pep-0480/figure1.png
Ok, so authors never actually directly sign the package files themselves; they sign for a document that contains a hash of the files they upload?
Currently when you do something like ``pip install FooBar``, pip fetches /simple/FooBar/ to look for potential installation candidates, and when it finds one it downloads it and installs it. This is all “signed” by online keys via TLS.
1. In a TUF world, would pip still fetch /simple/FooBar/ to discover things to install or would it fetch some TUF metadata to find things to install?
In the integration/demo we did with pip, we treated each /simple/ html file as a target (listed the hash and file size of these html index pages in TUF metadata). That is, pip still fetched /simple/FooBar/ to discover distributions to install, but we verified the html files *and* distributions against TUF metadata. In PEP 458, we state that "/simple" is also listed in TUF metadata: http://legacy.python.org/dev/peps/pep-0458/#pypi-and-tuf-metadata (last paragraph just before the diagram).
Another option is to avoid crawling/listing the simple index pages and just search TUF metadata for distributions, but this approach will require design changes to pip. We went with the approach (treat the index pages as targets) that required minimal changes to pip.
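As a concrete illustration of what treating the index page as a target means for the client, the check amounts to something like this (the path, digest placeholder, and length below are made up for the example):

    import hashlib
    import urllib.request

    # Hypothetical signed-metadata entry for a project's index page.
    target = {
        "path": "simple/foobar/index.html",
        "hashes": {"sha256": "..."},  # hex digest taken from the signed metadata
        "length": 1234,
    }

    data = urllib.request.urlopen("https://pypi.python.org/simple/foobar/").read()
    if len(data) != target["length"]:
        raise RuntimeError("index page length mismatch")
    if hashlib.sha256(data).hexdigest() != target["hashes"]["sha256"]:
        raise RuntimeError("index page hash mismatch")
    # Only after both checks pass would pip parse the page for candidate links.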
So I’m personally perfectly happy to make more than minimal changes to pip as I want to get this right rather than just bolt something onto the side.
2. If it’s fetching /simple/FooBar/, is that secured by TUF?
Yes, see my response to (1).
3. If it’s secured by TUF, who signs the TUF metadata that talks about /simple/FooBar/ in PEP 480: the author or PyPI?
In PEP 480, project authors sign for both their project's index page and distribution(s) (as indicated in the JSON file):
"A claimed or recently-claimed project will need to upload in its transaction to PyPI not just targets (a simple index as well as distributions) but also TUF metadata. The project MAY do so by uploading a ZIP file containing two directories, /metadata/ (containing delegated targets metadata files) and /targets/ (containing targets such as the project simple index and distributions that are signed by the delegated targets metadata)."
See the second paragraph of http://legacy.python.org/dev/peps/pep-0480/#snapshot-process.
So here is my problem. I’m completely on board with the developer signing for the distribution files. I think that makes total sense. However, I worry that requiring the developer to sign for what is essentially the “installer” API (aka how pip discovers things to install) is going to put us in a situation where we cannot evolve the API easily. If we modified this PEP so that an online key signed for /simple/, what security properties would we lose?

It *appears* to me that the problem then would be that a compromise of PyPI could present whatever information it wants to pip as to what is available for pip to download and install. This would mean freeze attacks and mix-and-match attacks. It would also mean that, in a future world where pip can use metadata on PyPI to do dependency resolution, the attacker could tell pip that it needs to download a valid but malicious project as a dependency of a popular project like virtualenv.

However, I don’t think they’d be able to actually cause pip to install a malicious copy of a good project, and I believe that we can prevent an attacker who possesses that key from tricking pip into installing a malicious but valid project as a fake dependency: have pip use the theoretical future PyPI metadata that lists dependencies only as an optimization hint for what it should download, and then, once it has actually downloaded a project like virtualenv (which has been validated to be from the real author), peek inside that file and ensure that the metadata inside it matches what PyPI told pip.

Is my assessment correct? Is keeping the “API” under the control of PyPI a reasonable thing to do, while keeping the actual distribution files themselves under the control of the distribution authors? The reason this worries me is that, unlike a Linux distribution or an application like Firefox, we don’t have much of a relationship with the people who are uploading things to PyPI. So if we need to evolve the API, we are not going to be able to compel our authors to go back and re-generate new signed metadata.

An additional thing I see: it appears that all of the metadata in TUF has an expiration. While I think this makes complete sense for things signed by online keys, and for things signed by keys held by the PyPI administrators and/or the PSF board, I don’t think it is something we can reasonably do for things signed by authors themselves. An author might publish something and then disappear and never come back, and forcing them to re-sign at some point in the future isn’t something we’re reasonably able to do. Is there a plan for how to handle that?
Let me know exactly what needs to change in the PEPs to make everything explained above clearer. For example, in PEP 458 we provide a link/reference <https://www.python.org/dev/peps/pep-0458/#what-additional-repository-files-are-required-on-pypi> (last paragraph of this subsection) to the Metadata document <https://github.com/theupdateframework/tuf/blob/develop/METADATA.md> indicating the content of the JSON files, but should the illustration <https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/figure4.pdf> I've included in this reply also be added?
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Wed, Dec 31, 2014 at 2:51 PM, Donald Stufft <donald@stufft.io> wrote:
On Dec 31, 2014, at 11:08 AM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
On Wed, Dec 31, 2014 at 2:26 AM, Donald Stufft <donald@stufft.io> wrote:
On Dec 10, 2014, at 10:16 PM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
[...]
I’ve just finished (re)reading the white paper, PEP 458, PEP 480, and some of the supporting documentation on the TUF website.
Thanks!
I’m confused about what exactly is contained within the TUF metadata and who signs what in a PEP 480 world.
The following illustration shows what is contained within TUF metadata (JSON files):
https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/figure4.pdf
Note: In this illustration, the "snapshot" and "targets" roles are renamed "release" and "projects", respectively.
If you're interested in what exactly is contained in these JSON files, here is example metadata:
https://github.com/theupdateframework/tuf/tree/develop/examples/repository/metadata
In a PEP 480 world, project developers sign a single JSON file. For example, the developer(s) of the "Requests" project sign their assigned JSON file, named "/targets/claimed/Requests.json". Specifically, a signature is generated over the "signed" entry <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L9-L49> of the dictionary. Once the signature is generated, it is added to the "signatures" entry <https://github.com/theupdateframework/tuf/blob/develop/examples/repository/metadata/targets.json#L2-L7> of the JSON file.
In figure 1 of PEP 480, PyPI signs for all metadata except those listed under the "roles signed by developer keys" label: https://github.com/vladimir-v-diaz/pep-maximum-security-model/blob/master/pep-0480/figure1.png
Ok, so authors never actually directly sign the package files themselves; they sign for a document that contains a hash of the files they upload?
Yes, the proposal does *not* generate a signature over just the package file. The hash and size of the package file are included in the document / metadata that is signed. The metadata also includes the relative file path of the package, the hashing algorithm used, when the metadata is set to expire, the metadata's version number, etc.
Currently when you do something like ``pip install FooBar``, pip fetches /simple/FooBar/ to look for potential installation candidates, and when it finds one it downloads it and installs it. This is all “signed” by online keys via TLS.
1. In a TUF world, would pip still fetch /simple/FooBar/ to discover things to install or would it fetch some TUF metadata to find things to install?
In the integration/demo we did with pip, we treated each /simple/ html file as a target (listed the hash and file size of these html index pages in TUF metadata). That is, pip still fetched /simple/FooBar/ to discover distributions to install, but we verified the html files *and* distributions against TUF metadata. In PEP 458, we state that "/simple" is also listed in TUF metadata: http://legacy.python.org/dev/peps/pep-0458/#pypi-and-tuf-metadata (last paragraph just before the diagram).
Another option is to avoid crawling/listing the simple index pages and just search TUF metadata for distributions, but this approach will require design changes to pip. We went with the approach (treat the index pages as targets) that required minimal changes to pip.
So I’m personally perfectly happy to make more than minimal changes to pip as I want to get this right rather than just bolt something onto the side.
I like this route very much, assuming we have the liberty to improve the "API" and maintain a clean design / integration; flexibility is a good thing in this regard. At the time, we were also unsure what we could change. The integration demo effectively kept a list of available distributions in two locations, the simple html pages (e.g., https://pypi.python.org/simple/requests/, which pip crawled & could be modified by developers through the API) and the JSON metadata. This approach can certainly be simplified / improved IMO.
2. If it’s fetching /simple/FooBar/, is that secured by TUF?
Yes, see my response to (1).
3. If it’s secured by TUF, who signs the TUF metadata that talks about /simple/FooBar/ in PEP 480: the author or PyPI?
In PEP 480, project authors sign for both their project's index page and distribution(s) (as indicated in the JSON file):
"A claimed or recently-claimed project will need to upload in its transaction to PyPI not just targets (a simple index as well as distributions) but also TUF metadata. The project MAY do so by uploading a ZIP file containing two directories, /metadata/ (containing delegated targets metadata files) and /targets/ (containing targets such as the project simple index and distributions that are signed by the delegated targets metadata)."
See the second paragraph of http://legacy.python.org/dev/peps/pep-0480/#snapshot-process.
So here is my problem. I’m completely on board with the developer signing for the distribution files. I think that makes total sense. However, I worry that requiring the developer to sign for what is essentially the “installer” API (aka how pip discovers things to install) is going to put us in a situation where we cannot evolve the API easily. If we modified this PEP so that an online key signed for /simple/, what security properties would we lose?
It *appears* to me that the problem then would be that a compromise of PyPI could present whatever information it wants to pip as to what is available for pip to download and install. This would mean freeze attacks and mix-and-match attacks. It would also mean that, in a future world where pip can use metadata on PyPI to do dependency resolution, the attacker could tell pip that it needs to download a valid but malicious project as a dependency of a popular project like virtualenv.
I think this is a valid problem, as you have noted. It is probably better to avoid a simple API (signed with an online key) that can be easily modified in the event of a compromise. By "evolve" the API, do you mean modifying the API, such as disallowing developers from re-uploading distributions (and thus changing the hashes / file sizes listed in metadata), hiding available distributions, etc.? In other words, contradicting what is stated in metadata in an "on the fly" manner?
However, I don’t think they’d be able to actually cause pip to install a malicious copy of a good project, and I believe that we can prevent an attacker who possesses that key from tricking pip into installing a malicious but valid project as a fake dependency: have pip use the theoretical future PyPI metadata that lists dependencies only as an optimization hint for what it should download, and then, once it has actually downloaded a project like virtualenv (which has been validated to be from the real author), peek inside that file and ensure that the metadata inside it matches what PyPI told pip.
Assuming I have understood you correctly (I had to make certain assumptions), I think this is a good observation and assessment. My understanding:

1. Developers sign and upload PEP 480 metadata, which lists the project's distributions.
2. Developers do *not* sign the API part (/simple/ in this case).
3. A future / new dependency file format is used to handle dependency resolution. Developers also implicitly sign the new dependency file that is included in distribution(s) uploaded to PyPI.
4. Client code (python or pip?) verifies dependencies, regardless of what pip fetches from PyPI.

Now, the "optimization hint" part is to avoid having to open the distribution archive to extract the dependency file before continuing to download the remaining dependencies? In this manner, pip can just use the project's "online" dependency information, which is also stored on PyPI, to handle dependency resolution more quickly. Once all needed distributions have been downloaded, pip can consult the PEP 480 metadata *and* the dependency file to ensure that no shenanigans have occurred. Is my understanding correct?
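If that understanding is right, the final consistency check could be as small as this sketch (the function and the dependency lists are hypothetical, purely to show the shape of the check):

    def check_dependency_hint(hinted_deps, signed_deps):
        # hinted_deps: the online, unsigned dependency list PyPI served as a hint.
        # signed_deps: the dependency file read out of the author-verified archive.
        if set(hinted_deps) != set(signed_deps):
            raise RuntimeError("PyPI's dependency hint contradicts the signed archive")

    # The hint sped up resolution, but only the signed copy is authoritative.
    check_dependency_hint(["requests>=2.0", "six"], ["requests>=2.0", "six"])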
Is my assessment correct? Is keeping the “API” under the control of PyPI a reasonable thing to do, while keeping the actual distribution files themselves under the control of the distribution authors? The reason this worries me is that, unlike a Linux distribution or an application like Firefox, we don’t have much of a relationship with the people who are uploading things to PyPI. So if we need to evolve the API, we are not going to be able to compel our authors to go back and re-generate new signed metadata.
An additional thing I see: it appears that all of the metadata in TUF has an expiration. While I think this makes complete sense for things signed by online keys, and for things signed by keys held by the PyPI administrators and/or the PSF board, I don’t think it is something we can reasonably do for things signed by authors themselves. An author might publish something and then disappear and never come back, and forcing them to re-sign at some point in the future isn’t something we’re reasonably able to do. Is there a plan for how to handle that?
Yes, we can add support for this behavior (i.e., the signed JSON can state {expiration: None} and pip / the client can ignore expiration in this case). Another technique we may consider is moving these projects to an "abandoned" role (signed with an offline key). Although the PEPs do not currently mention it, we can discuss how to handle abandoned projects here, if you wish, before deciding how to proceed. Justin and Trishank (CC'd) can also give feedback on this "abandoned" role.
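In client terms the change would be tiny; a sketch of the expiry check, assuming TUF's "expires" field name and the proposed None opt-out:

    import time

    def metadata_expired(signed):
        expires = signed.get("expires")  # e.g. "2016-01-01T00:00:00Z"
        if expires is None:              # proposed opt-out for abandoned/author metadata
            return False
        # struct_time instances compare chronologically for a fixed format.
        return time.strptime(expires, "%Y-%m-%dT%H:%M:%SZ") < time.gmtime()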
Let me know exactly what needs to change in the PEPs to make everything explained above clearer. For example, in PEP 458 we provide a link/reference <https://www.python.org/dev/peps/pep-0458/#what-additional-repository-files-are-required-on-pypi> (last paragraph of this subsection) to the Metadata document <https://github.com/theupdateframework/tuf/blob/develop/METADATA.md> indicating the content of the JSON files, but should the illustration <https://github.com/vladimir-v-diaz/pep-on-pypi-with-tuf/raw/master/pep-0458/figure4.pdf> I've included in this reply also be added?
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 1 January 2015 at 05:51, Donald Stufft <donald@stufft.io> wrote:
So here is my problem. I’m completely on board with the developer signing for the distribution files. I think that makes total sense. However, I worry that requiring the developer to sign for what is essentially the “installer” API (aka how pip discovers things to install) is going to put us in a situation where we cannot evolve the API easily. If we modified this PEP so that an online key signed for /simple/, what security properties would we lose?
It *appears* to me that the problem then would be that a compromise of PyPI could present whatever information it wants to pip as to what is available for pip to download and install. This would mean freeze attacks and mix-and-match attacks. It would also mean that, in a future world where pip can use metadata on PyPI to do dependency resolution, the attacker could tell pip that it needs to download a valid but malicious project as a dependency of a popular project like virtualenv.
However, I don’t think they’d be able to actually cause pip to install a malicious copy of a good project, and I believe that we can prevent an attacker who possesses that key from tricking pip into installing a malicious but valid project as a fake dependency: have pip use the theoretical future PyPI metadata that lists dependencies only as an optimization hint for what it should download, and then, once it has actually downloaded a project like virtualenv (which has been validated to be from the real author), peek inside that file and ensure that the metadata inside it matches what PyPI told pip.
Is my assessment correct? Is keeping the “API” under the control of PyPI a reasonable thing to do, while keeping the actual distribution files themselves under the control of the distribution authors? The reason this worries me is that, unlike a Linux distribution or an application like Firefox, we don’t have much of a relationship with the people who are uploading things to PyPI. So if we need to evolve the API, we are not going to be able to compel our authors to go back and re-generate new signed metadata.
I think this is a good entry point for an idea I've had kicking around in my brain for the past couple of days: what if we change the end goal of PEP 480 slightly, from "prevent attackers from compromising published PyPI metadata" to "allow developers & administrators to rapidly detect and recover from compromised PyPI metadata"?

My reasoning is that when it comes to PyPI security, there are actually two major dials we can twiddle:

* raising the cost of an attack (e.g. making compromise harder by distributing signing authority to developers)
* reducing the benefit of an attack (e.g. making the expected duration, and hence reach, of a compromise lower, or downgrading an artifact substitution attack to a denial of service attack)

To raise the cost of a compromise through distributed signing authority, we have to solve the trust management problem - getting developer keys out to end users in a way that doesn't involve trusting the central PyPI service. That's actually a really difficult problem to solve, which is why we have situations like TLS still relying on the CA system, despite the known problems with the latter.

However, the latter objective is potentially more tractable: we wouldn't need to distribute trust management out to arbitrary end users, we'd "just" need a federated group of entities that are in a position to detect that PyPI has potentially been compromised, and request a service shutdown until such time as the compromise has been investigated and resolved.

This notion isn't fully evolved yet (that's why this email is so long), but it feels like a far more viable direction to me than the idea of pushing the enhanced security management problem back on to end users.

Suppose, for example, there were additional independently managed validation services hosting TUF metadata for various subsets of PyPI. The enhanced security model would then involve developers opting in to uploading their package metadata to one or more of the validation servers, rather than just to the main PyPI server. pip itself wouldn't worry about checking the validation services - it would just check against the main server as it does today, so we wouldn't need to worry about how we get the root keys for the validation servers out to arbitrary client end points. That is, rather than "sign your own packages", the enhanced security model becomes "get multiple entities to sign your packages, so compromise of any one entity (including PyPI itself) can be detected and investigated appropriately".

The *validation* services would then be responsible for checking that their own registered metadata matched the metadata being published on PyPI. If they detect a discrepancy between their own metadata and PyPI's, then we'd have a human-in-the-loop process for reporting the problem, and the most likely response would be to disable PyPI downloads while the situation was resolved.

I believe something like that would change the threat landscape in a positive way, and has three very attractive features over distributed signing authority:

* It's completely transparent at the point of installation - it transforms PEP 480 into a back end data integrity validation project, rather than something that affects the end user experience of the PyPI ecosystem. The changes to the installation experience would be completely covered by PEP 458.
* Uploading metadata to additional servers for signing is relatively low impact on developers (if they have an automated release process, it's likely just another line in a script somewhere), significantly lowering barriers to adoption relative to asking developers to sign their own packages.
* Folks that decide to run or use a validation server are likely going to be more closely engaged with the PyPI community, and hence easier to reach as the metadata requirements evolve.

In terms of how I believe such a change would mitigate the threat of a PyPI compromise:

* it provides a cryptographically validated way to detect a compromise of any packages registered with one or more validation services, significantly reducing the likelihood of a meaningful PyPI compromise going undetected
* in any subsequent investigation, we'd have multiple sets of cryptographically validated metadata to compare to identify exactly what was compromised, and how it was compromised
* the new attack vectors introduced (by compromising the validation services rather than PyPI itself) are *denial of service* attacks (due to PyPI downloads being disabled while the discrepancy is investigated), rather than the artifact substitution that is possible by attacking PyPI directly

That means we would move from the status quo, where a full PyPI compromise may permit silent substitution of artifacts, to one where an illicit online package substitution would likely be detected in minutes or hours for high profile projects. The likely pay-off for an attack on the central infrastructure then becomes a denial of service against organisations not using their own local PyPI mirrors, rather than arbitrary software installation on a wide range of systems.

Another nice benefit of this approach is that it also protects against attacks on developer PyPI *accounts*, so long as developers use different authentication mechanisms on the validation server and the main PyPI server. For example, larger organisations could run their *own* validation server for the packages they publish, and manage it using offline keys as recommended by TUF - that's a lot easier to do when you don't need to allow arbitrary uploads. Specific *projects* could still be attacked (by compromising developer systems), but that's not a new threat, and it's outside the scope of PEP 458/480 - we're aiming to mitigate the threat of *systemic* compromise that currently makes PyPI a relatively attractive target.

As far as the pragmatic aspects go, we could either go with a model where projects are encouraged to run their *own* validation services on something like OpenShift (or even a static hosting site, if they generate their validation metadata locally), or else we could look for willing partners to host public PyPI metadata validation servers (e.g. the OpenStack Foundation, Fedora/Red Hat, perhaps someone from the Debian/Ubuntu/Canonical ecosystem, perhaps some of the other commercial Python redistributors).

Regards, Nick.

[1] Via Leigh Alexander, I was recently introduced to this excellent paper on understanding and working with the mental threat models that users actually have, rather than attempting to educate the users: https://cups.cs.cmu.edu/soups/2010/proceedings/a11_Walsh.pdf. While the paper is specifically written in the context of home PC security, I think that's good advice in general: adjusting software systems to accommodate the reality of human behaviour is usually going to be far more effective than attempting to teach humans to conform to the current needs of the software.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
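To make the mechanics of the validation service idea concrete, its core job could be as small as the following sketch (the names, the registration scheme, and the response process are all hypothetical; a real service would also need authenticated uploads, scheduling, and alerting):

    import hashlib
    import urllib.request

    def audit(registered, url_for):
        # registered: {filename: sha256 hex digest} supplied by the developer
        # at release time, out of band from the normal PyPI upload.
        # url_for: maps a filename to the URL PyPI is serving it from.
        discrepancies = []
        for filename, expected in registered.items():
            data = urllib.request.urlopen(url_for(filename)).read()
            if hashlib.sha256(data).hexdigest() != expected:
                discrepancies.append(filename)
        # A non-empty result is the human-in-the-loop trigger: investigate,
        # and potentially suspend PyPI downloads while doing so.
        return discrepancies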
On Jan 2, 2015, at 12:57 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
To raise the cost of a compromise through distributed signing authority, we have to solve the trust management problem - getting developer keys out to end users in a way that doesn't involve trusting the central PyPI service. That's actually a really difficult problem to solve, which is why we have situations like TLS still relying on the CA system, despite the known problems with the latter.
I haven’t read the entirety of your email, but I would like to point out that PEP 480 does not attempt to solve this problem without trusting PyPI. Rather it just moves the trust from trusting the server that runs PyPI to trusting the people running PyPI itself. TUF is fundamentally extremely similar to the CA system except there is only one CA which is scoped to a particular repository (e.g. PyPI) and it includes some distribution specific stuff like file size and delegating partial trust. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 2 January 2015 at 16:13, Donald Stufft <donald@stufft.io> wrote:
On Jan 2, 2015, at 12:57 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
To raise the cost of a compromise through distributed signing authority, we have to solve the trust management problem - getting developer keys out to end users in a way that doesn't involve trusting the central PyPI service. That's actually a really difficult problem to solve, which is why we have situations like TLS still relying on the CA system, despite the known problems with the latter.
I haven’t read the entirety of your email, but I would like to point out that PEP 480 does not attempt to solve this problem without trusting PyPI. Rather it just moves the trust from trusting the server that runs PyPI to trusting the people running PyPI itself. TUF is fundamentally extremely similar to the CA system except there is only one CA which is scoped to a particular repository (e.g. PyPI) and it includes some distribution specific stuff like file size and delegating partial trust.
That's the part I meant - the signing of developer keys to delegate trust to them without needing to trust the integrity of the online PyPI service. Hence the idea of instead keeping PyPI as an entirely online service (without any offline delegation of authority), and suggesting that developers keep their *own* separately signed metadata, which can then be compared against the PyPI published metadata (both by the developers themselves and by third parties). Discrepancies would become a trigger for further investigation, which may include suspending the PyPI service if the discrepancy is reported by an individual or organisation that the PyPI administrators trust. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jan 2, 2015, at 1:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 2 January 2015 at 16:13, Donald Stufft <donald@stufft.io <mailto:donald@stufft.io>> wrote:
On Jan 2, 2015, at 12:57 AM, Nick Coghlan <ncoghlan@gmail.com <mailto:ncoghlan@gmail.com>> wrote:
To raise the cost of a compromise through distributed signing authority, we have to solve the trust management problem - getting developer keys out to end users in a way that doesn't involve trusting the central PyPI service. That's actually a really difficult problem to solve, which is why we have situations like TLS still relying on the CA system, despite the known problems with the latter.
I haven’t read the entirety of your email, but I would like to point out that PEP 480 does not attempt to solve this problem without trusting PyPI. Rather it just moves the trust from trusting the server that runs PyPI to trusting the people running PyPI itself. TUF is fundamentally extremely similar to the CA system except there is only one CA which is scoped to a particular repository (e.g. PyPI) and it includes some distribution specific stuff like file size and delegating partial trust.
That's the part I meant - the signing of developer keys to delegate trust to them without needing to trust the integrity of the online PyPI service.
Hence the idea of instead keeping PyPI as an entirely online service (without any offline delegation of authority), and suggesting that developers keep their *own* separately signed metadata, which can then be compared against the PyPI published metadata (both by the developers themselves and by third parties). Discrepancies would become a trigger for further investigation, which may include suspending the PyPI service if the discrepancy is reported by an individual or organisation that the PyPI administrators trust.
I’m confused what you mean by “without needing to trust the integrity of the online PyPI service”. Developer keys get signed by offline keys controlled by, I’m guessing, either myself or Richard or both. The only time we’re depending on the integrity of the machine that runs PyPI, and not on an offline key possessed by someone, is during the window of time between when a new project has been created (the project itself, not a release of a project) and the next time the delegations get signed by the offline keys. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 2 January 2015 at 16:38, Donald Stufft <donald@stufft.io> wrote:
On Jan 2, 2015, at 1:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That's the part I meant - the signing of developer keys to delegate trust to them without needing to trust the integrity of the online PyPI service.
Hence the idea of instead keeping PyPI as an entirely online service (without any offline delegation of authority), and suggesting that developers keep their *own* separately signed metadata, which can then be compared against the PyPI published metadata (both by the developers themselves and by third parties). Discrepancies would become a trigger for further investigation, which may include suspending the PyPI service if the discrepancy is reported by an individual or organisation that the PyPI administrators trust.
I’m confused what you mean by “without needing to trust the integrity of the online PyPI service”.
Developer keys get signed by offline keys controlled by, I’m guessing, either myself or Richard or both. The only time we’re depending on the integrity of the machine that runs PyPI, and not on an offline key possessed by someone, is during the window of time between when a new project has been created (the project itself, not a release of a project) and the next time the delegations get signed by the offline keys.
Yes, as I said, that's the part I mean. To avoid trusting the integrity of the online PyPI service, while still using PyPI as a trust root for the purpose of software installation, we would need to define a system whereby:

1. The PyPI administrators have a set of offline keys
2. Developers are able to supply keys to the PyPI administrators for trust delegation
3. This system has sufficiently low barriers to entry that developers are actually willing to use it
4. This system is compatible with a PyPI-run build service

We already have a hard security problem to solve (running PyPI), so adding a *second* hard security problem (running what would in effect be a CA) doesn't seem like a good approach to risk mitigation to me.

My proposal is that we instead avoid the hard problem of running a CA entirely by advising *developers* to monitor PyPI's integrity by ensuring that what PyPI is publishing matches what they released. That is, we split the end-to-end data integrity validation problem in two and solve each part separately:

* use PEP 458 to cover the PyPI -> end user link, with the end users treating PyPI as a trusted authority. End users will be able to detect tampering with the link between them and PyPI, but if the online PyPI service gets compromised, *end users won't detect it*.
* use a separate metadata validation process to check that PyPI is publishing the right thing, covering both the developer -> PyPI link *and* the integrity of the PyPI service itself.

The metadata validation potentially wouldn't even need to use TUF - developers could simply upload the expected hash of the artifacts they published, and the metadata validation service would check that the signed artifacts from PyPI match those hashes. The core of the idea is simply that there be a separate service (or services) which PyPI can't update, but developers uploading packages *can*.

By focusing on detection & recovery, rather than prevention, we can drastically reduce the complexity of the problem to be solved, while still mitigating the major risks we care about. The potential attacks that worry me are the ones that result in silent substitution of artifacts - when it comes to denial of service attacks, there's little reason to mess about with inducing metadata validation failures when there are already far simpler options available.

Redistributors may decide to take advantage of the developer metadata validation support to do our own verification of source downloads, but I don't believe upstream needs to worry about that too much - if developers have the means to verify automatically that what PyPI is currently publishing matches what they released, then the redistributor side of things should take care of itself.

Another way of viewing the problem is that instead of thinking of the scope of PEP 480 as PyPI delegating trust to developers, we can instead think of it as developers delegating trust to PyPI. PyPI then becomes a choke point in a network graph, rather than the root of a tree. My core idea stays the same regardless of how you look at it though: we *don't* try to solve the problem of letting end users establish in a single step that what they downloaded matches what the developer published. Instead, we aim to provide answers to the questions:

* Did I just download what PyPI is currently publishing?
* Is PyPI currently publishing what the developer of <project> released?
There's no fundamental requirement that those two questions be answered by the *same* security system - we have the option of splitting them, and I'm starting to think that the overall UX will be better if we do. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jan 2, 2015, at 3:21 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 2 January 2015 at 16:38, Donald Stufft <donald@stufft.io <mailto:donald@stufft.io>> wrote:
On Jan 2, 2015, at 1:33 AM, Nick Coghlan <ncoghlan@gmail.com <mailto:ncoghlan@gmail.com>> wrote:
That's the part I meant - the signing of developer keys to delegate trust to them without needing to trust the integrity of the online PyPI service.
Hence the idea of instead keeping PyPI as an entirely online service (without any offline delegation of authority), and suggesting that developers keep their *own* separately signed metadata, which can then be compared against the PyPI published metadata (both by the developers themselves and by third parties). Discrepancies would become a trigger for further investigation, which may include suspending the PyPI service if the discrepancy is reported by an individual or organisation that the PyPI administrators trust.
I’m confused what you mean by “without needing to trust the integrity of the online PyPI service”.
Developer keys get signed by offline keys controlled by, I’m guessing, either myself or Richard or both. The only time we’re depending on the integrity of the machine that runs PyPI, and not on an offline key possessed by someone, is during the window of time between when a new project has been created (the project itself, not a release of a project) and the next time the delegations get signed by the offline keys.
Yes, as I said, that's the part I mean. To avoid trusting the integrity of the online PyPI service, while still using PyPI as a trust root for the purpose of software installation, we would need to define a system whereby:
1. The PyPI administrators have a set of offline keys
2. Developers are able to supply keys to the PyPI administrators for trust delegation
3. This system has sufficiently low barriers to entry that developers are actually willing to use it
4. This system is compatible with a PyPI-run build service
We already have a hard security problem to solve (running PyPI), so adding a *second* hard security problem (running what would in effect be a CA) doesn't seem like a good approach to risk mitigation to me.
My proposal is that we instead avoid the hard problem of running a CA entirely by advising *developers* to monitor PyPI's integrity by ensuring that what PyPI is publishing matches what they released. That is, we split the end-to-end data integrity validation problem in two and solve each part separately:
* use PEP 458 to cover the PyPI -> end user link, with the end users treating PyPI as a trusted authority. End users will be able to detect tampering with the link between them and PyPI, but if the online PyPI service gets compromised, *end users won't detect it*.
* use a separate metadata validation process to check that PyPI is publishing the right thing, covering both the developer -> PyPI link *and* the integrity of the PyPI service itself.
The metadata validation potentially wouldn't even need to use TUF - developers could simply upload the expected hash of the artifacts they published, and the metadata validation service would check that the signed artifacts from PyPI match those hashes. The core of the idea is simply that there be a separate service (or services) which PyPI can't update, but developers uploading packages *can*.
By focusing on detection & recovery, rather than prevention, we can drastically reduce the complexity of the problem to be solved, while still mitigating the major risks we care about. The potential attacks that worry me are the ones that result in silent substitution of artifacts - when it comes to denial of service attacks, there's little reason to mess about with inducing metadata validation failures when there are already far simpler options available.
Redistributors may decide to take advantage of the developer metadata validation support to do our own verification of source downloads, but I don't believe upstream needs to worry about that too much - if developers have the means to verify automatically that what PyPI is currently publishing matches what they released, then the redistributor side of things should take care of itself.
Another way of viewing the problem is that instead of thinking of the scope of PEP 480 as PyPI delegating trust to developers, we can instead think of it as developers delegating trust to PyPI. PyPI then becomes a choke point in a network graph, rather than the root of a tree. My core idea stays the same regardless of how you look at it though: we *don't* try to solve the problem of letting end users establish in a single step that what they downloaded matches what the developer published. Instead, we aim to provide answers to the questions:
* Did I just download what PyPI is currently publishing?
* Is PyPI currently publishing what the developer of <project> released?
There's no fundamental requirement that those two questions be answered by the *same* security system - we have the option of splitting them, and I'm starting to think that the overall UX will be better if we do.
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Oh I see. I was just misreading what you meant by “without trusting the integrity of the online PyPI service”, I thought you meant it in a post PEP 480 world, you meant it in a pre (or without) PEP 480 world.

So onto the actual thing that you’ve proposed! I have concerns about the actual feasibility of doing such a thing, some of which are similar to my concerns with doing non-mandatory PEP 480.

* If uploading to a verifier service is optional then a significant portion of authors simply won’t do it, and if you’re installing 100 things, and 99 of them are verified and 1 of them is not, then there is an attack vector that I can use to compromise you undetected (since the author didn’t upload their verification somewhere else).

* It’s not actually less work in general, it just pushes the work from the PyPI administrators to the community. This can work well if the community is willing to step up! However, PyPI’s availability/speed problems were originally attempted to be solved by pushing the work to the community via the original mirror system, and (not to downplay the people who did step up) the response was not particularly great: we got a few mirrors at first, and as the shiny factor wore off people’s mirrors shut down or stopped working or what have you.

* A number of the attacks that TUF protects against do not rely on the attacker creating malicious software packages, things like only showing known insecure versions of a project so that they can then attack people through a known exploit. It’s not *wrong* to not protect against these (most systems don’t) but we’d want to explicitly decide that we’re not going to.

I’d note that PEP 480 and your proposal aren’t really mutually exclusive, so there’s not really harm in *trying* yours and, if it fails, falling back to something like PEP 480, other than end user confusion if that gets shut down and the cost of actually developing/setting up that solution.

Overall I’m +1 on things that enable better detection of a compromise, but I’m probably -0.5 or so on your specific proposal, as I think that expecting developers to upload verification data to “verification servers” is just pushing work onto other people just so we don’t have to do it.

I also think your two questions are not exactly right, because all that means is that it becomes harder to attack *everyone* via a PyPI compromise; however, it’s still trivial to attack specific people if you’ve compromised PyPI or the CDN, since you can selectively serve maliciously signed packages depending on who is requesting them. To this end I don’t think a solution that pip doesn’t implement is actually going to prevent anything but very dumb attacks by an attacker who has already compromised the PyPI machines.

I think another issue here is that we’re effectively doing something similar to TLS, except instead of domain names we have project names, and that although there are *a lot* of people who really hate the CA system, nobody has yet come up with an effective means of actually replacing it without regressing into worse security. The saving grace here is that we operate at a much smaller scale (one “DNS” root, one trust root, ~53k unique names vs… more than I feel like counting) so it’s possible that solutions which don’t scale at TLS scale might scale at PyPI scale.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 2 January 2015 at 18:57, Donald Stufft <donald@stufft.io> wrote:
I have concerns about the actual feasibility of doing such a thing, some of which are similar to my concerns with doing non-mandatory PEP 480.
* If uploading to a verifier service is optional then a significant portion of authors simply won’t do it, and if you’re installing 100 things, and 99 of them are verified and 1 of them is not, then there is an attack vector that I can use to compromise you undetected (since the author didn’t upload their verification somewhere else).
Independently of the technical details of the "enhanced validation" support, I now agree pip would at least need to acquire a "fully validated downloads only" mode, where it refused to install anything that wasn't trusted at the higher integrity level. However, it's worth keeping in mind that the truly security conscious people aren't going to ever trust any package that hasn't been through at least some level of preliminary security review (whether their own, or that of an organisation they trust).
* It’s not actually less work in general, it just pushes the work from the PyPI administrators to the community. This can work well if the community is willing to step up! However, PyPI’s availability/speed problems were originally attempted to be solved by pushing the work to the community via the original mirror system, and (not to downplay the people who did step up) the response was not particularly great: we got a few mirrors at first, and as the shiny factor wore off people’s mirrors shut down or stopped working or what have you.
From a practical perspective, I think the work being pushed to the developer community is roughly equivalent. The only question is whether the secret developers are being asked to manage is a separate authentication token for PyPI uploads (which is then used to mathematically secure uploads in such a way that PyPI itself can't change the release metadata for a project), or an account on a separate validation service, either run by the developer themselves, or by a trusted third party like the OpenStack Foundation, Fedora, Mozilla, etc.

My contention is that "to support enhanced validation of your project, you, and anyone else authorised to make releases for your project, will need an account on a metadata validation server, and whenever you make a new release, you will need to register it with the metadata validation server before publishing it on PyPI (otherwise folks using enhanced validation won't be able to install it)" is a relatively simple and easy to understand approach, while "to support enhanced validation of your project, you, and anyone else authorised to make releases for your project, will need to get a developer key signed by the PyPI administrators, and no, you can't just upload the key through the PyPI web UI, because if we let you do that, it wouldn't provide the desired security properties" is inherently confusing, and there's no explanation we can provide that will make it make sense to anyone that hasn't spent years thoroughly steeped in this stuff.

Yes, the mathematics technically lets us provide the desired security guarantees within the scope of a single service, but doing it that way is confusing to users in a way that I now believe can be avoided through clear structural separation of the "content distribution" functionality of the main PyPI service and the "content validation" functionality of separate metadata validation services. Having the validation services run by someone *other than the PSF* can also be explained in a way that makes intuitive sense even when someone doesn't understand the technical details, since it's clear that folks with privileged access to a distribution service run by the PSF wouldn't necessarily have privileged access to a validation service run by the OpenStack Foundation (etc) and vice-versa.

The "independent online services" approach also provides additional flexibility that isn't offered by a purely mathematical solution - an online metadata service potentially *can* regenerate its metadata in a new format if it can be persuaded that's necessary. Having that capability represents a potential denial of service vulnerability specifically against enhanced validation mode (if a validation server is compromised), but there are already much easier ways to execute those.
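As a rough illustration of the first of those two workflows, registering a release with a validation server could be a single extra call in a release script. The endpoint, payload fields, and token handling below are all hypothetical - no such API is defined in either PEP:

    import hashlib
    import json
    import urllib.request

    def register_release(server, token, project, version, artifact_path):
        # Compute the digest locally, from the exact file that will be
        # uploaded to PyPI, before it ever leaves the developer's machine.
        with open(artifact_path, "rb") as f:
            sha256 = hashlib.sha256(f.read()).hexdigest()
        payload = json.dumps({
            "project": project,
            "version": version,
            "filename": artifact_path.rsplit("/", 1)[-1],
            "sha256": sha256,
        }).encode("utf-8")
        request = urllib.request.Request(
            server + "/api/register",  # hypothetical endpoint
            data=payload,
            headers={
                "Content-Type": "application/json",
                # Deliberately a *different* credential from the
                # developer's PyPI account, so compromising one
                # doesn't compromise the other.
                "Authorization": "Bearer " + token,
            },
        )
        with urllib.request.urlopen(request) as response:
            return response.status == 200

    # e.g. the one extra line in an automated release process:
    # register_release("https://validation.example.org", token,
    #                  "example-project", "1.0",
    #                  "dist/example_project-1.0.tar.gz")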
* A number of the attacks that TUF protects against do not rely on the attacker creating malicious software packages, things like only showing known insecure versions of a project so that they can then attack people through a known exploit. It’s not *wrong* to not protect against these (most systems don’t) but we’d want to explicitly decide that we’re not going to.
External metadata validation servers can still protect against downgrade attacks - "release is unexpectedly missing" is a metadata discrepancy, just like "release contents are unexpectedly different". Validation servers can also provide additional functionality, like mapping from CVEs to affected versions, that can't be sensibly offered through the same service that is responsible for publishing the software in the first place.
I’d note that PEP 480 and your proposal aren’t really mutually exclusive, so there’s not really harm in *trying* yours and, if it fails, falling back to something like PEP 480, other than end user confusion if that gets shut down and the cost of actually developing/setting up that solution.
Overall I’m +1 on things that enable better detection of a compromise, but I’m probably -0.5 or so on your specific proposal, as I think that expecting developers to upload verification data to “verification servers” is just pushing work onto other people just so we don’t have to do it.
Getting them to manage additional keys, and get them signed and registered appropriately, and then supplying them is going to be a similar amount of work, and the purpose is far more cryptic and confusing. My proposal is basically that instead of asking developers to manage signing keys, we should instead ask them to manage an account on a validation server (or servers). Building on top of service account management also means developers can potentially leverage existing *account* security tools, rather than needing to come up with custom key management solutions for PyPI publishing keys.

Technically such a validation server could be run on the PSF infrastructure, but that opens up lots of opportunities for common attacks that provide privileged access to both PyPI and the validation servers - by moving the operation of the latter out to trusted third party organisations, we'd be keeping indirect attacks through the PSF infrastructure side from compromising both systems at the same time.
I also think your two questions are not exactly right, because all that means is that it becomes harder to attack *everyone* via a PyPI compromise; however, it’s still trivial to attack specific people if you’ve compromised PyPI or the CDN, since you can selectively serve maliciously signed packages depending on who is requesting them. To this end I don’t think a solution that pip doesn’t implement is actually going to prevent anything but very dumb attacks by an attacker who has already compromised the PyPI machines.
Yes, while I was thinking we may be able to get away without pip providing enhanced validation support directly, I now agree we'll need to provide it. However, the UX of that shouldn't depend on the technical details of how enhanced validation mode is actually implemented - from an end user perspective, the key things they need to know are:

* for many cases, the default level of validation offered by pip (given PEP 458) is likely to be good enough, and will provide access to all the packages on PyPI
* for some cases, the default level of validation *isn't* good enough, and for those, pip offers an "enhanced validation" mode
* if you turn on enhanced validation, there will be a lot of software that *won't* install
* if you want to use enhanced validation, but some projects you'd like to use don't support it, then you'll either need to offer those projects assistance with supporting enhanced validation mode, consider using different dependencies, or else reconsider whether or not you really need enhanced validation for your current use case
I think another issue here is that we’re effectively doing something similar to TLS, except instead of domain names we have project names, and that although there are *a lot* of people who really hate the CA system, nobody has yet come up with an effective means of actually replacing it without regressing into worse security. The saving grace here is that we operate at a much smaller scale (one “DNS” root, one trust root, ~53k unique names vs… more than I feel like counting) so it’s possible that solutions which don’t scale at TLS scale might scale at PyPI scale.
It's worth noting that my validation server idea is still very much a CA-style model - it's just that instead of registering developer keys with PyPI (as in PEP 480), we'd be registering the trust roots of metadata validation servers with pip. All my idea really does is take the key management problem for enhanced validation away from developers, replacing it with an account management problem that they're likely to already be thoroughly familiar with, even if they don't have any experience in secure software distribution.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jan 2, 2015, at 10:51 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Getting them to manage additional keys, and get them signed and registered appropriately, and then supplying them is going to be a similar amount of work, and the purpose is far more cryptic and confusing. My proposal is basically that instead of asking developers to manage signing keys, we should instead ask them to manage an account on a validation server (or servers).
I need to think more about the rest of what you’ve said (and I don’t think it’s a short term problem), but I just wanted to point out that “managing keys” can be as simple as “create a secondary pass(word|phrase) and remember it/write it down/whatever”. It doesn’t need to be “secure this file and copy it around to all of your computers”. Likewise there’s no reason that “delegate authority to someone else” can’t be something like ``twine add-maintainer pip pfmoore``.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
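For what it's worth, the "secondary passphrase" flavour of key management described above can be built from standard primitives: derive a fixed-size seed from the passphrase with a password hashing function, then use that seed as a deterministic signing key. A minimal sketch, assuming the third-party ``cryptography`` library and purely illustrative cost parameters:

    import hashlib

    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )

    def signing_key_from_passphrase(passphrase, salt):
        # scrypt makes brute-forcing the passphrase expensive; the salt
        # (e.g. derived from the PyPI username) keeps identical
        # passphrases from producing identical keys.
        seed = hashlib.scrypt(
            passphrase.encode("utf-8"),
            salt=salt,
            n=2**14, r=8, p=1,  # illustrative cost parameters
            dklen=32,
        )
        # The same passphrase + salt always yields the same Ed25519 key,
        # so there is no key file to secure and copy between machines.
        return Ed25519PrivateKey.from_private_bytes(seed)

    key = signing_key_from_passphrase("correct horse battery staple",
                                      salt=b"pypi:example-user")
    signature = key.sign(b"metadata to be signed")
    key.public_key().verify(signature, b"metadata to be signed")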
On 3 January 2015 at 02:12, Donald Stufft <donald@stufft.io> wrote:
On Jan 2, 2015, at 10:51 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Getting them to manage additional keys, and get them signed and registered appropriately, and then supplying them is going to be a similar amount of work, and the purpose is far more cryptic and confusing. My proposal is basically that instead of asking developers to manage signing keys, we should instead ask them to manage an account on a validation server (or servers).
I need to think more about the rest of what you’ve said (and I don’t think it’s a short term problem), but I just wanted to point out that “managing keys” can be as simple as “create a secondary pass(word|phrase) and remember it/write it down/whatever”. It doesn’t need to be “secure this file and copy it around to all of your computers”. Likewise there’s no reason that “delegate authority to someone else” can’t be something like ``twine add-maintainer pip pfmoore``.
Yeah, I'm confident that the UI can be made relatively straightforward regardless of how we make the actual validation work. The part I haven't got the faintest clue how to do for the PEP 480 version is building viable "folk models" of what those commands are doing on the back end that will give people confidence that they understand what is going on just from using the tools, rather than leaving them wondering why they need a secondary password, etc.

From a technical perspective, I don't think the validation server idea is superior to PEP 480. Where I think it's superior is that I'm far more confident in my ability to explain to a developer with zero security background how separate validation servers provide increased security, as the separation of authority would be structural in addition to mathematical. While the real security would still be coming from the maths, a folk model that believes it is coming from the structural separation between the publication server and the metadata validation servers will be good enough for most practical purposes, and unless someone is particularly interested in the mathematical details, they can largely be handwaved away with "the separation of responsibilities between the services is enforced mathematically".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick,

Renaming the PEPs is not a problem. Perhaps "PEP 458: Securing the Link from PyPI to the End User" is another option.

I am going to read the Rick Walsh paper you've linked and give some careful thought to your proposal. I'll get back to you.

I had one person (off-list and recommending how to better explain 480 to non-specialists) say, "the property PEP 480 gives is that developers who sign their project protect their users even if PyPI is compromised. This is because end users are told to trust the developer keys over the keys that are kept on the PyPI server. (PyPI administrators still have a way of using keys that are kept in secure, offline storage to recover if a developer's keys are lost or stolen.)"

Yes, you gotta love those "aha" moments - you're in the shower and go to grab the shampoo bottle when it hits you, "aha! That's the solution... Thank you, shampoo bottle of 'Head & Shoulders'"

On Fri, Jan 2, 2015 at 11:26 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 January 2015 at 02:12, Donald Stufft <donald@stufft.io> wrote:
On Jan 2, 2015, at 10:51 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Getting them to manage additional keys, and get them signed and registered appropriately, and then supplying them is going to be a similar amount of work, and the purpose is far more cryptic and confusing. My proposal is basically that instead of asking developers to manage signing keys, we should instead ask them to manage an account on a validation server (or servers).
I need to think more about the rest of what you’ve said (and I don’t think it’s a short term problem), but I just wanted to point out that “managing keys” can be as simple as “create a secondary pass(word|phrase) and remember it/write it down/whatever”. It doesn’t need to be “secure this file and copy it around to all of your computers”. Likewise there’s no reason that “delegate authority to someone else” can’t be something like ``twine add-maintainer pip pfmoore``.
Yeah, I'm confident that the UI can be made relatively straightforward regardless of how we make the actual validation work. The part I haven't got the faintest clue how to do for the PEP 480 version is building viable "folk models" of what those commands are doing on the back end that will give people confidence that they understand what is going on just from using the tools, rather than leaving them wondering why they need a secondary password, etc.
From a technical perspective, I don't think the validation server idea is superior to PEP 480. Where I think it's superior is that I'm far more confident in my ability to explain to a developer with zero security background how separate validation servers provide increased security, as the separation of authority would be structural in addition to mathematical. While the real security would still be coming from the maths, a folk model that believes it is coming from the structural separation between the publication server and the metadata validation servers will be good enough for most practical purposes, and unless someone is particularly interested in the mathematical details, they can largely be handwaved away with "the separation of responsibilities between the services is enforced mathematically".
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 2 January 2015 at 06:38, Donald Stufft <donald@stufft.io> wrote:
Developer keys get signed by offline keys controlled by, I’m guessing, either myself or Richard or both.
One thought here. The issue being discussed here seems mainly to be that it's hard to manage signing of developer keys. That's certainly one issue, but another is that the signing process takes time. When I set up my first project [1], I did so because I had an idea one afternoon, knocked up the first draft and set up the project. If there had been a delay of a week because you and Richard were both on holiday (or even a day, because of timezones) I may not have bothered - I tend to only have the opportunity to work on things for Python in pretty short bursts.

You could argue that we don't want projects on PyPI that have been set up with so little preparation - it's a valid position to take - but that's a separate matter. I just want to make the point that management isn't the only issue here. Turnaround time is also a barrier to entry that needs to be considered. And not every project that people want to publish is something major like requests or django...

Paul

[1] I assume I only need to set up a key once, for my PyPI account. If I need an individual key per project, the cost multiplies. And it means that the barrier is to all new projects, rather than merely to attracting new developers.
On Jan 2, 2015, at 6:04 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 January 2015 at 06:38, Donald Stufft <donald@stufft.io> wrote:
Developer keys get signed by offline keys controlled by, I’m guessing, either myself or Richard or both.
One thought here. The issue being discussed here seems mainly to be that it's hard to manage signing of developer keys. That's certainly one issue, but another is that the signing process takes time. When I set up my first project [1], I did so because I had an idea one afternoon, knocked up the first draft and set up the project. If there had been a delay of a week because you and Richard were both on holiday (or even a day, because of timezones) I may not have bothered - I tend to only have the opportunity to work on things for Python in pretty short bursts.
You could argue that we don't want projects on PyPI that have been set up with so little preparation - it's a valid position to take - but that's a separate matter. I just want to make the point that management isn't the only issue here. Turnaround time is also a barrier to entry that needs to be considered. And not every project that people want to publish is something major like requests or django...
Paul
[1] I assume I only need to set up a key once, for my PyPI account. If I need an individual key per project, the cost multiplies. And it means that the barrier is to all new projects, rather than merely to attracting new developers.
To be clear, there is zero delay in being able to publish a new project; the delay is between moving from a new project being validated by an online key to an offline key. The only real difference between validation levels is that until it's been signed by an offline key, people installing that project are vulnerable to someone compromising PyPI. This is because until the delegation of project X to a specific developer has been signed, the "chain of trust" contains a key that is sitting on the hard drive of PyPI.

However, once a delegation of Project X _has_ been signed, changing that delegation would need to wait until the next time the delegations were signed by the offline keys. This is because once a project is signed by an offline key, all further changes to the delegation require offline signing. In addition, this does not mean (I believe! we should verify this) that the owner of a project cannot further delegate to other people without delay, since they'll be able to sign that delegation with their own keys and won't require intervention from PyPI.

So really it looks like this (abbreviated, not exactly, etc):

root (offline)
|- PyPI Admins (offline)
   |- "Unclaimed" (online)
      |- New Project that hasn't yet been signed for by PyPI Admins (offline, owned by author)
   |- Existing Project that has already been signed for by PyPI Admins (offline, owned by author)

The periodic signing process by the PyPI admins just moves a new project from being signed for by the "Unclaimed" online key to being signed for by our offline keys. This process is basically invisible to everyone involved.

Does that make sense?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
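To connect that tree to what TUF actually records, the delegations live in signed JSON metadata. The fragment below (shown as a Python literal) is only a loose sketch of the shape of such a delegation - the key IDs are placeholders, and the exact format is defined by the TUF specification rather than by anything in this thread:

    # Loose sketch of a TUF-style targets delegation, mirroring the tree
    # above: the offline admin role delegates new projects to an online
    # "unclaimed" role, and already-claimed projects to their authors.
    delegations = {
        "roles": [
            {
                "name": "unclaimed",
                "keyids": ["<online key id>"],   # key lives on PyPI servers
                "paths": ["new-project/*"],
                "threshold": 1,
            },
            {
                "name": "existing-project",
                "keyids": ["<author key id>"],   # delegation signed offline
                "paths": ["existing-project/*"],
                "threshold": 1,
            },
        ],
    }

    # The periodic offline signing ceremony amounts to moving a project
    # out of the "unclaimed" role into a per-project role like the second
    # entry, then re-signing this metadata with the admins' offline keys.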
On 2 January 2015 at 11:21, Donald Stufft <donald@stufft.io> wrote:
To be clear, there is zero delay in being able to publish a new project; the delay is between moving from a new project being validated by an online key to an offline key.
OK, got it.

Although on the terminology front, I don't really understand what an "online key" and an "offline key" are. My mind translates them as "worse" and "better" from context, but that's about all. I'm not asking for an explanation (I can look that up) but as terms being encountered by the non-specialist, they contribute to the difficulty in reading the proposals. If there's any better way of naming these two types of keys, that would be accessible to the non-specialist, that would be a great help.

(Actually, the other implication I read into "offline key" is "something that I, as the owner, have to take care of and manage" - and that scares me because I'm rubbish at managing keys or anything like that, basically anything more complex than a password - at least until we get to things like RSA keys and tokens that I use for "work" and am paid to manage as opposed to "hobby" stuff).
The only real difference between validation levels is that until it's been signed by an offline key, people installing that project are vulnerable to someone compromising PyPI. This is because until the delegation of project X to a specific developer has been signed, the "chain of trust" contains a key that is sitting on the hard drive of PyPI.

However, once a delegation of Project X _has_ been signed, changing that delegation would need to wait until the next time the delegations were signed by the offline keys. This is because once a project is signed by an offline key, all further changes to the delegation require offline signing.
"Delegation" is another term that doesn't make immediate sense. I read it as "Confirm ownership" sort of, Again, it's not that I can't work out what the term means, but I don't get an immediate sense of the implications. Here, for example, it's not immediately clear whether delegation changes would be common or rare, or whether having them happen quickly would be important. (For example, if you're not available for a pip release, and we never bothered sharing the keys because it's easier just for you to have them, would we need a delegation change to do an emergency release?) Again, this isn't something that it's important to clarify for me here and now, but I would like to see the PEP updated to clarify this sort of issue in terms that are accessible to the layman.
In addition, this does not mean (I believe! we should verify this) that the owner of a project cannot further delegate to other people without delay, since they'll be able to sign that delegation with their own keys and won't require intervention from PyPI.
See above - implies to me that if the "owner" (in terms of keys rather than project structure) is unavailable, other project members may not be able to upload (rather than as now, when they can upload with the project's PyPI account password and/or standard password recovery processes to the project email address).
So really it looks like this (abbreviated, not exactly, etc):
root (offline)
|- PyPI Admins (offline)
   |- "Unclaimed" (online)
      |- New Project that hasn't yet been signed for by PyPI Admins (offline, owned by author)
   |- Existing Project that has already been signed for by PyPI Admins (offline, owned by author)
I'm not keen on the term "unclaimed". All projects on PyPI right now are "unclaimed" by that usage, which is emphatically not what "unclaimed" means intuitively to me. Surely "pip" is "claimed" by the PyPA? Maybe "unverified" works as a term (as in, verifying your account when signing up to a forum). I get the idea that unclaimed implies there's a risk, and sure there is, but this smacks of using a loaded term to rub people's noses in the fact that what they've been happily using for years "isn't safe". This happens a lot with security debates, and IMO actively discourages people from buying into the changes.

It would be useful to have *some* document (or part thereof - maybe an overview section in the PEP) structured as an objective cost/benefit proposal:

1. The current PyPI infrastructure has the following risks. We assess the chance that they might occur as X.
2. The impact on the reader, as an individual, of a compromise, would be the following.
3. The cost to the reader, under the proposed solution, of avoiding the risks, is the following.

There are probably at least two classes of reader involved - project authors and project users. If in either case one class of user has to bear some cost on behalf of the other, then that should be called out.

I believe that I (and any other readers of the proposals) should be able to sensibly assess the cost/benefit tradeoffs on my own, given the above information. My experience and judgement may not be typical, so my opinion should be taken in the context of others, but that doesn't mean I'm wrong, or that my views are inappropriate for me. For example, in 20 years of extensively using software off the internet, I have never once downloaded a piece of software that wasn't what it claimed, and I expected it, to be. So the discussion of compromising PyPI packages seems like an amazingly low risk to me[1].
The periodic signing process by the PyPI admins just moves a new project from being signed for by the "Unclaimed" online key to being signed for by our offline keys. This process is basically invisible to everyone involved.
It's as visible to end users as the significance of describing something as "unclaimed". If nobody cared a project was "unclaimed" then it would be invisible. Otherwise, less so. Hence my preference for a less emotive term.
Does that make sense?
Yes it does - many thanks for the explanation.

Paul

[1] That's just an example, and it would be off-topic to debate the various other things that overall contribute to why and to what level I'm currently comfortable using PyPI. And I'm not running a business that would fail if PyPI were compromised. So please just take this as a small data point, nothing more.
On Jan 2, 2015, at 7:45 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 January 2015 at 11:21, Donald Stufft <donald@stufft.io> wrote:
To be clear, there is zero delay in being able to publish a new project; the delay is between moving from a new project being validated by an online key to an offline key.
OK, got it.
Although on the terminology front, I don't really understand what an "online key" and an "offline key" are. My mind translates them as "worse" and "better" from context, but that's about all. I'm not asking for an explanation (I can look that up) but as terms being encountered by the non-specialist, they contribute to the difficulty in reading the proposals. If there's any better way of naming these two types of keys, that would be accessible to the non-specialist, that would be a great help.
(Actually, the other implication I read into "offline key" is "something that I, as the owner, have to take care of and manage" - and that scares me because I'm rubbish at managing keys or anything like that, basically anything more complex than a password - at least until we get to things like RSA keys and tokens that I use for "work" and am paid to manage as opposed to "hobby" stuff).
Hmm, I’m not sure if there’s really a better way of naming them. Those are "standard" names for them and they are defined in the PEP under the Definitions section [1].

For PEP 458 there is zero key management done by anyone other than the people running PyPI. It is essentially replacing using TLS for verification with using TUF for verification (which gets us a few wins, but still leaves a few enhancements on the table that we get with the addition of PEP 480).

For PEP 480 authors have to manage *something*. What exactly that *something* is, is a question for PEP 480. One (poor in my opinion) possibility is an RSA key, which means that authors will need to manage keeping that file around and backed up and such. Another possibility is a secondary password which is only used when needing to sign something (so during uploads and the like).
The only real difference between validation levels is that until it's been signed by an offline key, people installing that project are vulnerable to someone compromising PyPI. This is because until the delegation of project X to a specific developer has been signed, the "chain of trust" contains a key that is sitting on the hard drive of PyPI.

However, once a delegation of Project X _has_ been signed, changing that delegation would need to wait until the next time the delegations were signed by the offline keys. This is because once a project is signed by an offline key, all further changes to the delegation require offline signing.
"Delegation" is another term that doesn't make immediate sense. I read it as "Confirm ownership" sort of, Again, it's not that I can't work out what the term means, but I don't get an immediate sense of the implications. Here, for example, it's not immediately clear whether delegation changes would be common or rare, or whether having them happen quickly would be important. (For example, if you're not available for a pip release, and we never bothered sharing the keys because it's easier just for you to have them, would we need a delegation change to do an emergency release?)
Again, this isn't something that it's important to clarify for me here and now, but I would like to see the PEP updated to clarify this sort of issue in terms that are accessible to the layman.
This isn’t defined in the PEP. In PEP 480 though, if I’m the only person who has been set up to be allowed to sign for pip releases, then nobody else can release pip without intervention from PyPI Administrators. That’s not entirely different than the current situation where if I was the only person added as a maintainer to the pip project on PyPI and someone else needed to do a release they couldn’t without intervention from the PyPI administrators. IOW doing the required steps to enable other people’s keys to sign would be part of adding them as maintainers to that project.
In addition, this does not mean (I believe! we should verify this) that the owner of a project cannot further delegate to other people without delay, since they'll be able to sign that delegation with their own keys and won't require intervention from PyPI.
See above - implies to me that if the "owner" (in terms of keys rather than project structure) is unavailable, other project members may not be able to upload (rather than as now, when they can upload with the project's PyPI account password and/or standard password recovery processes to the project email address).
If the owner or someone they delegated to, yes.
So really it looks like this (abbreviated, not exactly, etc):
root (offline)
|- PyPI Admins (offline)
   |- "Unclaimed" (online)
      |- New Project that hasn't yet been signed for by PyPI Admins (offline, owned by author)
   |- Existing Project that has already been signed for by PyPI Admins (offline, owned by author)
I'm not keen on the term "unclaimed". All projects on PyPI right now are "unclaimed" by that usage, which is emphatically not what "unclaimed" means intuitively to me. Surely "pip" is "claimed" by the PyPA? Maybe "unverified" works as a term (as in, verifying your account when signing up to a forum). I get the idea that unclaimed implies there's a risk, and sure there is, but this smacks of using a loaded term to rub people's noses in the fact that what they've been happily using for years "isn't safe". This happens a lot with security debates, and IMO actively discourages people from buying into the changes.
Unclaimed is probably a bad name for it, although this name wouldn’t actually be exposed to end users. It’s sort of like if we had rel=“unclaimed” links on /simple/.
It would be useful to have *some* document (or part thereof - maybe an overview section in the PEP) structured as an objective cost/benefit proposal:
1. The current PyPI infrastructure has the following risks. We assess the chance that they might occur as X.
2. The impact on the reader, as an individual, of a compromise, would be the following.
3. The cost to the reader, under the proposed solution, of avoiding the risks, is the following.
There are probably at least two classes of reader involved - project authors and project users. If in either case one class of user has to bear some cost on behalf of the other, then that should be called out.
I believe that I (and any other readers of the proposals) should be able to sensibly assess the cost/benefit tradeoffs on my own, given the above information. My experience and judgement may not be typical, so my opinion should be taken in the context of others, but that doesn't mean I'm wrong, or that my views are inappropriate for me. For example, in 20 years of extensively using software off the internet, I have never once downloaded a piece of software that wasn't what it claimed, and I expected it, to be. So the discussion of compromising PyPI packages seems like an amazingly low risk to me[1].
See an upcoming email about refocusing the discussion here.
The periodic signing process by the PyPI admins just moves a new project from being signed for by the "Unclaimed" online key to being signed for by our offline keys. This process is basically invisible to everyone involved.
It's as visible to end users as the significance of describing something as "unclaimed". If nobody cared a project was "unclaimed" then it would be invisible. Otherwise, less so. Hence my preference for a less emotive term.
Does that make sense?
Yes it does - many thanks for the explanation.
Paul
[1] That's just an example, and it would be off-topic to debate the various other things that overall contribute to why and to what level I'm currently comfortable using PyPI. And I'm not running a business that would fail if PyPI were compromised. So please just take this as a small data point, nothing more.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
I'll just say that when I blogged about PyPI security at https://snarky.ca/how-do-you-verify-pypi-can-be-trusted/ the idea Nick is proposing was along the lines of what I thought about as a solution to the non-typosquatting side of the security problem. Which is to say it should be easy enough to explain to non-security folk if I could figure it out. ;)

Nick Coghlan wrote:
On 1 January 2015 at 05:51, Donald Stufft <donald@stufft.io> wrote:
So here is my problem. I’m completely on board with the developer signing for the distribution files. I think that makes total sense. However I worry that requiring the developer to sign for what is essentially the “installer” API (aka how pip discovers things to install) is going to put us in a situation where we cannot evolve the API easily.

If we modified this PEP so that an online key signed for /simple/ what security properties would we lose? It appears to me that the problem then would be that a compromise of PyPI can present whatever information they want to pip as to what is available for pip to download and install. This would mean freeze attacks, mix and match attacks. It would also mean that they could, in a future world where pip can use metadata on PyPI to do dependency resolution, tell pip that it needs to download a valid but malicious project as a dependency of a popular project like virtualenv. However I don’t think they’d be able to actually cause pip to install a malicious copy of a good project, and I believe that we can protect against an attacker who possesses that key from tricking pip into installing a malicious but valid project as a fake dependency by having pip only use the theoretical future PyPI metadata that lists dependencies as an optimization hint for what it should download, and then once it’s actually downloaded a project like virtualenv (which has been validated to be from the real author) peek inside that file and ensure that the metadata inside that matches what PyPI told pip.

Is my assessment correct? Is keeping the “API” under control of PyPI a reasonable thing to do while keeping the actual distribution files themselves under control of the distribution authors? The reason this worries me is that unlike a Linux distribution or an application like Firefox, we don’t have much of a relationship with the people who are uploading things to PyPI. So if we need to evolve the API we are not going to be able to compel our authors to go back and re-generate new signed metadata.

I think this is a good entry point for an idea I've had kicking around in my brain for the past couple of days: what if we change the end goal of PEP 480 slightly, from "prevent attackers from compromising published PyPI metadata" to "allow developers & administrators to rapidly detect and recover from compromised PyPI metadata"? My reasoning is that when it comes to PyPI security, there are actually two major dials we can twiddle:
* raising the cost of an attack (e.g. making compromise harder by distributing signing authority to developers)
* reducing the benefit of an attack (e.g. making the expected duration, and hence reach, of a compromise lower, or downgrading an artifact substitution attack to a denial of service attack)
To raise the cost of a compromise through distributed signing authority, we have to solve the trust management problem - getting developer keys out to end users in a way that doesn't involve trusting the central PyPI service. That's actually a really difficult problem to solve, which is why we have situations like TLS still relying on the CA system, despite the known problems with the latter.

However, the latter objective is potentially more tractable: we wouldn't need to distribute trust management out to arbitrary end users, we'd "just" need a federated group of entities that are in a position to detect that PyPI has potentially been compromised, and request a service shutdown until such time as the compromise has been investigated and resolved. This notion isn't fully evolved yet (that's why this email is so long), but it feels like a far more viable direction to me than the idea of pushing the enhanced security management problem back on to end users.

Suppose, for example, there were additional independently managed validation services hosting TUF metadata for various subsets of PyPI. The enhanced security model would then involve developers opting in to uploading their package metadata to one or more of the validation servers, rather than just to the main PyPI server. pip itself wouldn't worry about checking the validation services - it would just check against the main server as it does today, so we wouldn't need to worry about how we get the root keys for the validation servers out to arbitrary client end points. That is, rather than "sign your own packages", the enhanced security model becomes "get multiple entities to sign your packages, so compromise of any one entity (including PyPI itself) can be detected and investigated appropriately".

The validation services would then be responsible for checking that their own registered metadata matched the metadata being published on PyPI. If they detect a discrepancy between their own metadata and PyPI's, then we'd have a human-in-the-loop process for reporting the problem, and the most likely response would be to disable PyPI downloads while the situation was resolved. I believe something like that would change the threat landscape in a positive way, and has three very attractive features over distributed signing authority:
* It's completely transparent at the point of installation - it transforms PEP 480 into a back end data integrity validation project, rather than something that affects the end user experience of the PyPI ecosystem. The changes to the installation experience would be completely covered by PEP 458.
* Uploading metadata to additional servers for signing is relatively low impact on developers (if they have an automated release process, it's likely just another line in a script somewhere), significantly lowering barriers to adoption relative to asking developers to sign their own packages.
* Folks that decide to run or use a validation server are likely going to be more closely engaged with the PyPI community, and hence easier to reach as the metadata requirements evolve
In terms of how I believe such a change would mitigate the threat of a PyPI compromise:
* it provides a cryptographically validated way to detect a compromise of any packages registered with one or more validation services, significantly reducing the likelihood of a meaningful PyPI compromise going undetected
* in any subsequent investigation, we'd have multiple sets of cryptographically validated metadata to compare to identify exactly what was compromised, and how it was compromised
* the new attack vectors introduced (by compromising the validation services rather than PyPI itself) are denial of service attacks (due to PyPI downloads being disabled while the discrepancy is investigated), rather than the artifact substitution that is possible by attacking PyPI directly
That means we would move from the status quo, where a full PyPI compromise may permit silent substitution of artifacts, to one where an illicit online package substitution would likely be detected in minutes or hours for high profile projects, so the likely pay-off for an attack on the central infrastructure is a denial of service against organisations not using their own local PyPI mirrors, rather than arbitrary software installation on a wide range of systems.

Another nice benefit of this approach is that it also protects against attacks on developer PyPI accounts, so long as they use different authentication mechanisms on the validation server than on the main PyPI server. For example, larger organisations could run their own validation server for the packages they publish, and manage it using offline keys as recommended by TUF - that's a lot easier to do when you don't need to allow arbitrary uploads. Specific projects could still be attacked (by compromising developer systems), but that's not a new threat, and it's outside the scope of PEP 458/480 - we're aiming to mitigate the threat of systemic compromise that currently makes PyPI a relatively attractive target.
As far as the pragmatic aspects go, we could either go with a model where projects are encouraged to run their own validation services on something like OpenShift (or even a static hosting site if they generate their validation metadata locally), or else we could look for willing partners to host public PyPI metadata validation servers (e.g. the OpenStack Foundation, Fedora/Red Hat, perhaps someone from the Debian/Ubuntu/Canonical ecosystem, perhaps some of the other commercial Python redistributors).

Regards,
Nick.

[1] Via Leigh Alexander, I was recently introduced to this excellent paper on understanding and working with the mental threat models that users actually have, rather than attempting to educate the users: https://cups.cs.cmu.edu/soups/2010/proceedings/a11_Walsh.pdf. While the paper is specifically written in the context of home PC security, I think that's good advice in general: adjusting software systems to accommodate the reality of human behaviour is usually going to be far more effective than attempting to teach humans to conform to the current needs of the software.
On Dec 10, 2014, at 10:16 PM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
Hello everyone,
I am a research programmer at the NYU School of Engineering. My colleagues (Trishank Kuppusamy and Justin Cappos) and I are requesting community feedback on our proposal, "Surviving a Compromise of PyPI." The two-stage proposal can be reviewed online at:
PEP 458 http://legacy.python.org/dev/peps/pep-0458/
PEP 480 http://legacy.python.org/dev/peps/pep-0480/
I’m going through the PEPs again, and I think that evaluating these PEPs is made more complicated by the fact that there are two of them, although I agree that splitting up the two PEPs was the right thing to do. What do you think about putting PEP 480 on the back burner for the moment and focusing on PEP 458? I think this will benefit the process in a few ways:

* I believe that PEP 458 is far less controversial than PEP 480, since it’s effectively just exchanging TLS for TUF for verification, and other than for the authors of the tools who need to work with it (pip, devpi, bandersnatch, PyPI, etc) it’s transparent for the end users.
* It allows us to discuss the particulars of getting TUF integrated in PyPI without worrying about the far nastier problem of exposing a signing UX to end authors.
* While discussing it, we can still ensure that we leave ourselves enough flexibility so that we don’t need to make any major changes for PEP 480.
* The problems that are going to crop up while implementing it (is the mirror protocol good enough? CDN purging? etc) are most likely to happen during PEP 458.
* I think having them both being discussed at the same time is causing a lot of confusion between which proposal does what.
* By focusing on one at a time we can get a more polished PEP that is easier to understand (see below).

Overall I think the PEPs themselves need a bit of work; they currently rely a lot on reading the TUF white paper and the TUF spec in the repository to get a grasp of what’s going on. I believe they also read more like suggestions about how we might go about implementing TUF rather than an actionable plan for implementing TUF. I’ve seen feedback from more than one person that they feel like they are having a hard time grok’ing what the *impact* of the PEPs will be to them without having to go through and understand the intricacies of TUF and how the PEP implements TUF.

Technically I’m listed on this PEP as an author (although I don’t remember anymore what I did to get on there :D), but I care a good bit about getting something better than TLS set up, so, if y’all are willing to defer PEP 480 for right now and focus on PEP 458 and y’all want it, I’m willing to actually earn that co-author spot and help get PEP 458 into shape and help get it accepted and implemented.

Either way though, I suggest focusing on PEP 458 (with an eye towards not making any decisions which will require changes on the client side to implement PEP 480).

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
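For a sense of what "exchanging TLS for TUF for verification" looks like from the client side, here is a sketch using the python-tuf client API (which postdates this discussion); the URLs and paths are placeholders:

    from tuf.ngclient import Updater

    # The client starts from a locally trusted copy of TUF's root
    # metadata (shipped with the installer in the PEP 458 design),
    # refreshes the signed metadata, then verifies a download against it.
    updater = Updater(
        metadata_dir="/path/to/trusted/metadata",
        metadata_base_url="https://pypi.example.org/tuf-metadata/",
        target_base_url="https://pypi.example.org/packages/",
        target_dir="/path/to/downloads",
    )
    updater.refresh()  # fetch and verify timestamp/snapshot/targets

    info = updater.get_targetinfo("example_project-1.0.tar.gz")
    if info is None:
        raise RuntimeError("package not present in the signed metadata")

    # download_target checks length and hashes against the signed
    # metadata before accepting the file.
    path = updater.download_target(info)
    print("verified download at", path)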
On 3 January 2015 at 00:25, Donald Stufft <donald@stufft.io> wrote:
On Dec 10, 2014, at 10:16 PM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
Hello everyone,
I am a research programmer at the NYU School of Engineering. My colleagues (Trishank Kuppusamy and Justin Cappos) and I are requesting community feedback on our proposal, "Surviving a Compromise of PyPI." The two-stage proposal can be reviewed online at:
PEP 458 http://legacy.python.org/dev/peps/pep-0458/
PEP 480 http://legacy.python.org/dev/peps/pep-0480/
I’m going through the PEPs again, and I think that evaluating these PEPs is made more complicated by the fact that there are two of them, although I agree that splitting up the two PEPs was the right thing to do. What do you think about putting PEP 480 on the back burner for the moment and focusing on PEP 458?
+1 from me - being able to resolve them in that order was my main motivation for suggesting they be split apart into separate proposals.

I just don't personally have any major open questions for PEP 458 - while I'm aware there are some significant technical details to be resolved in terms of exactly what gets signed, and how the implementation will work in practice, I think the concept is sound, and I don't have the necessary knowledge of pip and PyPI internals to have an opinion on the details of the integration.

For PEP 480, by contrast, I still have some fundamental questions about whether we should go with a model of "additional developer managed secrets" (the PEP 480 approach, where developers register an additional authentication token for uploads with the PyPI administrators, which means compromising PyPI doesn't grant the ability to do fake uploads for a service) or "additional developer managed accounts" (my new suggestion in this thread, where we collaborate with other organisations like Linux distributions and the OpenStack Foundation for federated validation of critical uploads). While the two approaches are mathematically equivalent, I suspect they're profoundly different in terms of how easy it will be for folks to grasp the relevant security properties.
I think this will benefit the process in a few ways:
* I believe that PEP 458 is far less controversial than PEP 480, since it’s effectively just exchanging TLS for TUF for verification, and other than for the authors of the tools who need to work with it (pip, devpi, bandersnatch, PyPI, etc) it’s transparent for the end users.
* It allows us to discuss the particulars of getting TUF integrated in PyPI without worrying about the far nastier problem of exposing a signing UX to end authors.
* While discussing it, we can still ensure that we leave ourselves enough flexibility so that we don’t need to make any major changes for PEP 480.
* The problems that are going to crop up while implementing it (is the mirror protocol good enough? CDN purging? etc) are most likely to happen during PEP 458.
* I think having them both being discussed at the same time is causing a lot of confusion between which proposal does what.
* By focusing on one at a time we can get a more polished PEP that is easier to understand (see below).
Resolving PEP 458 first was my intention when I made the proposal to split them. I only brought the broader PEP 480 proposal into question because the validation server based design finally started to click for me earlier today.
Overall I think the PEPs themselves need a bit of work; they currently rely a lot on reading the TUF white paper and the TUF spec in the repository to get a grasp of what’s going on. I believe they also read more like suggestions about how we might go about implementing TUF rather than an actionable plan for implementing TUF. I’ve seen feedback from more than one person that they feel like they are having a hard time grok’ing what the *impact* of the PEPs will be to them without having to go through and understand the intricacies of TUF and how the PEP implements TUF.
Technically I’m listed on this PEP as an author (although I don’t remember anymore what I did to get on there :D),
IIRC, the current draft of the snapshotting mechanism relied heavily on your input regarding what was feasible in the context of the PyPI API design.
but I care a good bit about getting something better than TLS set up, so, if y’all are willing to defer PEP 480 for right now and focus on PEP 458 and y’all want it, I’m willing to actually earn that co-author spot and help get PEP 458 into shape and help get it accepted and implemented.
If the TUF folks are amenable, I think that would be great. You've been through the PEP process a few times now, and are probably far more familiar with what's involved in creating a robust PEP than you might wish ;)
Either way though, I suggest focusing on PEP 458 (with an eye towards not making any decisions which will require changes on the client side to implement PEP 480).
Yep. I'll make one more post to try to clarify why I see separate validation services as having the potential to offer a superior UX in a PEP 480 context, but I'm not pushing for a short term decision on that one - I'm just aiming to plant the seeds of ideas that may be worth keeping in mind as we work on PEP 458.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jan 2, 2015, at 9:55 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I just don't personally have any major open questions for PEP 458 - while I'm aware there are some significant technical details to be resolved in terms of exactly what gets signed, and how the implementation will work in practice, I think the concept is sound, and I don't have the necessary knowledge of pip and PyPI internals to have an opinion on the details of the integration.
To be clear, I also think that PEP 458’s concept is sound and I think it’s the right direction to go in. The things I think are holding it back are nailing down the technical details and getting it so that someone can get a high level understanding of what it’s actually doing without needing to read the supporting documentation (white paper, etc) and without some level of what most would call expert knowledge in the area (though I don’t like to call myself an expert, I realize to most people I likely am). --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
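(For readers trying to form that high-level picture: the client-side core of PEP 458 can be sketched in a few lines of Python. This is an illustrative simplification, not the tuf library's API - the metadata layout is abbreviated and verify_signature is a stand-in for the real cryptographic check.)

    import hashlib
    import json

    def verify_download(metadata, package_bytes, package_path, verify_signature):
        """Sketch of what a TUF-aware pip would do before installing a file.

        `metadata` is a parsed TUF-style dict with "signed" and "signatures"
        parts; verify_signature(keyid, sig, message) -> bool is supplied by
        the caller (a stand-in for the actual crypto).
        """
        # 1. Enough trusted keys must have signed the metadata (the threshold).
        message = json.dumps(metadata["signed"], sort_keys=True).encode()
        valid = sum(
            1 for s in metadata["signatures"]
            if verify_signature(s["keyid"], s["sig"], message)
        )
        if valid < metadata["signed"]["threshold"]:
            raise ValueError("not enough valid signatures on the metadata")
        # 2. The downloaded file must match the signed length and hash, so a
        #    mirror or CDN without the signing keys cannot tamper with it.
        info = metadata["signed"]["targets"][package_path]
        if len(package_bytes) != info["length"]:
            raise ValueError("length does not match the signed metadata")
        if hashlib.sha256(package_bytes).hexdigest() != info["hashes"]["sha256"]:
            raise ValueError("hash does not match the signed metadata")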
On 2 January 2015 at 14:25, Donald Stufft <donald@stufft.io> wrote:
I’m going through the PEPs again, and I think that evaluating these PEPs is made more complicated by the fact that there are two of them; I agree that splitting up the two PEPs was the right thing to do, though. What do you think about putting PEP 480 on the back burner for the moment and focusing on PEP 458?
+1 on all of this. I agree that PEP 458 is (relatively speaking) an obvious thing to do, and if the people who have to do the work for it are happy, I think it should just go ahead. I'd like to see the PEPs reworded a bit to be less intimidating to the non-specialist. For PEPs about the trust model for PyPI, it's ironic that I have to place a lot of trust in the PEP authors simply because I don't understand half of what they are saying ;-) Paul
Thanks for the great feedback - Nick, Donald, Paul, and Richard (off-list).

I am totally fine with focusing on PEP 458 and applying the final coat of paint on this document.

There's a lot of background documentation and technical details excluded from the PEPs (to avoid turning the PEP into a 15+ page behemoth), but I do agree that we should explicitly cover some of these implementation details in PEP 458. Subsections on the exact format of the metadata, an explanation of how metadata is signed, and how the roles are "delegated" with the library still remain. As Paul has indicated, terminology can also be improved so as to be more readable for "non-experts."

Let me know how we should collaborate on PEP 458 going forward. Guido van Rossum made minor corrections to PEP 458, and requested we reflect his changes back to the version on Github. We can either move hg.python.org/peps/pep-0458.txt <https://hg.python.org/peps/file/a532493ba99c/pep-0458.txt> to github.com/pypa or github.com/theupdateframework/pep-on-pypi-with-tuf.

Thanks, Vlad
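(To give a flavour of what those subsections would pin down, here is a rough sketch of the shape of a TUF "targets" metadata file with one delegated role, written as a Python literal. Every value is invented and the layout is simplified from the TUF specification - the exact format PyPI would publish is precisely the work that remains.)

    # Illustrative only: real TUF metadata is canonical JSON, and the
    # authoritative field definitions live in the TUF specification.
    targets_metadata = {
        "signatures": [
            # One entry per signing key; a role may require several of
            # these to be valid (its "threshold").
            {"keyid": "1a2b...", "sig": "9f86..."},
        ],
        "signed": {
            "_type": "targets",
            "version": 3,
            "expires": "2015-03-02T00:00:00Z",
            "targets": {
                # What clients verify: a length and hash per file.
                "packages/example-1.0.tar.gz": {
                    "length": 1048,
                    "hashes": {"sha256": "2e0c..."},
                },
            },
            "delegations": {
                # Delegation: the listed keys are trusted to sign for the
                # matching paths on behalf of this role.
                "keys": {"3c4d...": {"keytype": "ed25519",
                                     "keyval": {"public": "ab12..."}}},
                "roles": [
                    {"name": "claimed", "keyids": ["3c4d..."],
                     "threshold": 2, "paths": ["packages/*"]},
                ],
            },
        },
    }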
On Jan 2, 2015, at 11:14 AM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
As far as I’m concerned I’m willing to collab however is best for y’all. It appears you’re doing it on Github in the https://github.com/theupdateframework/pep-on-pypi-with-tuf repository so I’m happy to make PRs there. I’m also happy to make PRs elsewhere as well though I prefer somewhere on Github. I’ll sit down with PEP 458 maybe this weekend and see if I can crank out some PRs to refine it. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 3 January 2015 at 02:26, Donald Stufft <donald@stufft.io> wrote:
It probably makes sense to pull the TUF PEPs into the new pypa/interoperability-peps repo with the rest of them, and add Vladimir et al as developers on that repo (or just to the general PyPA developers group). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I prefer pulling the TUF PEPs (available on hg.python.org) into github.com/pypa. Please add Justin, Linda, Trishank, and myself as collaborators:

https://github.com/vladimir-v-diaz
https://github.com/dachshund
https://github.com/JustinCappos
https://github.com/lvigdor

P.S. Donald helped tremendously with the snapshot process, Ed25519 library, ideas, and feedback. I think that earns a spot on the authors list.
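(For anyone unfamiliar with Ed25519: it is a fast, modern signature scheme with small keys and signatures, which makes it well suited to signing lots of metadata. A minimal signing round-trip is shown below using PyNaCl, chosen purely for illustration - the reference implementation carries its own Ed25519 code and is not assumed to use this library.)

    from nacl.signing import SigningKey  # PyNaCl, illustration only

    signing_key = SigningKey.generate()   # private half: keep offline/secret
    verify_key = signing_key.verify_key   # public half: safe to publish

    signed = signing_key.sign(b"snapshot metadata bytes")
    verify_key.verify(signed)             # raises BadSignatureError if tampered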
On Jan 2, 2015, at 11:47 AM, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
I’ve gone ahead and imported PEP 458 and the three image files it uses into https://github.com/pypa/interoperability-peps and added the above names as contributors. I probably won’t get to it today but I’ll try to get to it tomorrow to go through PEP 458 and PR/issue what I can find. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Earlier this year, Brett Cannon consulted with Donald and updated the status of PEP 458 to Deferred. https://github.com/python/peps/pull/931

The PEP status is now Draft again and the new proposed title is "PEP 458: Secure transport independent download integrity for PyPI packages" (see https://github.com/secure-systems-lab/peps/blob/c13384a4fac6822626abb7e09ab7... ).

Current discussion is happening on Discourse: https://discuss.python.org/t/pep-458-surviving-a-compromise-of-pypi/2648/

--
Sumana Harihareswara
Changeset Consulting
https://changeset.nyc
The revised PEP 458 is at https://www.python.org/dev/peps/pep-0458/ as "PEP 458 -- Secure PyPI downloads with package signing." Discussion has been proceeding on Discourse. BDFL-Delegate Donald Stufft wrote today https://discuss.python.org/t/pep-458-secure-pypi-downloads-with-package-sign... :

"It looks like discussion about the actual meat and potatoes of this PEP has petered out. Unless someone has an objection, I intend to accept this PEP on Friday."

--
Sumana Harihareswara
Changeset Consulting
https://changeset.nyc
On 2 January 2015 at 16:14, Vladimir Diaz <vladimir.v.diaz@gmail.com> wrote:
There's a lot of background documentation and technical details excluded from the PEPs (to avoid turning the PEP into a 15+ page behemoth), but I do agree that we should explicitly cover some of these implementation details in PEP 458. Subsections on the exact format of the metadata, an explanation of how metadata is signed, and how the roles are "delegated" with the library still remain. As Paul has indicated, terminology can also be improved so as to be more readable for "non-experts."
There's a tension here, in that while the detail is useful, the non-expert content needs to be *fewer* words, not more - the problem is one of an intimidating wall of text, much more than that the concepts are difficult. Adding more content, however necessary it is to the implementation side, probably detracts here.

Just as an example, take the first paragraph of the abstract:

"""
This PEP proposes how the Python Package Index (PyPI [1] ) should be integrated with The Update Framework [2] (TUF). TUF was designed to be a flexible security add-on to a software updater or package manager. The framework integrates best security practices such as separating role responsibilities, adopting the many-man rule for signing packages, keeping signing keys offline, and revocation of expired or compromised signing keys. For example, attackers would have to steal multiple signing keys stored independently to compromise a role responsible for specifying a repository's available files. Another role responsible for indicating the latest snapshot of the repository may have to be similarly compromised, and independent of the first compromised role.
"""

There's no way I, even with the benefit of the discussions both on the list here and off-list, can skim that paragraph and get anything more meaningful than "blah, blah, blah" from it. Taking a moment to think about it is fine, but at that point you've lost me (in terms of encouraging me to read the document, rather than just debate based on what I think it's going to say).

None of the "best security practices" mentioned in sentence 3 mean anything to me - they all basically look like "do more security stuff" to me, and "doing security stuff", unfortunately, from my experience, translates to "jump through annoying hoops that stop me getting my job done"... Similarly, the last 2 sentences (the example) read as saying "attackers have to work harder". Well good, I'd sort of hope the PEP provided a benefit of some sort :-)

The following (somewhat off the cuff) may work better:

"""
This PEP proposes some changes to the Python Package Index (PyPI) infrastructure to enhance its resistance to attack without requiring any changes in workflow from project authors. It is similar to recent changes to use TLS for all PyPI traffic, but offers additional security over TLS. Future PEPs may look at further security enhancements that could require additional action from project authors, but this PEP is restricted to the PyPI infrastructure and mirroring software.
"""

Sorry for what is pretty much a hatchet job on a single paragraph taken out of context from the PEP, but hopefully you can see what I'm trying to say - skip details of security concepts, attack vectors, etc., and concentrate on "we're making things better - it's as low impact as the change to TLS - project authors won't need to do anything (at this stage)". Those are the key messages to get across straight away. A few more paragraphs in the abstract explaining in non-technical terms what's being done and why, and you're done. You can say "the rest of the document is the technical details for people with an understanding of (or interest in) the security concepts" and get on with the specifics.

If you can get the non-specialists to read just the abstract section, and go away knowing they are happy to let the experts get on with the process, you've won. Put the details and the scary terms in the "for experts and interested parties only" main body.

Paul.
Paul, I understand your point, and made an attempt at what you're suggesting with my initial post to this mailing list (addressing a different target audience):

"Surviving a Compromise of PyPI" proposes how the Python Package Index (PyPI) can be amended to better protect end users from altered or malicious packages, and to minimize the extent of PyPI compromises against affected users. The proposed integration allows package managers such as pip to be more secure against various types of security attacks on PyPI and defend end users from attackers responding to package requests. Specifically, these PEPs describe how PyPI processes should be adapted to generate and incorporate repository metadata, which are signed text files that describe the packages and metadata available on PyPI. Package managers request (along with the packages) the metadata on PyPI to verify the authenticity of packages before they are installed. The changes to PyPI and tools will be minimal by leveraging a library, The Update Framework <https://github.com/theupdateframework/tuf>, that generates and transparently validates the relevant metadata.

The first stage of the proposal (PEP 458 <http://legacy.python.org/dev/peps/pep-0458/>) uses a basic security model that supports verification of PyPI packages signed with cryptographic keys stored on PyPI, requires no action from developers and end users, and protects against malicious CDNs and public mirrors. To support continuous delivery of uploaded packages, PyPI administrators sign for uploaded packages with an online key stored on PyPI infrastructure. This level of security prevents packages from being accidentally or deliberately tampered with by a mirror or a CDN because the mirror or CDN will not have any of the keys required to sign for projects."

Perhaps parts of this introduction (and your suggestion) will work better as an abstract, instead of assuming that the PyPI admins will be the only ones reading PEP 458.

Thanks again for all the feedback.
On 3 January 2015 at 01:31, Paul Moore <p.f.moore@gmail.com> wrote:
FWIW, Niels Ferguson and Bruce Schneier's "Practical Cryptography" was probably the single most enlightening book I've read on the topic. The NIST standards in this area are also genuinely excellent (the occasional less than ideal technical recommendation from certain government agencies notwithstanding), and if you can afford the paywall (or work for an organisation that can do so), actually reading relevant sections of IEEE 802.11i was a key part of my own learning. (My specific interest was in authentication protocols for access control, hence why I was reading the Wi-Fi WPA2 spec, but a lot of the underlying cryptographic concepts are shared between the different kinds of digital verification.)

For broader context, Schneier's "Secrets and Lies" is good from a technical perspective, while the more recent "Liars and Outliers" looks to situate the security mindset in a broader social environment. There's a reason Schneier is as well respected as he is - if you're ever looking for general advice on how to be pragmatically paranoid, he's a great source to turn to.

That said, while I doubt we're going to be able to completely de-jargonise the PEP details, I agree it would be worthwhile to ensure there's a clear explanation of the practical consequences for folks that we'd otherwise lose in the cryptographic weeds.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
As a bystander trying to keep up, I'm also +1 on focusing on PEP 458 and letting PEP 480 just sit there for now as Deferred.
participants (7): Brett Cannon, Donald Stufft, Nick Coghlan, Paul Moore, Richard Jones, Sumana Harihareswara, Vladimir Diaz