Currently, PyPI allows you to upload a GPG signature along with your package file as well as associate a GPG Short ID with your user. Theoretically this allows end users to not trust PyPI and instead validate end to end signatures from the original author.

I've written [1] previously about package signing, and about a number of common suggestions for achieving it that don't actually do much, if anything, to increase the security of things. The current implementation on PyPI falls into such a trap.

The main problem with GPG and package signing is that a GPG key provides some guarantees (ignoring issues with the concept of a WOT) about the *identity* of the person possessing the key, however it provides no mechanism for providing any guarantees about what *capabilities* should be granted to that person. More concretely, while you can use GPG as is to verify that yes, "Donald Stufft" signed a particular package, you cannot use it to determine if "Donald Stufft" is *allowed* to sign for that package. A valid signature from me on the requests project should be just as invalid as an invalid signature from anyone on the requests project. The only namespacing provided by GPG itself is "trusted key" vs "not trusted key".

PyPI offers a work around for this in the form of allowing users to associate their GPG short ID with their user profile. However this is not actually very useful because it doesn't really provide much benefit overall. The goal of signing a package on PyPI is generally to allow you to safely download the file without trusting PyPI, but if you need to trust PyPI to determine what key is allowed to sign a package, then you've not really added much in the way of additional assurances, you've just added another possible point of failure.

Beyond the inherent issues with attempting to use GPG support for anything useful on PyPI there are also a number of implementation specific issues with this support.
Currently you *must* use a GPG Short ID with PyPI, however the GPG Short IDs are not actually secure and can be pretty easily brute forced to have a collision, which means that people can make keys that come out to the same short ID as one that an author has in their profile. In addition, a signature uploaded to PyPI is not actually validated upon upload, allowing people to upload a signature that doesn't actually validate, causing a persistent failure mode (though nobody validates our signatures, so nobody ever runs into this problem).

On top of all of that, I believe that there will never be a case where a tool like pip supports these GPG signatures [2]. Even if you wiped away all of the above problems, GPG is still a complex standard without great support for building tooling around it. The most reasonable way of implementing support for that would be to ship a copy of the gpg binary around for different platforms and shell out to it. However GPG is GPL licensed which means that's not something we could actually do, and even if it were, shipping binaries is not generally a reasonable thing to do for pip and anything besides pure Python is a no go.

I am aware of a single tool anywhere that actively supports verifying the signatures that people upload to PyPI, and that is Debian's uscan program. Even in that case the people writing the Debian watch file have to hardcode a signing key into it and in my experience, when faced with a validation error it's not unusual for Debian to simply disable signature checking for that project and/or just blindly update the key to whatever the new key is.

All in all, I think that there is not a whole lot of point to having this feature in PyPI, it is predicated on a bunch of invalid assumptions (as detailed above) and I do not believe end users are actually even using the keys that are being uploaded. Last time I looked, pip, easy_install, and bandersnatch represented something like 99% of all download traffic on PyPI and none of those will do anything with the .asc files being uploaded to PyPI (other than bandersnatch just blindly mirroring them).

When looking at the number of projects actively using this feature on PyPI, I can see that 27931/591919 files on PyPI have the ``has_signature`` database field set to true, or roughly 4% of all files on PyPI, which roughly holds up when you look at the number of distinct projects that have ever uploaded a signature as well (3559/80429).

Thus, I would like to remove this feature from PyPI (but not from PEP 503, if other repositories want to continue to support it they are free to). Doing this would allow simplifying code we have in Warehouse anyplace we touch uploaded files (since we almost always end up needing to branch into special behavior for files ending with .asc). It will allow us to reduce the number of concepts in the UI (what is a pgp signature? What do I do with it? etc) without simply hiding a feature (which is likely to cause confusion, why do you support it if you won't show it etc). I think it will also make releasing slightly easier for developers, since I personally know a number of authors on PyPI who don't really believe there is any value in signing their packages on PyPI, but they do it anyways because of a vague notion that they should do it.

If we do it, an open question would be what we do with all of the *existing* signatures on PyPI. We could just leave them in place and stop accepting new signatures, though that still means we end up needing to branch on .asc anyplace we handle files because they'll still be a valid code path. Another option is to just simply get rid of them and act as if nobody ever uploaded them in the first place, which is my preferred option.

What do folks think? Would anyone be particularly against getting rid of the GPG support in PyPI?
[1] https://caremad.io/2013/07/packaging-signing-not-holy-grail/
[2] When we do implement package signing in pip, it will almost certainly be via TUF, most likely using ed25519 signatures but perhaps using RSA.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
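
To make the short ID concern in the message above concrete: for a version 4 OpenPGP key, the long key ID is the low 64 bits of the 160-bit fingerprint, and the short ID is only the low 32 bits. The sketch below derives both from the fingerprint shown in the signature block above. The helper name `key_ids` is made up for illustration and is not part of any GPG tooling.

```python
from typing import Tuple


def key_ids(fingerprint: str) -> Tuple[str, str]:
    """Return (long_id, short_id) for a v4 OpenPGP fingerprint."""
    hex_digits = fingerprint.replace(" ", "").upper()
    if len(hex_digits) != 40:
        raise ValueError("expected a 160-bit (40 hex digit) v4 fingerprint")
    int(hex_digits, 16)  # raises ValueError if the input is not hexadecimal
    # The key IDs are simply the trailing 16 and 8 hex digits.
    return hex_digits[-16:], hex_digits[-8:]


long_id, short_id = key_ids("7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA")
print(long_id)   # 6E3CBCE93372DCFA
print(short_id)  # 3372DCFA
```

A short ID has only 2**32 (about 4.3 billion) possible values, which is why producing a second key with the same short ID is within reach of a brute-force search.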
On 12 May 2016 at 12:41, Donald Stufft <donald@stufft.io> wrote:
What do folks think? Would anyone be particularly against getting rid of the GPG support in PyPI?
28K projects is too many to do a mailshot, but would it be worth asking this question more widely than on distutils-sig? Just "Do you maintain a project on PyPI that has GPG sigs and would you care if we removed them? If so, please let us know on the thread on distutils-sig." On an unrelated note, it might be a good feature for Warehouse to add some means of notifying project owners for cases like this. Paul
On May 12, 2016, at 8:05 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 12 May 2016 at 12:41, Donald Stufft <donald@stufft.io> wrote:
What do folks think? Would anyone be particularly against getting rid of the GPG support in PyPI?
28K projects is too many to do a mailshot, but would it be worth asking this question more widely than on distutils-sig? Just "Do you maintain a project on PyPI that has GPG sigs and would you care if we removed them? If so, please let us know on the thread on distutils-sig.”
It's 28k *files* but a single project can have more than one file. The total number of projects that have *ever* uploaded a file with a signature is 3.5k and of that 3.5k, only 2.7k projects have their *latest* release uploaded with signatures.
On an unrelated note, it might be a good feature for Warehouse to add some means of notifying project owners for cases like this. Paul
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
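
For reference, the file and project counts quoted in this thread can be turned into percentages with a few lines of Python. The four numbers are the ones given in the original post; nothing else is assumed.

```python
# Counts quoted in the original post.
signed_files, total_files = 27931, 591919
signed_projects, total_projects = 3559, 80429

file_share = 100 * signed_files / total_files
project_share = 100 * signed_projects / total_projects

print(f"files with a signature:    {file_share:.1f}%")
print(f"projects with a signature: {project_share:.1f}%")
```

Both shares come out a little under 5%, so the file-level and project-level figures do tell the same story.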
On 12 May 2016 at 21:41, Donald Stufft <donald@stufft.io> wrote:
Thus, I would like to remove this feature from PyPI (but not from PEP 503, if other repositories want to continue to support it they are free to). Doing this would allow simplifying code we have in Warehouse anyplace we touch uploaded files (since we almost always end up needing to branch into special behavior for files ending with .asc). It will allow us to reduce the number of concepts in the UI (what is a pgp signature? What do I do with it? etc) without simply hiding a feature (which is likely to cause confusion, why do you support it if you won't show it etc). I think it will also make releasing slightly easier for developers, since I personally know a number of authors on PyPI who don't really believe there is any value in signing their packages on PyPI, but they do it anyways because of a vague notion that they should do it.
I'm a fan of most ideas that lower barriers to migration from the legacy PyPI codebase to Warehouse :)

However, I think one of the core points here is that even if GPG keys are removed from the *public* API, we still don't want the migration to break things for folks with automated upload processes that supply signature information - there'll be enough teething problems for the migration without a potential coincident breakage for a couple of thousand projects.

That means the trade-off to consider is whether or not dropping this feature will get us to the point of being able to deploy Warehouse sooner.

- if it's already implemented & tested in Warehouse, then leave it alone - you can take it out later, some time *after* the migration
- if it's implemented, but not well tested, decide what to do based on whether testing or the below idea seems like less work
- if it's not implemented in Warehouse yet, that's when the key requirement becomes "don't break automated signed uploads"

If we drop the feature, then my suggestion for a future upload UX is to let uploads with signatures succeed, but implicitly discard the signatures, and *automatically email the maintainer explaining why their uploaded signatures aren't appearing*.

Similarly, projects that *had* signatures uploaded should gain a database flag indicating previously uploaded keys have been discarded, such that:

1. The admin page for the project explains why the signatures are gone
2. The affected package maintainers are emailed with a list of projects they maintain from which signatures have been removed

The aim here would be to ensure that package maintainers tempted to ask "Where did my package signatures go?" are able to have that question answered *within the context of the service itself*, even if they've never heard of distutils-sig.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
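
The "accept the upload, discard the signature, email the maintainer" flow suggested above could look roughly like the following. This is only a sketch under stated assumptions: `accept_upload`, `UploadResult`, and the `send_email` callback are hypothetical names, the form is modelled as a plain dict, and `gpg_signature` is the form field name mentioned elsewhere in this thread. It is not Warehouse's actual upload code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

SIGNATURE_FIELD = "gpg_signature"  # form field name mentioned in the thread


@dataclass
class UploadResult:
    stored_fields: Dict[str, bytes]                    # what would actually be kept
    notices: List[str] = field(default_factory=list)   # messages sent to the maintainer


def accept_upload(
    form: Dict[str, bytes],
    maintainer_email: str,
    send_email: Callable[[str, str], None],  # hypothetical mail hook
) -> UploadResult:
    """Let the upload succeed, drop any signature, and explain why by email."""
    stored = {name: value for name, value in form.items() if name != SIGNATURE_FIELD}
    result = UploadResult(stored_fields=stored)
    if SIGNATURE_FIELD in form:
        message = (
            "Your upload succeeded, but the attached signature was discarded "
            "because signatures are no longer stored or displayed."
        )
        send_email(maintainer_email, message)
        result.notices.append(message)
    return result


# Example: collect outgoing mail in a list instead of sending it.
sent = []
outcome = accept_upload(
    {"content": b"sdist bytes", "gpg_signature": b"detached signature"},
    "maintainer@example.com",
    lambda to, body: sent.append((to, body)),
)
```

The point of the design is that an automated release script supplying a signature keeps working unchanged, while the maintainer still learns what happened without having to follow this list.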
On May 12, 2016, at 8:56 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 12 May 2016 at 21:41, Donald Stufft <donald@stufft.io> wrote:
Thus, I would like to remove this feature from PyPI (but not from PEP 503, if other repositories want to continue to support it they are free to). Doing this would allow simplifying code we have in Warehouse anyplace we touch uploaded files (since we almost always end up needing to branch into special behavior for files ending with .asc). It will allow us to reduce the number of concepts in the UI (what is a pgp signature? What do I do with it? etc) without simply hiding a feature (which is likely to cause confusion, why do you support it if you won't show it etc). I think it will also make releasing slightly easier for developers, since I personally know a number of authors on PyPI who don't really believe there is any value in signing their packages on PyPI, but they do it anyways because of a vague notion that they should do it.
I'm a fan of most ideas that lower barriers to migration from the legacy PyPI codebase to Warehouse :)
However, I think one of the core points here is that even if GPG keys are removed from the *public* API, we still don't want the migration to break things for folks with automated upload processes that supply signature information - there'll be enough teething problems for the migration without a potential coincident breakage for a couple of thousands projects. That means the trade-off to consider is whether or not dropping this feature will get to the point of being able to deploy Warehouse sooner or not.
Both PyPI and Warehouse silently ignore additional fields so we get this for free unless we go out of our way to add a check for the existence of it in the POST and manually raise an error.
- if it's already implemented & tested in Warehouse, then leave it alone - you can take it out later, some time *after* the migration - if it's implemented, but not well tested, decide what to do based on whether testing or the below idea seems like less work - if it's not implemented in Warehouse yet, that's when the key requirement becomes "don't break automated signed uploads”
It is implemented for uploads (and like all code in Warehouse it has 100% coverage) and I've had people successfully upload signatures to Warehouse with it... but then I've also had "weird" errors where people attempt to upload a signature to Warehouse and the `request.POST["gpg_signature"]` is a completely different type of object than it normally is. I have no idea *why* it was suddenly a different object for those people and that comes from ``cgi.py`` in the standard library which is... well not super easy to follow what's going on. So I'm moderately concerned about what is going to happen if more people start trying to use that, particularly since I can't really decipher what is triggering this behavior to know how to guard against it (other than just blindly writing the code to accept the additional type I've observed with no way to reproduce the error and just hope that it's enough to fix it). To make it worse, the problem was persistent until the person in question reinstalled twine and its dependencies (but I can't repro with any version of twine).

It's not implemented in the UI, nor is the ability to associate a GPG key with a particular user implemented, nor is the ability to see what GPG key a user has associated with their account implemented, and I sort of don't want to implement those for all the reasons I listed in my original post. To do that would require Nicole's time to properly design it, and her time is more limited than mine is, so if I can reduce the things she needs to do by putting in some effort on my part (like removing the code that's there, or implementing your mail solution) that seems like a good trade off to me.

Some folks who uploaded GPG keys to Warehouse noticed they didn't show up in the UI and asked about it (since they had uploaded them) and they felt that either Warehouse should fully implement the GPG feature or it should get rid of it, but a half implementation isn't great for anyone.
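
The "blindly accept the additional type" workaround mentioned above might look something like this. The thread does not say which types were actually observed, so the three shapes handled here (bytes, text, and a file-upload style object exposing `.file` or `.read()`) are assumptions, and `read_signature_field` is a hypothetical helper rather than anything that exists in Warehouse.

```python
import io
from types import SimpleNamespace
from typing import Optional


def read_signature_field(value: object) -> Optional[bytes]:
    """Best-effort conversion of an upload form value to bytes."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    # Assumption: file-upload style objects expose the payload through a
    # `.file` attribute, or are themselves readable streams.
    stream = getattr(value, "file", None)
    if stream is None:
        stream = value
    read = getattr(stream, "read", None)
    if callable(read):
        data = read()
        return data.encode("utf-8") if isinstance(data, str) else data
    # Fail loudly on anything unexpected rather than guessing further.
    raise TypeError(f"unsupported signature field type: {type(value).__name__}")


# Example inputs in each of the assumed shapes.
as_bytes = read_signature_field(b"sig")
as_text = read_signature_field("sig")
as_upload = read_signature_field(SimpleNamespace(file=io.BytesIO(b"sig")))
```

Raising on unknown types at least turns a confusing downstream failure into an error that names the offending type, which would help when a report cannot be reproduced.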
If we drop the feature, then my suggestion for a future upload UX is to let uploads with signatures succeed, but implicitly discard the signatures, and *automatically email the maintainer explaining why their uploaded signatures aren't appearing*.
That seems reasonable.
Similarly, projects that *had* signatures uploaded should gain a database flag indicating previously uploaded keys have been discarded, such that:
1. The admin page for the project explains why the signatures are gone 2. The affected package maintainers are emailed with a list of projects they maintain from which signatures have been removed
The aim here would be to ensure that package maintainers tempted to ask "Where did my package signatures go?" are able to have that question answered *within the context of the service itself*, even if they've never heard of distutils-sig.
That seems reasonable too, and I don't even really need to add a new database column for it (though it might make sense for performance's sake), since this should be trivially queryable using:

    SELECT EXISTS(
        SELECT 1 FROM release_files
        WHERE name = %s AND has_signature = 't'
    );

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
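
The EXISTS query above can be tried out locally with Python's built-in sqlite3 module. This is a stand-in only: the table and the two column names `name` and `has_signature` come from the query above, while the `filename` column and the sample rows are invented. SQLite uses `?` placeholders rather than `%s`, and the flag is stored here as 0/1 rather than the `'t'` boolean literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; only `name` and `has_signature` appear in the thread.
conn.execute(
    "CREATE TABLE release_files (name TEXT, filename TEXT, has_signature INTEGER)"
)
conn.executemany(
    "INSERT INTO release_files VALUES (?, ?, ?)",
    [
        ("alpha", "alpha-1.0.tar.gz", 1),
        ("alpha", "alpha-1.1.tar.gz", 0),
        ("beta", "beta-0.1.tar.gz", 0),
    ],
)


def ever_signed(connection: sqlite3.Connection, project: str) -> bool:
    """True if any file of the project was ever uploaded with a signature."""
    row = connection.execute(
        "SELECT EXISTS("
        "SELECT 1 FROM release_files WHERE name = ? AND has_signature = 1)",
        (project,),
    ).fetchone()
    return bool(row[0])


print(ever_signed(conn, "alpha"))  # True
print(ever_signed(conn, "beta"))   # False
```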
On 2016-05-12 07:41:21 -0400 (-0400), Donald Stufft wrote: [...]
What do folks think? Would anyone be particularly against getting rid of the GPG support in PyPI?
We have plans[*] in the OpenStack community to start autosigning our sdist and wheel builds (and similar release artifacts we build for other package ecosystems), so that we can track provenance and integrity through part of our release pipeline. I'm hoping to have that implemented in the next few months.

While also uploading these signatures to PyPI was seen as useful, we do already have another primary location we can publish detached signatures along with our release artifacts so I would probably just ignore the PyPI/twine-specific part of the work if this goes away.

[*] http://specs.openstack.org/openstack-infra/infra-specs/specs/artifact-signin...

--
Jeremy Stanley
On May 12, 2016, at 07:41 AM, Donald Stufft wrote:
I am aware of a single tool anywhere that actively supports verifying the signatures that people upload to PyPI, and that is Debian's uscan program. Even in that case the people writing the Debian watch file have to hardcode in a signing key into it and in my experience, when faced with a validation error it's not unusual for Debian to simply disable signature checking for that project and/or just blindly update the key to whatever the new key is.
I like that uscan provides this feature, but I don't know how many packages actually use it, either within the Debian Python teams, or in the larger Debian community. I'd like to use it more often on packages I maintain but it's kind of difficult to find your way back to an authoritative signing key. For my own packages that I also maintain in Debian, it's of course trivial, so I have that enabled for them. I sign all my package uploads to PyPI, and I mostly trust myself <wink>. If it's possible to get signing keys from PyPI, I really have no idea how to do that. The web ui doesn't at all make it obvious (to me, at least). I understand the implementation dilemma for Warehouse, but rather than ditch this feature, I'd rather see it improve by making the signing keys more discoverable and verifiable. I wonder if keybase.io could be used somehow. Or perhaps a prominent link in the package metadata pointing to a pubkey location. Then it would be up to projects to utilize these mechanisms to make their signing keys obvious, and tools like uscan can increase their usage of such features. Cheers, -Barry
On May 12, 2016, at 3:05 PM, Barry Warsaw <barry@python.org> wrote:
On May 12, 2016, at 07:41 AM, Donald Stufft wrote:
I am aware of a single tool anywhere that actively supports verifying the signatures that people upload to PyPI, and that is Debian's uscan program. Even in that case the people writing the Debian watch file have to hardcode in a signing key into it and in my experience, when faced with a validation error it's not unusual for Debian to simply disable signature checking for that project and/or just blindly update the key to whatever the new key is.
I like that uscan provides this feature, but I don't know how many packages actually use it, either within the Debian Python teams, or in the larger Debian community. I'd like to use it more often on packages I maintain but it's kind of difficult to find your way back to an authoritative signing key. For my own packages that I also maintain in Debian, it's of course trivial, so I have that enabled for them. I sign all my package uploads to PyPI, and I mostly trust myself <wink>.
If it's possible to get signing keys from PyPI, I really have no idea how to do that. The web ui doesn't at all make it obvious (to me, at least).
You know, I went and poked at it again because I knew I had seen it previously, and I realized the only place you can see the GPG key is:

* Your own details page.
* The "Roles" page for a project that you're the maintainer or owner of.

So even as implemented on PyPI the ability to record what a particular user's GPG key is, is practically worthless.
I understand the implementation dilemma for Warehouse, but rather than ditch this feature, I'd rather see it improve by making the signing keys more discoverable and verifiable. I wonder if keybase.io could be used somehow. Or perhaps a prominent link in the package metadata pointing to a pubkey location. Then it would be up to projects to utilize these mechanisms to make their signing keys obvious, and tools like uscan can increase their usage of such features.
So my response to this is, let's pretend for a minute that we have the greatest and most amazing setup for verifying that the key 0x6E3CBCE93372DCFA belongs to me. What's your next step? How do you verify that I'm allowed to release for pip? What happens if tomorrow I decide I'm no longer going to use key 0x6E3CBCE93372DCFA because it got compromised (remembering that key revocation is hilariously broken [1])? What if we add a new signing key because I'm tired of releasing pip and someone else is going to take over? What path is Debian going to take for verifying that some new key is allowed to sign for it that doesn't put "Whatever PyPI says" in the path of trust?

The problem isn't just that it's a bit annoying to implement in Warehouse, but also that GPG's trust model is not really useful at all for package signing except in very specific scenarios which we don't have (basically just Linux distros or some other thing, and even then the trust model isn't that useful, they just hack it with a custom trust keychain).

If I find some time I might try and get some data on it, but I greatly suspect that approaching 100% of the traffic downloading the .asc files from PyPI is Bandersnatch automatically mirroring them and Debian's uscan. I tried to search codesearch.debian.net but I can't seem to make it search over multiple lines, so all I can get is that there are 14 packages that have the strings `pgpsigurlmangle` and `pypi` on a single line, but I suspect more people extend that out over two lines so I don't think it's a meaningful number for this decision.

Overall though, I really don't think it's worth it to keep it around when the trust model is broken and the only real consumer is Debian's uscan (and even then, most of its value is already taken care of by TLS).

[1] https://www.imperialviolet.org/2012/02/05/crlsets.html - Ok it's about CRL which is an x509 thing, but the overall point applies to GPG too.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
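
The identity-versus-authorization gap argued in the message above can be stated in a few lines. In this sketch the `AUTHORIZED_SIGNERS` mapping, the placeholder fingerprint for requests, and the function `accept_signature` are all hypothetical, and `signature_is_valid` stands in for the outcome of a real GPG verification. Where such a mapping would come from without putting "whatever PyPI says" in the path of trust is exactly the open question.

```python
from typing import Dict, Set

DONALD = "7C6B7C5D5E2B6356A926F04F6E3CBCE93372DCFA"  # fingerprint from the signature block

# Hypothetical data: which key fingerprints may sign for which project.
AUTHORIZED_SIGNERS: Dict[str, Set[str]] = {
    "pip": {DONALD},
    "requests": {"A" * 40},  # placeholder, not a real key
}


def accept_signature(project: str, signer_fingerprint: str, signature_is_valid: bool) -> bool:
    """A signature counts only if it is valid AND the signer may sign for this project."""
    if not signature_is_valid:
        return False
    return signer_fingerprint in AUTHORIZED_SIGNERS.get(project, set())


print(accept_signature("pip", DONALD, True))       # True
print(accept_signature("requests", DONALD, True))  # False: valid signature, wrong project
```

GPG on its own answers only the `signature_is_valid` part; the second check needs a source of truth for the mapping, and that is the piece the thread says is missing.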
On May 12, 2016, at 04:34 PM, Donald Stufft wrote:
So my response to this is, let's pretend for a minute that we have the greatest and most amazing setup for verifying that the key 0x6E3CBCE93372DCFA belongs to me. What's your next step? How do you verify that I'm allowed to release for pip?
I'd hope that the project home page would say that. I sheepishly admit that we don't have that information on the Mailman home page, but you *could* follow the link from me (described as the lead developer) to my own home page and then grab the key from there, verified from keybase.io.
What happens if tomorrow I decide I'm no longer going to use key 0x6E3CBCE93372DCFA because it got compromised (remembering that key revocation is hilariously broken [1]). What if we add a new signing key because I'm tired of releasing pip and someone else is going to take over, what path is Debian going to take for verifying that some new key is allowed to sign for it that doesn't put "Whatever PyPI says" in the path of trust?
uscan would complain and then I'd have to try to figure out the new signing credentials. It's not wonderful, but for platform and package maintainers who care, I think it does provide value, and the signing credentials likely don't change that often. Cheers, -Barry
On May 12, 2016 4:41 AM, "Donald Stufft" <donald@stufft.io> wrote:
[...]
All in all, I think that there is not a whole lot of point to having this feature in PyPI, it is predicated a bunch of invalid assumptions (as detailed above) and I do not believe end users are actually even using the keys that are being uploaded. Last time I looked, pip, easy_install, and bandersnatch represented something like 99% of all download traffic on PyPI and none of those will do anything with the .asc files being uploaded to PyPI (other than bandersnatch just blindly mirroring it). When looking at the number of projects actively using this feature on PyPI, I can see that 27931/591919 files on PyPI have the ``has_signature`` database field set to true, or roughly 4% of all files on PyPI, which roughly holds up when you look at the number of distinct projects that have ever uploaded a signature as well (3559/80429).
Numpy is one of those projects that signs uploads, and it's utterly useless like you say -- we switch release managers on a regular basis, we don't have any communication of what the key is supposed to be... there are a few edge cases where I guess it could help a little (I actually would trust a version of requests signed by your key more than I would trust one signed by a just-invented key with no provenance, and there probably are a few people out there who check keys manually), but I agree it's almost pure cargo cult security.

My main concern when implementing this is how to communicate it to users, given that almost none of them understand security either, and that if this ends up on news sites like "pypi is dropping package security" then that hurts long term good will plus means confused folk showing up with pitchforks and eating up everyone's time. And honestly they would kind of have a point -- "we're getting rid of this thing that only kinda works now in favor of something amazing that doesn't exist yet" is just not a popular move.

I guess the maximally time-efficient approach would be to just not mention gpg signatures or touch the relevant code again until TUF is ready, and then we can frame it as "we're switching to something better". Not sure how viable that is in practice though!

Maybe one way to kick this down the road a bit until there's time to deal with it properly would be to leave signature upload off on warehouse and tell people that for now that's still an imported legacy pypi only feature, you can continue uploading them there but we're still evaluating what we want to do with warehouse. I guess at some point this would become the last blocker to turning off legacy pypi and then some decision will have to be made, but maybe by that point things will be clearer?

Obviously you should do whatever you think will make the transition as fast and easy as possible though!

-n
participants (6)
- Barry Warsaw
- Donald Stufft
- Jeremy Stanley
- Nathaniel Smith
- Nick Coghlan
- Paul Moore