Why not include the public key in the package?
99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.
Because PyPI (or an attacker) could always substitute a new key. There's very little value in the signature and key coming from the same source: the key (and its justified identity) always needs to come from a source of trust, not the source that's being verified.
> 99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.
This might be a misunderstanding, but I don't think you actually want this: lots of large packages have multiple release managers (and contributors who come and go); you don't want to manually resolve each new human identity that appears for a package distribution.
What most people actually want is a strong cryptographic attestation that the package distribution came from the same source as the thing hosting the source code, since both that service and the owner of the repository are presumed trusted.
Notably, PGP is incapable of providing either of these: you only get key IDs, which are neither strong human identities nor a strong binding to a service. Key IDs might correspond to keys with email (or other identities) in them, but that's (1) not guaranteed, and (2) not a strong proof of identity (since anybody can claim any identity in a PGP key).
Nope, you assume wrong. That's exactly what I (also) want, that is, knowing that the *authors* remained the same, whoever they are.
>> What most people actually want is a strong cryptographic attestation that the package distribution came from the same source as the thing hosting the source code
Nope, nobody really needs more of that, since that's what your HTTPS certificate is for.
People *really* want to mitigate the risk of pypi infrastructure getting fully compromised, which is very likely, given how many eggs you keep in the same basket there.
PGP signatures were the last ditch, not very convenient but also not as bad as they are painted. But from now on there won't be even that little.
That security is integrity, which PyPI already provides through strong cryptographic digests of each package distribution. Codesigning schemes need to provide authenticity, not just integrity; a codesigning scheme that's downgradeable to arbitrary key trust is just a hashing scheme that's more complicated than necessary.
Isn't that what public key servers are for?
For publishing my FOSS to Sonatype, I had to first publish my public key, e.g. to keyserver.ubuntu.com.
I don't know PyPI, but from this OC, it sounds like PyPI does not have the same prerequisite.
This was discussed a bit on Sunday's thread[2], and my understanding is that Maven's ability to use PGP in this way is effectively due to Sonatype assuming a large amount of operational and maintenance burden. PyPI doesn't have those kinds of resources available to it. Even assuming that the service was gifted that kind of support, it would still cause a lot of heartburn with existing signatures and carry forward all of the legacy baggage of PGP that we're trying to eliminate entirely.
[1]: https://gist.github.com/rjhansen/67ab921ffb4084c865b3618d695...
Depends. If the distributor maintains a repository of trusted public keys (as the repositories of Linux distributions do, for example), it gives you a guarantee. As has been said, most of the time you just want to know that the key used to sign a package has not changed. That is the same level of security that SSH offers (the first time you connect to a server, the client saves the public key, then gives an error if that public key later changes). That is really enough for a package on PyPI, or for signing git commits and similar.
We should ask ourselves if the complexity of PGP is needed. Probably not, just as the complexity of X.509 certificates is not needed, since a simple RSA signature of the package with a public key hosted on a server would be sufficient. But PGP is practical, it has good tooling built around it, and it is pretty universal, so why not?
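A minimal sketch of the SSH-style "remember the key on first use, complain if it changes" check described above. Everything here is a hypothetical illustration: the `check_tofu` function, the JSON pin file, and the use of a SHA-256 fingerprint over raw key bytes are assumptions, not an existing PyPI or pip mechanism.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(public_key_bytes: bytes) -> str:
    """Return a SHA-256 hex fingerprint of the raw public key bytes."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def check_tofu(pins_path: Path, package: str, public_key_bytes: bytes) -> str:
    """Pin the key on first use; afterwards report whether it still matches.

    Returns "pinned" (first time seen), "match", or "changed".
    """
    # Load previously pinned fingerprints, if any (hypothetical local store).
    pins = json.loads(pins_path.read_text()) if pins_path.exists() else {}
    seen = fingerprint(public_key_bytes)
    known = pins.get(package)
    if known is None:
        # First contact: trust the key and remember it.
        pins[package] = seen
        pins_path.write_text(json.dumps(pins, indent=2))
        return "pinned"
    return "match" if known == seen else "changed"
```

A client would treat "changed" the way SSH treats a changed host key: stop and ask the user. What the user is supposed to do at that point is a separate question that this sketch does not answer.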
PyPI could show a warning that the key has changed. Which is not an actionable or helpful warning. And then everyone gets used to seeing these warnings every now and then. And you've won nothing.
Getting signatures to do something useful is hard.
What happens if a developer loses the Google Titan key that is required to log in to PyPI?
It would be, btw, a proper but sustainable proof-of-work blockchain, as in most cases you would need to pay developers to "mint new blocks".
OK, maybe let's forget about the blockchain. It's a loaded term. But the idea of software signature TOFU sounds indeed good!
They also say they can't "meaningfully verify" packages if the key does not have "binding identity information," by which they presumably mean automatically verifiable binding identity information, which usually means someone verified an email from keys.openpgp.org. This is a really narrow way to establish "binding identity information." For example someone who is a PyPI author and publicly links their PGP key from a (https) website on the same domain as the email on the key would not count. A well known longtime PyPI author with a well known key would not count.
The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPI would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.
This has the smell of "we didn't want to bother supporting PGP any more because it's hard so we came up with an excuse."
No need for an excuse, though: Just be honest about it and let the chips fall where they may, if you really don't want to support PGP. God knows there are valid reasons for not having the energy to deal with PGP. (FWIW I think it's a good solution for packages, for those who can navigate the tooling, but on the other hand I'm not volunteering my time to run PyPI.)
P.S. There is a link in their post saying PGP has "documented issues." The specific issue described in the linked document is "packaging signing is not the holy grail" and a list of known things about PGP, like that verification of keys is ad hoc. It also concludes that there is no known better alternative.
This is revisionist: in 2005, PGP was approachingly modern and represented an acceptable tradeoff between usability, legal and patent constraints, and arms laws. It was also accompanied by a network of synchronizing keyservers and a "strong set" within the Web of Trust that, in principle, gave you transitive authenticity for artifacts. That never really worked as expected, but it's all code and infrastructure that was actually running in 2005, when PyPI chose to allow PGP signatures.
None of that is the case in 2023: PGP is 20 years behind cryptographic best practices, and has 30 years of unresolved technical debt. There is no web of trust, and the synchronizing keyserver network has been broken for years.
The argument for PGP in 2005 was that it was, to a first approximation, the best that could be done. The argument against PGP in 2023 is that, to a first approximation, it's worse than useless by virtue of providing false security guarantees.
And you say that it's been 18 years and PGP is behind best practices, but you don't describe how those best practices would better solve the package verification challenge that PyPI faces. So in the absence of an actual alternative system why not keep using PGP? Perfect is the enemy of good (IMO!).
But again, I'm not even really feeling strongly that PyPI should use PGP or not. I mostly posted just to say that they should be honest about why they are leaving it, and these seem like bad/misleading stats that for some reason they are hiding behind instead of coming out and saying they changed their mind about PGP (or new people are now running things and don't like dealing with PGP - many people would sympathize).
This may or may not be satisfying to you, but there is discussion around this, both in this thread, other threads on the internet, and PyPI's own issue tracker. The current plan is to integrate Sigstore[1] into PyPI as a more complete and modern codesigning solution. That work is progressing, and is not in a state that's meant to "replace" PGP. But that's intentional because, as the post states, nobody (to a first approximation) was using these PGP signatures anyways.
Perfect is indeed the enemy of the good; the other enemy of the good is bad things. PGP is bad; the reason I titled the original post "worse than useless" is because it takes a useless security feature (signatures that nobody verifies) and makes them actively dangerous by providing cryptographic margins that weren't even safe 25 years ago.
> But again, I'm not even really feeling strongly that PyPI should use PGP or not. I mostly posted just to say that they should be honest about why they are leaving it, and these seem like bad/misleading stats that for some reason they are hiding behind
Two things should be separated here: there's the PyPI blog post, which is written by the PyPI admins, and there's the "worse than useless" blog post, which was written by me. I am not an admin of PyPI, and it's my independent technical opinion that PGP is bad. I stand by the stats that I've included in my own post, but I do welcome specific critiques of how they're bad or misleading.
The PyPI admins can provide their own rationale, but this is my best understanding: they have known for years that PGP is bad, and have more or less tolerated it as a legacy feature because removing it was a low priority. The post I wrote two days ago was just a "final nudge" towards removing it, since the post's statistics (particularly large numbers of expired keys) refute one of the last defenses for PGP on PyPI.
In what sense? If someone signs a package with, say, an RSA key, how is that behind in some way?
>30 years of unresolved technical debt.
How can a standard for a file/message format have technical debt? PGP is dead simple. Where is this debt hidden?
OpenPGP specifies PKCS#1 v1.5 for RSA padding. Attacks on PKCS#1 v1.5 have been well understood for over 20 years[1]; every few years, someone finds a new one.
RSA itself is well-known for having weird number-theoretic problems that implementations have failed to respect, to catastrophic effects. Best practice for algorithm selection is to pick algorithms where users can't compromise the integrity of the scheme through poor public parameter selection; RSA forces the user to pick a public modulus and exponent, leading to all kinds of silly things that actually happen[2].
Edit: Correcting myself: most attacks on v1.5 padding concern encryption, not signatures. The general fragility argument remains, however.
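To make the point about user-chosen public parameters concrete, here is a toy illustration with textbook (unpadded) RSA. The tiny modulus and the choice of e = 1 as the degenerate exponent are illustrative assumptions of this sketch, not something taken from the references above.

```python
def rsa_encrypt(message: int, exponent: int, modulus: int) -> int:
    """Textbook RSA with no padding: c = m^e mod n."""
    return pow(message, exponent, modulus)


# Toy modulus built from two small primes; real moduli are 2048 bits or more.
n = 61 * 53
m = 65

# With a sane exponent, the ciphertext differs from the message.
print(rsa_encrypt(m, 17, n))

# With e = 1 the "encryption" is the identity function: this prints 65.
print(rsa_encrypt(m, 1, n))
```

Nothing in the RSA equation itself stops e = 1 from being used; the scheme only stays safe if every implementation remembers to reject parameters like this.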
AFAIK Debian has been working on abandoning GPG in favor of something very similar to those two. Not sure when it's going to be shipped, though.
"Everything is Terrible So What Do We Do?
Bluntly put, I don’t know for sure. This isn’t an already solved problem nor is it an easy to solve one."
https://caremad.io/posts/2013/07/packaging-signing-not-holy-...
What I'll say on PGP is that the perfect is the enemy of the good. It's not a tech anyone has much fun using, but in a group setting, used regularly, I have found it can fade into the background at least. I don't want to go any further down the "is PGP good or bad" rabbit hole than that.
But if you have a better solution for package security, please do describe it here.
There's also a general consensus (not documented) that sigstore will play some kind of role here. Possibly in-toto as well?
In the 10 years since my post that you referenced, we've laid some decent plans I believe, and have just slowly been working on them, to the extent that we've been able to given our own time constraints.
That's the issue. Pretending there is a security solution in place is worse than being upfront that there is none. If you look down and notice that your seatbelt is actually made out of angel hair pasta, you might drive more carefully. Hopefully you'll also get a better car.
At best, it defeated plausible deniability for package maintainers who had avowed public keys, but then somehow signed a bad package. This wouldn't have stopped the malware from getting onto your system. It only would have led you to the hapless (but honest) package maintainer.
It didn't stop someone who is not you from generating a PGP key for Richard WM Jones, signing malware, uploading to PyPI, and then disappearing back under the rock where they live. And if you believe this system is not useless, then you also believe that at least one person out there was not dissuaded from installing that malware because "Hey, someone named Richard WM Jones went through the trouble of signing it!"
As is often the case, the value of this system depends on your threat model. I'm not too worried about someone going rogue from the tiny population of people who were using PGP correctly. But I am worried about using a platform that claimed to have signing infrastructure, when that infrastructure had no meaningful checks on who was signing.
Except that they are: PGP does not give you this kind of identity relationship. The most it can give you is an association to a key ID, which is (1) brute-forceable, and (2) not strongly bound to any actual user or machine identity.
The only thing worse than an unsecured scheme is an insecure scheme that lulls users into a false sense of security and authenticity. PGP signatures on PyPI are the latter.
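A rough sketch of why a truncated identifier is brute-forceable. In OpenPGP v4, the key ID is the low-order 64 bits of the SHA-1 fingerprint, and the "short" key ID is the low-order 32 bits. The code below uses only 16 bits so that it finishes quickly; that width, the function names, and the stand-in "key material" strings are assumptions for illustration, not real keys.

```python
import hashlib
from itertools import count


def short_id(key_material: bytes, bits: int = 16) -> int:
    """Low-order `bits` bits of a SHA-1 digest, standing in for a truncated key ID."""
    digest = int.from_bytes(hashlib.sha1(key_material).digest(), "big")
    return digest & ((1 << bits) - 1)


def forge_matching(target: bytes, bits: int = 16) -> bytes:
    """Try candidate inputs until one has the same truncated ID as `target`."""
    wanted = short_id(target, bits)
    for i in count():
        # Hypothetical attacker-controlled material; a real attack would vary key packets.
        candidate = f"attacker-key-{i}".encode()
        if short_id(candidate, bits) == wanted:
            return candidate
```

The expected work doubles with every extra bit of identifier, which is why the length of the ID that users actually compare matters so much.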
People are continuously creating better tools for domains that historically saw PGP usage. To name a few: Signal for short-form messaging, age for file encryption, signify/minisign for artifact signing.
That said, if PGP signatures are to be replaced then there's no reason why they can't be removed now and replaced with something later.
(Note: PyPI protects against MITM with HTTPS.)
Removing this is predicated on the idea that this is a low-priority threat vector.
Why do we use GPG ASC signatures instead of just a checksum over the same channel?
Could you elaborate on what you mean by this? PyPI computes and supplies a digest for every uploaded distribution, so you can already cross-check integrity for any hosted distribution.
GPG was nominally meant to provide authenticity for distributions, but it never really served this purpose. That's why it's being removed.
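A minimal sketch of the integrity cross-check mentioned above, assuming you already have the SHA-256 hex digest that the index publishes for a distribution. The function names and the file path are hypothetical; only the standard library is used.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large distributions need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare the local file's digest with the published one."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

This only shows that the file is the one the index says it is (integrity). It says nothing about who produced it, which is the authenticity gap being discussed in this thread.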
You can include an md5sum or a sha512sum string next to the URL that the package is downloaded from (for users to optionally check after downloading a package); but if that checksum string is uploaded over the same channel (HTTPS/TLS w/ a CA cert bundle) as the package, the checksum string could have been MITM'd/tampered with, too. A cryptographically-signed checksum can be verified once the pubkey is retrieved over a different channel (GPG: HKP is HTTPS/TLS with cert pinning IIRC), and a MITM would have to spend a lot of money to forge that digital publisher signature.
Twine COULD/SHOULD download uploads to check the PyPI TUF signature, which could/should be shipped as a const in twine?
And then Twine should check publisher signatures against which trusted map of package names to trusted keys?
2) the client uploads a cryptographic signature (made using their own key) along with the package, and the corresponding public key is trusted to upload for that package name, and the client retrieves said public key and verifies the downloaded package's cryptographic signature before installing.
FWIU, 1 (PyPI signs uploads with TUF) was implemented, but 2 (users sign their own packages before uploading the signed package and signature, (and then 1)) was never implemented?
To the best of my knowledge, the current state of TUF for PyPI is that we performed a trusted setup ceremony for the TUF roots[1], but that no signatures were ever produced from those roots.
For the time being, we're looking at solutions that have less operational overhead: Sigstore[2] is the main one, and it uses TUF under the hood to provide the root of trust.
Of course, if you haven't put any effort into a system to verify end-to-end whether it's the right signature, it doesn't matter.
There is no e2e: pypi signs what's uploaded.
(Noting also that packages don't have to be encrypted in order to have cryptographic signatures; only the signature is encrypted, not the whole package)
* get a signature for author ("the actual author published it") + some metadata with a list of valid signing keys (in case the project has more authors, or just for key rotation)
* get a signature for hosting provider that confirms "yes, that actual user logged in and uploaded the package"
* (the hardest part) key management on the client side, so the user has to do the least amount of work possible when downloading/updating a valid package.
If the user doesn't want to go to the effort of validating whether the author's public key is valid, so be it, but at the very least the system should alert on tampering with the provider (checking the hosting signature) or on the author key changing (compromised credentials to the hosting provider).
It still doesn't prevent "the attacker steals key off author's machine" but that is by FAR the rarest case and could be pretty reasonably prevented by just using hardware tokens. Hell, fund them for key contributors.
I don't know if it applies to any of those 1069 keys, but note that there is a way of hosting PGP keys that does not depend on key servers: WKD https://datatracker.ietf.org/doc/draft-koch-openpgp-webkey-s... . You host the key at a .well-known URI under the domain of the email address. It's a draft as you can see, but I've seen a few people using it (including myself), and GnuPG supports it.
More generally, these kinds of improvements are not a sufficient reason to retain PGP: even with a sensible identity binding and key distribution, PGP is still a mess under the hood. The security of a codesigning scheme is always the weakest signature and/or key, and PGP's flexibility more or less ensures that that weakest link will always be extremely weak.
PyPI's support for PGP is very old -- it's hard to get an exact date, but I think it's been around since the very earliest versions of the index (well before it was a storing index like it is now). If I had to guess (speculate wildly), my guess would be that the original implementation was done with a healthy SKS network and strong set in mind -- without those things, PGP's already weak identity primitives are more or less nonexistent with just signatures.
That said, I do agree with your premise that the limited usefulness of PGP signing doesn't necessitate removing the feature entirely.
That assumes there’s a baby in the bath water.
> But instead of improving security measures that don't work well they just remove them?
Well yes, “security measures” which don’t work are usually worse than nothing.
Having a slight barrier to entry, which is essentially "you must learn why signing is important for users of your library and this is how to do it", (a) really isn't that bad, (b) doesn't result in fewer quality packages being uploaded, and (c) if it acts like any sort of filter, that seems to be a good thing.
Maven Central isn't short of high quality packages and no high quality OSS Java libraries are missing so the filter aspect isn't culling anything important.
Java, Apt, RPM, etc all have this and have absolutely gigantic numbers of packages so the argument that it's too hard really just doesn't hold water.
Doing so requires reading/understanding these ~3 pages of docs: https://central.sonatype.org/publish/requirements/gpg/
Python (1991) is older than Java (1995)
(irrelevant factoid, but still ...)
So while it might not be providing meaningful security for lower-tier packages, it's definitely doing its job for top-tier packages like these that are relied on by hundreds of thousands of projects.
it's the magic combination of pushing their own agenda (vs. that of their users), mixed with ineptitude
It's also not accurate to say that PyPI failed to make 2FA useful: it was deployed for over two years before the 2FA mandate for critical projects went into effect. That mandate also came with free hardware keys for everyone affected.
There is no immediate replacement, because the overwhelming majority of packages never bothered to sign with PGP (and all evidence points to the overwhelming majority of signatures never being verified). In other words, this is much closer to removing "dead" code than to killing an active feature.
Longer term, the plan is to integrate Sigstore[1]-based signatures.
How an OIDC identity is obtained and secured is not treated. It brings useful organization to PKI, but the problem remains. You have to delegate trust to identity providers: Google, GitHub, etc.
Keybase was interesting, but the project seems semi-dead.
Yes, this is a fundamental (and, IMO, reasonable) assumption in Sigstore. The trust argument for large IdPs is that they (1) have the institutional ability and resources (like incident response) to maintain their service, (2) have strong incentives to maintain and improve the overall security of their providers (billions of accounts on the Internet are bound to SSO via Google, etc.), and (3) that any failures in those providers are already catastrophic, so reducing the number of moving and potentially failing parts is a net win in terms of security.
so...*reject those packages*. if you use a PGP key that isn't properly available or verifiable, reject it. That way every package with a PGP key will have 100% "key is properly discoverable" rate.
it's not really reasonable to just drop this feature because most packages don't use it. Packages with tens of millions of downloads (like mine) make up a small percentage of total packages, but this small number of packages makes up a huge proportion of actual downloads, and package signing is most useful for these kinds of packages.
if the adoption of "proper PGP keys" were ranked by packages/ downloads rather than "packages" alone, these rates would be much different.
Looking at the top 20 packages in the last month by download (packages with hundreds of millions of downloads), only 1 of them shipped a GPG signature with their most recent release. I haven't asked the author of that one, but I do know them and I suspect they agree with the idea that it's not a valuable thing and they do it largely because it exists.
That’s me. I used to upload signatures to PyPI only because it’s a thing that exists and it’s not much trouble. I’d be counted among the valid 36%, but I doubt anyone ever verified even one of the hundreds of sigs I uploaded over the years. I eventually stopped due to the pointlessness.
If the repo requires a GPG signature, they could also ask for the public key of the developer making the releases (e.g. when they make the account), and they could sign it with their key at that point.
Then make available the package, the signature, and the signed public key. Then I only need to trust the repo's key (in this case PyPI).
Does this make any sense?
It makes sense in terms of trusting the package index, but it's inverted from the original design goal: the point of end-user signatures on package indices is to eliminate unnecessary package index trust, not reinforce it.
If you already trust the package index, then mandating HTTPS and strong cryptographic digests is going to be far more effective (and secure) than some kind of PGP key attestation scheme.
Without an easy way to verify the keys, the signatures are useless. Which is why PyPI is removing the GPG keys altogether.
I know that; the GP is describing a countersigning scheme, where the package index (qua trusted entity) countersigns for the signing key, which the dev then uses to sign for their package.
> Without an easy way to verify the keys, the signatures are useless. Which is why PyPI is removing the GPG keys altogether.
Agreed entirely; I'm the one who wrote the analysis in the linked announcement :-)
Sigstore [0], on the other hand, makes more sense to use instead of PGP.
I don't know much about the solution you promote, but as usual with many "PGP killers" it replaces one very specific application of PGP and ignores all the others. Which is ok! Doing one thing and doing it well is the Unix philosophy after all. But it's not something I have use for, and it's not a viable replacement for GPG.
We will instead switch to using something with a fluffy corporate website that tells you absolutely nothing.
Even if only 37% of keys are verifiable, that's infinitely more than will be verifiable if they remove the PGP support.
> While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.
Discoverable. That does not really verify anything about the key, its identities or the supposed signer.
It boils down almost entirely to just an overcomplicated hashing system.
The post gave them excellent additional justification to.
PyPI's admins have been wanting to remove PGP support for years; all I did was provide the final nudge.
FWIW, Ruby also did a piss-poor job of handling gem signing by making it both difficult and optional.
How fucking hard is it to get to the level of code release assurance as Debian or Fedora? Manage GPG keys, signfest them, and enforce a policy.
Why on earth wasn't the community asked before you implemented this change?
> Given all of this, the continued support of uploading PGP signatures to PyPI is no longer defensible. While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.
This uninformed reasoning is what's indefensible.
One to compile a list of file hashes and PGP-sign them.
One to validate these hashes against the provided signatures.
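The hashing half of those two steps might look roughly like the sketch below. It assumes a flat directory of release files and a `sha256sum`-style manifest; the function names are hypothetical. Signing the manifest and checking that signature are deliberately left out and would be done by an external tool (for example `gpg --detach-sign` and `gpg --verify`).

```python
import hashlib
from pathlib import Path


def build_manifest(directory: Path) -> str:
    """Return sha256sum-style lines ("<hex>  <name>") for every file in directory."""
    lines = []
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.name}")
    return "".join(f"{line}\n" for line in lines)


def validate_manifest(directory: Path, manifest: str) -> list[str]:
    """Return the names of listed files that are missing or whose hash differs."""
    bad = []
    for line in manifest.splitlines():
        expected, name = line.split("  ", 1)
        path = directory / name
        if not path.is_file() or hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            bad.append(name)
    return bad
```

The manifest is only as trustworthy as the signature over it, so the validating side should verify that signature against a key obtained out of band before relying on `validate_manifest`.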