Programming, philosophy, pedaling.
May 21, 2023
TL;DR: A large number of PGP signatures on PyPI can’t be correlated to any well-known
PGP key and, of the signatures that can be correlated, many are generated from weak keys or
malformed certificates. The results suggest widespread misuse of GPG and other PGP implementations by Python
packagers, with said misuse being encouraged by the PGP ecosystem’s poor defaults, opaque
and user-hostile interfaces, and
outright dangerous recommendations.
Preword
I’ve been sitting on this post for a few months, in part because of travel
and in part because its (intended) scope was beginning to reflect PGP’s own fractal complexity.
The version that I’m publishing now has been significantly pared down to remove extended
digressions on how bad PGP’s packet format is, all the different ways in which a signature or
certificate packet can be broken, incorrectly bound, &c.
I’ve removed those things because I think the results, as presented, are sufficient evidence
for the actual claims I’d like to make, namely:
- That existing PGP signatures on PyPI serve no security purpose, and that all evidence points to nobody ever attempting to verify them;
- That even advanced technical communities, as a whole, largely fail to reduce PGP’s complexity and unnecessary agility into a reasonable and tractable subset.
And, just in case it needs to be said:
- This post isn’t intended to disparage PyPI: PyPI has done everything right, including purposely removing frontend support for PGP years ago.
- This post isn’t intended to disparage individual packagers and maintainers still uploading signatures to PyPI. I suspect that much of the ongoing signature uploading is a result of long-forgotten automation and, even when it isn’t, developers cannot be blamed for their misuse of obtuse tools. Security tools, especially cryptographic ones, are only as good as their least-informed[1] and most distracted user.
Background
PyPI has supported PGP signatures in some form or another for a very long time[2].
To this day, PGP is still (minimally) supported: package uploaders can still sign their package distributions and upload the resulting .asc to PyPI for inclusion in the index. The official uploading utility (twine) even supports invoking gpg directly via the --sign and --sign-with arguments!
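To make the --sign and --sign-with behavior concrete, here is a minimal sketch of the kind of gpg command such an option wraps. The helper names build_sign_command and sign are hypothetical, and the argument order is an approximation rather than a copy of twine's source. With --armor, gpg writes the detached signature next to the input file as <filename>.asc, which is the file that would then be uploaded alongside the distribution.

```python
import subprocess
from typing import List, Optional


def build_sign_command(
    filename: str, sign_with: str = "gpg", identity: Optional[str] = None
) -> List[str]:
    """Build a gpg invocation that produces an ASCII-armored detached signature.

    Hypothetical helper: approximates what an uploader's --sign/--sign-with
    options do; it is not any tool's actual implementation.
    """
    args = [sign_with, "--detach-sign"]
    if identity is not None:
        # Select a specific secret key instead of gpg's default key.
        args += ["--local-user", identity]
    # --armor emits base64 text; gpg then names the output <filename>.asc.
    args += ["--armor", filename]
    return args


def sign(filename: str, **kwargs) -> str:
    """Run the signing command and return the path of the expected .asc file."""
    subprocess.run(build_sign_command(filename, **kwargs), check=True)
    return filename + ".asc"
```

For example, build_sign_command("pkg-1.0.tar.gz", identity="maintainer@example.com") selects a particular key; with no identity, gpg silently falls back to whatever its default secret key happens to be.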
To a novice Python programmer looking to publish their first package to PyPI, this support might give the
following impressions:
- That PGP offers secure and modern cryptographic primitives;
- That PyPI encourages users to upload PGP signatures or that doing so is best practice;
- That others expect PGP signatures, and that package adoption is (in part) predicated
on supplying PGP signatures.
The first two are just wrong:
- PGP is an insecure and outdated ecosystem that hasn’t reflected cryptographic best practices in decades.
- PyPI’s support is vestigial in nature: signatures are not shown as part of the web interface, and are only obliquely referenced in the PEP 503 and JSON APIs.
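For reference, PEP 503 lets a simple index mark each file link with a data-gpg-sig attribute of "true" or "false", and says a signature lives alongside its file with .asc appended (PyPI's JSON API exposes a similar per-file has_sig flag). The sketch below uses only the standard library's html.parser to pull those flags out of a project page; the class and helper names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SimpleIndexLinks(HTMLParser):
    """Collect (href, has_signature) pairs from a PEP 503 project page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, Optional[bool]]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        attributes = dict(attrs)
        flag = attributes.get("data-gpg-sig")
        # The attribute is optional under PEP 503, so None means "not stated".
        has_sig = None if flag is None else flag == "true"
        self.links.append((attributes.get("href") or "", has_sig))


def signature_url(href: str) -> str:
    """Map a file link to its signature URL, dropping any #hash fragment."""
    base, _, _ = href.partition("#")
    return base + ".asc"
```

Usage: feed the HTML of a project page such as https://pypi.org/simple/<project>/ to SimpleIndexLinks().feed(...) and read the .links attribute.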
The third is harder to immediately refute: PyPI still hosts signatures, after all. Absent any
other information, it’s entirely possible that companies and end users are quietly and diligently
verifying whatever signatures are present, using trust sets, tracking revoked and expired keys,
and so forth.
Thus, my goals for this blog post:
- Determine how many signatures are on PyPI;
- Correlate those signatures to their signing keys;
- Analyze those signing keys for their practical value: their strength, liveness, &c.
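Correlating a signature with its signing key starts with reading the issuer the signature claims. A minimal sketch, assuming RFC 4880 packet framing and handling only v3 and v4 signature packets: it strips the ASCII armor (without checking the CRC24 line), reads the first packet header in either the old or new format, and pulls out the Issuer key ID (subpacket 16) and the Issuer Fingerprint (subpacket 33, standardized later in RFC 9580). All function names are hypothetical; a real analysis would more likely lean on an established OpenPGP library.

```python
import base64
from typing import Dict, Iterator, Optional, Tuple


def dearmor(text: str) -> bytes:
    """Decode an ASCII-armored block (the CRC24 checksum line is not verified)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Armor headers (e.g. "Version: ...") end at the first empty line.
    start = lines.index("") + 1
    body = [line for line in lines[start:] if not line.startswith(("=", "-----"))]
    return base64.b64decode("".join(body))


def first_packet(data: bytes) -> Tuple[int, bytes]:
    """Return (tag, body) of the first OpenPGP packet in data."""
    ctb = data[0]
    if not ctb & 0x80:
        raise ValueError("not an OpenPGP packet header")
    if ctb & 0x40:  # new-format header: 6-bit tag, variable-size length
        tag, o1 = ctb & 0x3F, data[1]
        if o1 < 192:
            length, off = o1, 2
        elif o1 < 224:
            length, off = ((o1 - 192) << 8) + data[2] + 192, 3
        elif o1 == 255:
            length, off = int.from_bytes(data[2:6], "big"), 6
        else:
            raise ValueError("partial body lengths are not handled")
    else:  # old-format header: 4-bit tag, 2-bit length type
        tag, ltype = (ctb >> 2) & 0x0F, ctb & 0x03
        if ltype == 3:
            raise ValueError("indeterminate lengths are not handled")
        size = (1, 2, 4)[ltype]
        length, off = int.from_bytes(data[1:1 + size], "big"), 1 + size
    return tag, data[off:off + length]


def subpackets(area: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, data) for each subpacket in a signature subpacket area."""
    i = 0
    while i < len(area):
        o1 = area[i]
        if o1 < 192:
            length, i = o1, i + 1
        elif o1 < 255:
            length, i = ((o1 - 192) << 8) + area[i + 1] + 192, i + 2
        else:
            length, i = int.from_bytes(area[i + 1:i + 5], "big"), i + 5
        # The length counts the type octet; bit 7 of the type is the "critical" flag.
        yield area[i] & 0x7F, area[i + 1:i + length]
        i += length


def issuer(packet: bytes) -> Dict[str, Optional[str]]:
    """Extract the claimed issuer key ID and/or fingerprint from a signature."""
    tag, body = first_packet(packet)
    if tag != 2:
        raise ValueError(f"expected a signature packet (tag 2), got tag {tag}")
    result: Dict[str, Optional[str]] = {"key_id": None, "fingerprint": None}
    if body[0] == 3:
        # v3: version, hashed length (5), type, 4-octet time, then the 8-octet key ID.
        result["key_id"] = body[7:15].hex().upper()
        return result
    if body[0] != 4:
        raise ValueError(f"unsupported signature version {body[0]}")
    # v4: version, type, public-key algo, hash algo, then hashed and unhashed areas.
    hashed_len = int.from_bytes(body[4:6], "big")
    rest = body[6 + hashed_len:]
    unhashed_len = int.from_bytes(rest[:2], "big")
    areas = body[6:6 + hashed_len] + rest[2:2 + unhashed_len]
    for kind, data in subpackets(areas):
        if kind == 16:  # Issuer: 8-octet key ID
            result["key_id"] = data.hex().upper()
        elif kind == 33:  # Issuer Fingerprint: key version octet + fingerprint
            result["fingerprint"] = data[1:].hex().upper()
    return result
```

For a downloaded .asc file, issuer(dearmor(text)) yields the key ID or fingerprint to look up in a keyring or on a keyserver. The issuer is only a claim: it means nothing until the signature actually verifies against the key it names.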
Methodology
Relatively early in the process, I decided not to collect every single signature on PyPI,
for two main reasons:
- Relevance: PyPI hosts many old package distributions, including distributions for Python 2.7 (and earlier!). Given that Python 2 has been EOL for over three years at this point, it didn’t feel relevant (or efficient) to retrieve large quantities of signatures for distributions that nobody is likely to ever try to install.
- Fairness: both PGP and Python have a lot of history, much of which predates modern understandings around cryptographic best practices. Given that, it didn’t feel fair to analyze extremely old signatures, especially if doing so would bias the statistics away from newer users who are doing more responsible things.
Given these considerations, I decided to limit my analysis to only signatures uploaded to PyPI
on or after 2020-03-27. I chose that date somewhat arbitrarily[3], while
also satisfying a few constraints:
- It’s well after the 2018 deployment of the new PyPI, which didn’t emphasize support for PGP signatures (while still retaining it). In other words: signatures uploaded in 2020 or later were either made by automation (implying some degree of sophistication) or were likely a conscious decision by a packager to continue signing with PGP.
- It’s very recent, and best practices around digital signatures have not changed substantially since 2020. In other words: a best-practices signature (and key) made in 2020 should look very similar to a best-practices signature (and key) made in 2023, and someone signing in 2020 would have no good excuse for not making reasonable choices.
Actually retrieving the signatures was a multi-step process. To start, I used PyPI’s BigQuery dataset to give me some basic metadata on every distribution file with an associated signature:
```sql
SELECT name, version, filename, python_version, blake2_256_digest
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE has_signature
AND upload_time > TIMESTAMP("2020-03-27 00:00:00")
```
This produced 52900 distributions uploaded since 2020-03-27 for which PyPI also
had a signature (subtract 1 for the CSV header):
```console
$ wc -l inputs/dists-with-signatures.csv
52901 inputs/dists-with-signatures.csv
$ head -2 inputs/dists-with-signatures.csv
name,version,filename,python_version,blake2_256_digest
pantsbuild.pants.testutil,1.30.0,pantsbuild.pants.testutil-1.30.0-py36.py37.py38-none-any.whl,py36.py37.py38,7ecbe47906ddbe8a2f1ee2505c2edb7f9313348d4925855e429be1d316660a00
```
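The same query can also be scripted end to end. A minimal sketch, assuming the google-cloud-bigquery client library and configured Google Cloud credentials (how the query was actually run isn't stated); rows_to_csv and export are hypothetical names. The output has the same five columns as the file above, plus a single header line, which is where the off-by-one in wc -l comes from.

```python
import csv
import io
from typing import Iterable, Mapping

QUERY = """
SELECT name, version, filename, python_version, blake2_256_digest
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE has_signature
AND upload_time > TIMESTAMP("2020-03-27 00:00:00")
"""

FIELDS = ["name", "version", "filename", "python_version", "blake2_256_digest"]


def rows_to_csv(rows: Iterable[Mapping[str, object]]) -> str:
    """Render query rows as CSV text: one header line, then one line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({field: row[field] for field in FIELDS})
    return buf.getvalue()


def export(path: str) -> None:
    """Run QUERY and save the results (needs google-cloud-bigquery and GCP credentials)."""
    from google.cloud import bigquery  # imported lazily: only needed for the live query

    client = bigquery.Client()
    rows = client.query(QUERY).result()
    with open(path, "w", newline="") as out:
        out.write(rows_to_csv(rows))
```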
From here, I needed to retrieve each release distribution’s detached signature, i.e. the adjacent .asc URL in PyPI’s object storage.
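Since the query also returned each file's BLAKE2b-256 digest, the signature URLs can in principle be computed rather than looked up. A minimal sketch, assuming that PyPI's object storage lays files out as files.pythonhosted.org/packages/<first 2 hex chars>/<next 2>/<remaining 60>/<filename>, and that each signature sits at its file's URL with .asc appended (the PEP 503 convention); the helper names are hypothetical and nothing is downloaded here.

```python
import csv
from typing import Iterable, Iterator

# Assumed base URL and path layout for PyPI's file storage.
FILES_BASE = "https://files.pythonhosted.org/packages"


def distribution_url(blake2_256_digest: str, filename: str) -> str:
    """Build a file URL by splitting the hex digest into aa/bb/rest path segments."""
    d = blake2_256_digest
    return f"{FILES_BASE}/{d[:2]}/{d[2:4]}/{d[4:]}/{filename}"


def signature_urls(csv_lines: Iterable[str]) -> Iterator[str]:
    """Yield the detached-signature URL for every row of the exported CSV."""
    for row in csv.DictReader(csv_lines):
        # The signature is expected alongside the distribution, with ".asc" appended.
        yield distribution_url(row["blake2_256_digest"], row["filename"]) + ".asc"
```

Usage: open inputs/dists-with-signatures.csv with newline="" and pass the file object to signature_urls; each yielded URL can then be fetched with any HTTP client.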
I initially did this with the “conveyor” service, which turns
P