Welcome to the April 2023 report from the Reproducible Builds project!
In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.
General news
Trisquel is a fully-free operating system building on the work of Ubuntu Linux. This month, Simon Josefsson published an article on his blog titled Trisquel is 42% Reproducible!. Simon wrote:
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (e.g. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans), which was announced in August 2021 and in which an archive of all issued signatures is publicly accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve: in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed. A popular approach to this problem is called bootstrappable builds, where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes a somewhat recursive approach to the problem for Haskell, leading to the inadvertently humorous question: “Can I turn all of GHC into one module, and compile that?”
Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Courtès wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[…] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program—something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start.
There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel.
Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Michael, pypidiff “uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository.” This can be seen on the pypi-diff GitHub page (example).
Eleuther AI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Models (LLMs) that were all trained on public data in the same order and designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can be affected by training data and model scale.
Back in February’s report we reported on a series of changes to the Sphinx documentation generator that were initiated after attempts to get the alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provide a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, “the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.”
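The idea of checking that two independent builds match can be illustrated with a minimal sketch. This is not F-Droid's or WireGuard's actual tooling, and real APK verification also has to account for signing data; the function names and file paths below are hypothetical and simply show a byte-for-byte comparison via SHA-256 digests.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(ours: Path, upstream: Path) -> bool:
    """True if both build artifacts are bit-for-bit identical.

    Hypothetical helper: compares digests only, with no
    signature handling.
    """
    return sha256_of(ours) == sha256_of(upstream)
```

A caller might run builds_match(Path("our-build.apk"), Path("upstream-build.apk")) and only publish the new version when it returns True.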
Author and public speaker V. M. Brasseur published a sample chapter from her upcoming book on “corporate open source strategy”. The chapter covers the topic of Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as “…a nested inventory for software, a list of ingredients that make up software components.” When you receive a physical delivery of some sort, the bill of materials tells you what’s inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what’s inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. […]
Several distributions noticed that recent versions of the Linux kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the #reproducible-builds IRC channel, but no solution appears to be in sight for now.
On our mailing list this month:
- Larry Doolittle shared an interesting puzzle with the group where three bytes in a .zip file were different between two builds.
- Alexis PM wrote a message as they had observed a difference between binaries available in the Debian archive and the ones on tests.reproducible-builds.org. The thread generated a number of replies, including interesting responses from Vagrant Cascadian […]
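For puzzles like the .zip one above, where only a handful of bytes differ between two builds, a first step is often just to locate the differing offsets. The following is a minimal sketch of that step, not the method used in the mailing list thread; the function name is hypothetical, and tools such as diffoscope give far richer output.

```python
from itertools import zip_longest


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return every byte offset at which the two inputs differ.

    If one input is longer, the extra trailing offsets are
    reported as differences too (zip_longest pads with None).
    """
    return [
        offset
        for offset, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]
```

Reading both artifacts with Path.read_bytes() and passing them to this function yields the offsets, which can then be mapped onto the fields of the file format in question.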