Rémy Rakic is helping us enable the adoption of memory safe software through work to improve Rust compile times. We asked him to provide his perspective on the work in this blog post. Thank you for your partnership and contributions, Rémy!
Introduction
Over the past few months I’ve been working as part of the Rust compiler performance working group on the initiative for better build times. I’ve been working on this effort through a contract with Prossimo that was generously supported by Google.
Context
Rust compile times are very often brought up as an area needing improvement, including in the yearly community survey. Slow compile times can be a barrier to adoption, and improving them could therefore help broaden the language’s impact. That serves one of Prossimo’s goals: enabling more memory safe software, particularly in the most critical parts of Internet infrastructure (e.g., networking, TLS, DNS, OS kernels).
In my mind, Rust has historically focused more on runtime performance than on compilation times, much like LLVM, one of the most important components used in Rust compilation. I feel that’s a common story for modern compilers, in both engineering and academia. Unlike some older languages tailored to a lightning-fast single-pass compiler, compilation speed was not the most important principle in Rust’s design. The designers’ primary focus was making sure the language would offer the desired safety guarantees without compromising the performance of programs.
Nonetheless, compile times have received a lot more attention recently and have noticeably improved over the years. There’s more work we can do though. To help move things forward, tools and processes have been adopted and refined over time to help foster a culture of performance. Examples include:
- A suite of benchmarks for the compiler, run for every PR that is merged. Each benchmark is exercised under various use-cases: in check, debug, or release modes, with or without incremental compilation (and with different granularities of changes), with many different hardware metrics that are recorded and graphed over time, as well as data from the compiler’s internal profiling infrastructure.
- These benchmarks can also be triggered on demand for a PR, prior to merging, in order to avoid surprises.
- A summary of the results is posted on each merged PR, to notify the authors, reviewers, and the working group if a PR needs attention.
- A weekly triage process, to summarize the week’s results and keep a friendly human in the loop in case there are calls to be made: weeding out sources of noise in the results (it happens sometimes), deciding whether small regressions are inconsequential and can be ignored or require more work, and flagging unforeseen performance issues requiring a revert. We also celebrate the wins!
- These summaries are used to notify the compiler team of recent progress in their weekly meeting, as well as the community, via This Week in Rust.
Priorities
I worked with Prossimo on the following priorities:
- Make pipelined compilation as efficient as possible
- Improve raw compilation speed
- Improve support for persistent, cached, and distributed builds
We started by looking for what’s slow. Examining this holistically, at the level of whole-crate compilation, could provide new insights, especially since it had rarely been done before. I gathered the 1,000 most popular crates from crates.io and collected data for complete cargo builds, including dependencies. I also gathered rustc self-profiling data for a higher-level view and profiled for sources of high memory usage. All of this was done in check, debug, and release modes, with varying degrees of parallelism.
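As a rough sketch of the kind of measurement loop involved (the post doesn’t show the exact harness; the cargo invocations below are standard, and `-Zself-profile` is a nightly-only rustc flag):

```shell
# Dry-run sketch of gathering build-time data across modes, assuming a
# nightly toolchain for the self-profile flag. The commands are echoed
# rather than executed so the sketch runs anywhere; drop the `echo` to
# actually run them inside a crate's checkout.
for mode in "check" "build" "build --release"; do
    echo "RUSTFLAGS=-Zself-profile cargo +nightly $mode"
done
```

Each real run with `-Zself-profile` writes `.mm_profdata` files that the measureme tools can then analyze.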
From this high level view, we could see a few promising ways to move forward:
- Improvements to the compilation pipeline: profiling to find sources of slowness, and then finding solutions and mitigations for these issues. That could be in rustc, cargo, or even rustup.
- Improving compile times of targeted crates: if a popular crate contains sources of slowness, this can in turn impact all the crates that transitively depend on it. In some situations, it’s possible to improve the crate itself in addition to the compiler and tools.
- Preventing future slowness: analyzing, tracking, and mitigating regressions and bugs (e.g., incremental compilation issues that could lead to turning the feature off, as has happened before).
- And finally, helping people achieve the above (both contributors and crate authors). People commonly want to see the sources of slowness in their projects, and having the compiler display this information would help them organize or refactor their code accordingly.
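Some of this visibility already exists in stock tooling; for example (a sketch, not the specific compiler feature being discussed: `--timings` is a cargo feature, and `summarize` is a separate tool from the measureme repository):

```shell
# Dry-run sketch: two existing ways to see where build time goes.
# Commands are echoed rather than executed so the example runs even
# without a Rust toolchain installed; the .mm_profdata filename is a
# hypothetical placeholder for a real self-profile output file.
echo "cargo build --timings"                    # per-crate timing report, under target/cargo-timings/
echo "summarize summarize foo-1234.mm_profdata" # inspect rustc self-profile data
```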
Based on these findings, the compiler performance working group drafted a roadmap, updated our benchmark suite to be representative of the practices people actually use, and developed a policy to periodically refresh the benchmarks so they stay relevant. We saw new hotspots and inefficiencies in these new crates, along with some surprising findings related to pipelining and scheduling, the common presence of build scripts, and the relative importance of proc-macros.
An Overview of the Items I Worked On
The Compile-Time Function Evaluation Engine is seeing more use with the ongoing improvements and expansions to the “const” parts