Disk I/O bottlenecks are easy to overlook when analyzing CI pipeline performance, but tools like iostat and fio can help shed light on what might be slowing down your pipelines more than you realize.
GitHub offers different hosted runners with a range of specs, but for this test we are using the default ubuntu-22.04 runner in a private repository, which gives us an additional 2 vCPUs but does not alter the disk performance.
How to monitor disk performance
Getting a baseline benchmark from a tool like fio is useful for comparing the relative disk performance of different runners. However, to investigate whether you are hitting disk I/O bottlenecks in your CI pipeline, it is more useful to monitor disk performance during pipeline execution.
We can use a tool like iostat to monitor the disk while installing dependencies from the cache to see how much we are saturating it.
- name: Start IOPS Monitoring
  run: |
    echo "Starting IOPS monitoring"
    # Start iostat in the background, logging extended device stats every second to iostat.log
    nohup iostat -dx 1 > iostat.log 2>&1 &
    echo $! > iostat_pid.txt # Save the iostat process ID so we can stop it later

- uses: actions/cache@v4
  timeout-minutes: 5
  id: cache-pnpm-store
  with:
    path: ${{ steps.get-store-path.outputs.STORE_PATH }}
    key: pnpm-store-${{ hashFiles('pnpm-lock.yaml') }}
    # restore-keys are matched in order, so the most specific key goes first
    restore-keys: |
      pnpm-store-${{ hashFiles('pnpm-lock.yaml') }}
      pnpm-store-

- name: Stop IOPS Monitoring
  run: |
    echo "Stopping IOPS monitoring"
    kill $(cat iostat_pid.txt)

- name: Save IOPS Data
  uses: actions/upload-artifact@v4
  with:
    name: iops-log
    path: iostat.log
Monitoring disk during untar of Next.js dependencies
In the above test, we used iostat to monitor disk performance while the cache action downloaded and untarred the dependencies for vercel/next.js:
Received 96468992 of 343934082 (28.0%), 91.1 MBs/sec
Received 281018368 of 343934082 (81.7%), 133.1 MBs/sec
Cache Size: ~328 MB (343934082 B)
/usr/bin/tar -xf /home/<path>/cache.tzst -P -C /home/<path>/gha-disk-benchmark --use-compress-program unzstd
Received 343934082 of 343934082 (100.0%), 108.8 MBs/sec
Cache restored successfully
The full step took 12s to complete, and we can estimate the download took around 3s, leaving 9s for the untar operation.
The compressed tarball is only about 328MB, but after extraction the total amount of data written to disk is about 1.6GB. The smaller compressed size gets the cache across the network quickly, and most CPUs can decompress fast enough that higher compression is often favorable. Once download and decompression are no longer the bottleneck, what remains is writing to disk.
Reading from a tarball is a fairly efficient process since it is mostly sequential reads; however, we then need to write each file to disk. This is where we can hit disk I/O bottlenecks, especially with a large number of small files.
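If you want to see where a figure like that 1.6GB comes from on your own runner, a small step like the sketch below reports the extracted size and file count. It assumes the same STORE_PATH output referenced in the cache step above; a high file count with a modest total size is a good hint that small-file writes, not raw throughput, will dominate the restore time.

```yaml
# A minimal sketch: report how much data the restored store contains and how
# many individual files it holds. Assumes the same STORE_PATH output used by
# the cache step shown earlier.
- name: Inspect restored cache
  run: |
    STORE="${{ steps.get-store-path.outputs.STORE_PATH }}"
    echo "Total size on disk:"
    du -sh "$STORE"
    echo "Number of files:"
    find "$STORE" -type f | wc -l
```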
It's important to note that this is just a single run, not an average. Running multiple tests over time will give you a much clearer picture of the overall performance. Variance between runs can be quite high, so an individual bad run doesn't necessarily indicate a problem.
What this run suggests is a possible throughput bottleneck. We're seeing spikes in the maximum total throughput, with most hovering around ~220MB/s. This is likely the maximum throughput we can achieve to this disk; we'll verify that next. We should continue to monitor this and compare it against other runners to find an ideal runner for our workflow. We'll use fio to double-check whether we are hitting the disk's maximum throughput.
An interesting aside before we move on: we can see from this side-by-side how few read operations there are relative to writes. Since we're reading from a tarball, most reads are sequential, which tends to be more efficient. That read data is likely buffered before being written to disk in a more random pattern as each file is recreated, which is why we see higher write IOPS than read IOPS.
Maximum disk throughput
One of the first optimizations developers usually make to their CI pipelines is caching dependencies. Even though the cache still gets uploaded and downloaded with each run, it speeds things up by packaging all your dependencies into one compressed file. This skips the hassle of resolving dependencies, avoids multiple potentially slow downloads, and cuts down on network delays.
But as we saw above, network speed isn’t usually our bottleneck when downloading the cache.
| Test Type | Block Size | Bandwidth |
|---|---|---|
| Read Throughput | 1024KiB | ~209MB/s |
| Write Throughput | 1024KiB | ~209MB/s |
Using fio to test our throughput, notice that read and write throughput are both capped at the same value. This is a fairly telling sign that the limitation here is not the disk itself, but rather a bandwidth limit imposed by GitHub. This is standard practice to divide resources among multiple users who may be accessing the same physical disk from their virtual machines. It isn't always documented, but most providers will have higher bandwidth limits on higher-tier runners.
What we measured here aligns fairly closely with the 220MB/s we saw in the untar test, giving us another hint that we are likely being slowed down during our dependency installation, not by the network or CPU, but by the disk.
Regardless of how fast our download speed is, we won’t be able to write to disk any faster than our max throughput to the disk.
Estimated time to write to disk: select a cache payload size and a throughput speed (interactive calculator).
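As a rough worked example using the numbers from this run: writing ~1.6GB of extracted dependencies at a ~209MB/s throughput cap takes roughly 1.6GB / 209MB/s, or about 7.7 seconds, which lines up with the ~9s we estimated for the untar step once you allow for decompression and small-file overhead.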
Realistically, your disk performance will vary greatly depending on your specific cache size, the number of files, and general build-to-build variance. That's why it's a good idea to monitor your CI runners to establish a consistent baseline, and we'll talk about testing your workflow on multiple runners for comparison.
Maximum IOPS (Input/Output Operations Per Second)
Once the cache tarball has been downloaded, it needs to be extracted. Depending on the compression level, this can be a CPU-intensive operation, but that isn't usually a problem. When untarring the dependencies, we perform a lot of small read and write operations, which is where we can hit disk I/O bottlenecks.
| Test Type | Block Size | IOPS |
|---|---|---|
| Read IOPS | 4096B | ~51K |
| Write IOPS | 4096B | |
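To reproduce a random-IOPS number like the read figure above, a 4k random-I/O fio job along these lines can be used. As with the throughput sketch, the queue depth, size, and runtime are assumptions rather than the exact parameters behind the table, and fio is assumed to already be installed as in the previous step.

```yaml
# A sketch of a random-IOPS test with fio. Queue depth, size, and runtime are
# assumptions; adjust for your runner.
- name: Disk IOPS benchmark
  run: |
    fio --name=iops-test \
        --rw=randread \
        --bs=4k \
        --size=1G \
        --runtime=30 \
        --time_based \
        --direct=1 \
        --ioengine=libaio \
        --iodepth=64 \
        --filename=fio-testfile
    rm -f fio-testfile
```

Swap `--rw=randread` for `--rw=randwrite` to measure the write side.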
9 Comments
ValdikSS
`apt` installation could be easily sped-up with `eatmydata`: `dpkg` calls `fsync()` on all the unpacked files, which is very slow on HDDs, and `eatmydata` hacks it out.
suryao
TLDR: disk is often the bottleneck in builds. Use 'fio' to get performance of the disk.
If you want to truly speed up builds by optimizing disk performance, there are no shortcuts to physically attaching NVMe storage with high throughput and high IOPS to your compute directly.
That's what we do at WarpBuild[0] and we outperform Depot runners handily. This is because we do not use network attached disks which come with relatively higher latency. Our runners are also coupled with faster processors.
I love the Depot content team though, it does a lot of heavy lifting.
[0] https://www.warpbuild.com
miohtama
If you can afford, upgrade your CI runners on GitHub to paid offering. Highly recommend, less drinking coffee, more instant unit test results. Pay as you go.
jacobwg
A list of fun things we've done for CI runners to improve CI:
– Configured a block-level in-memory disk accelerator / cache (fs operations at the speed of RAM!)
– Benchmarked EC2 instance types (m7a is the best x86 today, m8g is the best arm64)
– "Warming" the root EBS volume by accessing a set of priority blocks before the job starts to give the job full disk performance [0]
– Launching each runner instance in a public subnet with a public IP – the runner gets full throughput from AWS to the public internet, and IP-based rate limits rarely apply (Docker Hub)
– Configuring Docker with containerd/estargz support
– Just generally turning kernel options and unit files off that aren't needed
[0] https://docs.aws.amazon.com/ebs/latest/userguide/ebs-initial…
larusso
So I had to read to the end to realize it’s a kinda infomercial. Ok fair enough. Didn’t know what depot was though.
crmd
This is exactly the kind of content marketing I want to see. The IO bottleneck data and the fio scripts are useful to all. Then at the end a link to their product which I’d never heard of, in case you’re dealing with the issue at hand.
nodesocket
I just migrated multiple ARM64 GitHub action Docker builds from my self hosted runner (Raspberry Pi in my homelab) to Blacksmith.io and I'm really impressed with the performance so far. Only downside is no Docker layer and image cache like I had on my self hosted runner, but can't complain on the free tier.
kayson
Bummer there's no free tier. I've been bashing my head against an intermittent CI failure problem on Github runners for probably a couple years now. I think it's related to the networking stack in their runner image and the fact that I'm using docker in docker to unit test a docker firewall. While I do appreciate that someone at Github did actually look at my issue, they totally missed the point. https://github.com/actions/runner-images/issues/11786
Are there any reasonable alternatives for a really tiny FOSS project?
crohr
I'm maintaining a benchmark of various GitHub Actions providers regarding I/O speed [1]. Depot is not present because my account was blocked but would love to compare! The disk accelerator looks like a nice feature.
[1]: https://runs-on.com/benchmarks/github-actions-disk-performan…