For the past two weeks, I’ve been spending most of my time rewriting our CI scripts in GitHub Actions. This is the third time we’ve had to redo our CI setup—first GitHub Actions, then Earthly (which we moved away from because it was discontinued), and now, reluctantly, back to GitHub Actions.
Our CI is complex: merge queues, multiple runners (self-hosted, blacksmith.sh, GitHub-hosted), Rust builds, Docker images, and heavy integration tests. Every PR we merge burns through an hour of CI time, running across multiple parallel runners.
There are a few things we’d like to have (which we consider “good software practice”), but none of them is unheard of:
- Everything that goes into `main` must pass all tests.
- Trivial mistakes (formatting, unused deps, lint issues) should be fixed automatically, not cause failures.
- The artifacts we test with in CI should be the exact ones we release.
- CI should complete quickly (to keep developers happy).
GitHub Actions technically allows all of this—but setting it up is a frustrating mess, full of hidden gotchas, inconsistent behavior, and a debugging experience that makes me question my choices.
Strange Way to Enforce Status Checks with Merge Queue
The key to enforcing a clean `main` branch is GitHub’s merge queue, which rebases a PR onto `main` before running CI. Sounds great. But here’s the fun part:
- We need CI to run before entering the queue to auto-fix trivial issues.
- We need CI to run again inside the queue to verify the final merge.
- GitHub Actions makes it weirdly hard to require both runs to pass.
The solution? Name the jobs identically in both phases. That’s it. GitHub treats them as the same check, so both runs need to succeed. I found this in a Stack Overflow answer after a few hours of debugging. Every other approach either makes GitHub wait for the status checks before a PR can enter the queue (so the job never starts) or, worse, lets PRs get merged even when the job that is supposed to pass in the merge queue fails.
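Because the trick depends on names matching exactly, a small lint can confirm that each required check name is produced both by a workflow triggered on `pull_request` (before the queue) and by one triggered on `merge_group` (inside the queue). This is a minimal sketch, not the author's setup: the function names are hypothetical, it assumes the workflow files have already been parsed into dicts (for example with PyYAML's `yaml.safe_load`), and it assumes a job's check name is its `name:` or, failing that, its job id, which ignores matrix jobs and reusable workflows.

```python
from collections.abc import Iterable


def trigger_events(workflow: dict) -> set[str]:
    # PyYAML follows YAML 1.1 and parses a bare `on:` key as the boolean True,
    # so look under both keys.
    on = workflow.get("on", workflow.get(True))
    if isinstance(on, str):
        return {on}
    if isinstance(on, (list, dict)):
        return set(on)  # list items or dict keys are the event names
    return set()


def check_names(workflow: dict) -> set[str]:
    # Assumption: a job's check name is its `name:` if set, else its job id.
    return {job.get("name", job_id) for job_id, job in workflow.get("jobs", {}).items()}


def missing_checks(workflows: Iterable[dict], required: Iterable[str]) -> dict[str, set[str]]:
    """Per phase, return the required check names that no workflow in that phase produces."""
    produced: dict[str, set[str]] = {"pull_request": set(), "merge_group": set()}
    for wf in workflows:
        events = trigger_events(wf)
        for phase in produced:
            if phase in events:
                produced[phase] |= check_names(wf)
    return {phase: set(required) - names for phase, names in produced.items()}
```

A non-empty set for either phase points to a required check that one of the two phases never reports.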
A security nightmare?
A few days ago, someone compromised a popular GitHub Action. The response? “Just pin your dependencies to a hash.” Except, as commenters also pointed out, almost no one does.
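Checking whether a workflow actually follows the “pin to a hash” advice is easy to automate. Below is a rough, line-based sketch of a hypothetical helper (not an official GitHub tool) that flags every `uses:` reference whose ref is not a full 40-character commit SHA. It assumes one `uses:` per line and deliberately skips local (`./`) actions and `docker://` images instead of handling them.

```python
import re

# Matches `uses: owner/repo[/path]@ref`, optionally as a list item (`- uses:`),
# with or without quotes; a trailing `# comment` is not captured.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, reference) for every action not pinned to a full commit SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1)
        # Local actions and Docker image references are out of scope for this sketch.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.match(version):
            findings.append((lineno, ref))
    return findings
```

Tags such as `@v4` get flagged because a tag can be moved to point at a different commit, which is how a compromised action can reach workflows whose files never changed.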
Even setting aside supply-chain attacks, GitHub’s security model is a confusing maze to me. My view is that if I can’t easily understand a security model, it’s probably doomed to fail or break at some point. Disclaimer: I’m writing this as a GitHub Actions user with only a vague understanding of it, so I’d be delighted to hear that it is not just “things piled on top of things until it’s safe”, which is my current impression. I do understand very well that building secure CI for distributed source control is a hard problem.
In GitHub, there is a “default” token called `GITHUB_TOKEN`. It is initialized with a set of default permissions, which you can configure in your repository settings (under Actions -> General -> Workflow Permissions). Here is what the GitHub documentation says about it:
If the default permissions for the GITHUB_TOKEN are restrictive, you may have to elevate the permissions to allow some actions and commands to run successfully. If the default permissions are permissive, you can edit the workflow file to remove some permissions from the GITHUB_TOKEN.
– GitHub Documentation
Removing permissions that aren’t necessary sounds nice (though I think a better default would be to start with no privileges and require the user to add whatever is needed). Unfortunately, there are many of them, and unless you’re a GitHub expert it’s hardly clear what each one protects.
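One way to approximate a “start with nothing” default is to declare `permissions: {}` at the top of each workflow, which disables every scope for the `GITHUB_TOKEN`, and then grant individual scopes per job. The sketch below is a hypothetical lint, assuming the workflow file has already been parsed into a dict (for example with PyYAML). It reports workflows that fall back to the repository default and jobs that end up with write access. It relies on the documented rule that a job without its own `permissions` block inherits the workflow-level one.

```python
def permission_warnings(workflow: dict) -> list[str]:
    """List GITHUB_TOKEN permission settings worth reviewing in a parsed workflow."""
    warnings = []
    top_level = workflow.get("permissions")
    if "permissions" not in workflow:
        # No top-level block: jobs without their own block get the repository default.
        warnings.append("no top-level permissions block; falls back to the repository default")
    for job_id, job in workflow.get("jobs", {}).items():
        # A job-level block overrides the workflow-level one; otherwise it is inherited.
        perms = job.get("permissions", top_level)
        if perms == "write-all":
            warnings.append(f"job {job_id!r} grants write-all")
        elif isinstance(perms, dict):
            writes = sorted(scope for scope, level in perms.items() if level == "write")
            if writes:
                warnings.append(f"job {job_id!r} has write access to: {', '.join(writes)}")
    return warnings
```

Write grants are reported rather than rejected, since some jobs (publishing a release, for instance) legitimately need them; the point is to make every write scope an explicit, reviewed choice.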
Your workflow permissions also don’t really depend on the action itself. Here is an example: I’m using `softprops/action-gh-release` to
29 Comments
lxe
Should have been zero permissions by default. The current model is a mess of global settings, workflow permissions, and job tokens that nobody understands.
jiggawatts
Azure DevOps is nearly identical, but with a slightly different zoo of issues that are less well documented in public sources.
It also has the problem of not having a local dev runner for actions. The "inner loop" is atrociously slow and involves spamming your colleagues with "build failed" about a thousand times, whether you like it or not.
IMHO, a future DevOps runner system must be open-source and local-first. Anything else is madness.
Right now we're in the "mainframe era" of DevOps, where we edit text files in baroque formats with virtually no tooling assistance, "submit" that to a proprietary batch system on a remote server that puts it into a queue… then come back after our coffee to read through the log printout.
I should buy a dot matrix printer to really immerse myself into the paradigm.
rsanheim
tldr but: don't use GitHub Actions. It's a mess, the availability is often atrocious, and the UI around it is _still_ as clunky as when they first rolled it out many years ago.
There are better solutions out there.
goosejuice
After using Gitlab CI for years and setting up some pretty complex scenarios, when I switched over to GitHub I found the UX to be pretty rough. Seems very opaque and I find the documentation to be at best hard to navigate.
Maybe it was just the pain of switching but that was my initial impression.
sepositus
Also an Earthly casualty here. Now having to look at Dagger.
peterldowns
These are all real pains, author definitely has done a lot of work in Github Actions; respect. I'm sure these notes will save a lot of people a lot of frustration in the future, since Github Actions isn't going away — it's too damn convenient.
I wonder why they chose to move back to Github Actions rather than evaluate something like Buildkite? At least they didn't choose Cloud Build.
fourteenminutes
Used to use GH actions quite a bit. At my current company we set up RWX Mint (rwx.com/mint) and haven't looked back. (disclaimer: used to work at rwx but no longer affiliated)
silisili
I worked at companies using Gitlab for a decade, and got familiar with runners.
Recently switched to a company using Github, and assumed I'd be blown away by their offering because of their size.
Well, I was, but not in the way I'd hoped. They're absolutely awful in comparison, and I'm beyond confused how it got to that state.
If I were running a company and had to choose between the two, I'd pick Gitlab every time just because of Github actions.
hn_throwaway_99
> A few days ago, someone compromised a popular GitHub Action. The response? "Just pin your dependencies to a hash." Except as comments also pointed out, almost no one does.
I used GitHub actions when building a fin services app, so I absolutely used the hash to specify Action dependencies.
I agree that this should be the default, or even the required, way to pull in Action dependencies, but saying "almost no one does" is a pretty lame excuse when talking about your own risk. What other people do has no bearing on your options here.
Pin to hashes when pulling in Actions – it's much, much safer
larusso
Interesting. I‘m also moving our CI to GitHub actions after years of using Jenkins with custom pipelines written in Groovy etc. I checked out GitHub actions every now and then to see whether a move finally made sense. I started with simple builds, then tested adding our Jenkins macOS agents as self-hosted runners. Just yesterday I wrote two actions to build and test a new .NET project. I was able to run the whole thing with „act“ locally before running it on GitHub proper. I also played around and created a custom action in TypeScript (kicked off from the available predefined templates) to see how much work maintaining that means. All in all I‘m super happy and see no bigger issues. But here are some things that might be a reason:
I split CI into build system logic, which should and needs to run locally, and the parts that only GitHub needs to execute. At best that means describing what runs in parallel and making specific connections. Any complicated logic needs to be abstracted away behind a setup that is itself testable. I handle our build system components the same way. We use Gradle a lot, along with a few custom plugins which encapsulate specific builds / automations. It’s like dividing your problem into many smaller pieces which are tested and developed in isolation.
Next to json I also used Travis CI and AppVeyor for projects. And they all had the same (commit and pray) setup that I hate. I wish „act“ were a tool directly maintained by the GitHub folks, though.
https://github.com/nektos/act
GauntletWizard
Whenever I get mad at GitHub Actions, I refer to it by its true name: VisualSourceSafe Actions. Because that's what it is, and it shows. If you check out their Action Runner's source code[1], you'll find the VSS prefix all over, showing its lineage.
[1] https://github.com/actions/runner/blob/6654f6b3ded8463331fb0…
jamesu
One thing I found useful was writing a runner for Gitea's Actions CI, which is similar to GHA. When you dig down and ask "what is ACTUALLY happening to run this job", a lot of things, such as the Docker entrypoint not being modifiable, make perfect sense.
yoyohello13
My team uses GitLab and most other teams are on Azure dev ops. They keep trying to get us to switch telling us how amazing pipelines are. Glad to know we are not missing anything.
itissid
I can relate to this pain. Isn't gitlab CI better at this especially the documentation and simplicity of it?
lemagedurage
I wonder if the complexity of fixing trivial code mistakes in CI is worth it compared to catching them in a pre-commit hook.
mcqueenjordan
Usually if you’re using it, it’s because you’re forced to.
In my experience, the best strategy is to minimize your use of it — call out to binaries or shell scripts and minimize your dependence on any of the GHA world. Makes it easier to test locally too.
kelseydh
We recently had a developer — while trying to debug container builds for a version upgrade for a PR on their local branch — accidentally trigger a deployment of their local branch's docker container to production (!) while messing around with Github action workflow files in their pull request (not main).
Outside of locking down edit access to the .github workflow yml files I'm not sure how vulnerabilities like this can be prevented.
silverwind
GHA is full of such obscure behaviours. One I recently discovered is that one action cannot trigger another:
If one action pushes a tag to the repo, `on:tag` does not trigger. The workaround apparently is to make the first action push the tag using a custom SSH key, which magically has the ability to trigger `on:tag`.
jicea
Genuine question: what's the GitLab equivalent of GitHub Actions?
I'm using GitHub Actions to easily reuse some predefined job setup (like installing a certain Python version on Linux, macOS, Windows runners). For these types of tasks, I find GitHub Actions very useful and convenient. If you want to reuse predefined jobs, written by someone else, with GitLab CI/CD, what can I use?
ThomasRooney
> A few days ago, someone compromised a popular GitHub Action. The response? "Just pin your dependencies to a hash." Except as comments also pointed out, almost no one does.
I'm surprised nobody has mentioned dependabot yet. It automates this, keeping action dependencies pinned by hash automatically whilst also bringing in stable upgrades.
wordofx
GHA feels like a discontinued product that people use so they can’t switch it off.
kfarr
I was updating an old action last night to update gh pages and it’s from peaceiris. And it’s not bad, it did the job. But it feels kinda weird.
jalaziz
GitHub Actions started off great as they were quickly iterating, but it very much seems that GitHub has taken its eye off the ball and the improvements have all but halted.
It's really upsetting how little attention Actions is getting these days (<https://github.com/orgs/community/discussions/categories/act…> tells the story — the most popular issues have gone completely unanswered).
Sad to see Earthly halting development and Dagger jumping on the AI train :(. Hopefully we'll get a proper alternative.
On a related note, if you're considering https://www.blacksmith.sh/, you really should consider https://depot.dev/. We evaluated both but went with Depot because the team is insanely smart and they've solved some pretty neat challenges. One of the cooler features is that their caching works with the default actions/cache action. There's absolutely no need to switch out popular third party actions in favor of patched ones.
voidr
I don't get the obsession with YAML and making things declarative that really should not be declarative.
I'm so much happier on projects where I can use the non-declarative Jenkins pipelines instead of GH Actions or BB pipelines.
These YAML pipelines are bad enough on their own, but throw in a department that is gatekeeping them and use runners as powerful as my Raspberry Pi and you have a situation where a lot of developers just give up and run things locally instead of the CI.
kylegalbraith
This was an interesting read and highlighted some of the author's top-of-mind pain points and rough edges. However, in my experience, this is definitely not an exhaustive list, and there are actually many, many, many more.
Things like 10 GB cache limits in GitHub, concurrency limits based on runner type, the expensive price tag for larger GitHub runners, and that's before you even get to the security ones.
Having been building Depot[0] for the past 2.5 years, I can say there are so many foot guns in GitHub Actions that you don't realize until you start seeing how folks are bending YAML workflows to their will.
We've been quite surprised by the `container` job. Namely, folks want to try to use it to create a reproducible CI sandbox for their build to happen in. But it's surprisingly difficult to work with. Permissions are wonky, Docker layer caching is slow and limited, and paths don't quite work as you thought they did.
With Depot, we've been focusing on making GitHub Actions exponentially faster and removing as many of these rough edges as possible.
We started by making Docker image builds exponentially faster, but we have now brought that architecture and performance to our own GHA runners [1]. Building up and optimizing the compute and processes around the runner to make jobs extremely fast, like making caching 2-10x faster without having to replace or use any special cache actions of ours. Our Docker image builders are right next door on dedicated compute with fast caching, making the `container` job a lot better because we can build the image quickly, and then you can use that image right from our registry in your build job.
All in all, GHA is wildly popular. But the sentiment, even among its biggest fans, is that it could be a lot better.
[0] https://depot.dev/
[1] https://depot.dev/products/github-actions
stephencoxza
Not sure if I'm the odd one out here. I thoroughly enjoy making the best of whatever the company wants to use. The flavour of CI/CD can be a debate similar to programming languages
suryao
There definitely are a ton of issues with GitHub actions. To add to the OP's list:
– Self-hosting on your aws/gcp/azure account can get a little tricky. `actions-runner-controller` is nice but runs your workflows within a docker container in k8s, which leads to complex handling for isolation, cost controls because of NAT etc.
– Multi-arch container builds require emulation and can be extremely slow by default.
– The cache limits are absurd.
– The macos runners are slow and overpriced (arguably, most of their runners are).
Over the last year, we spent a good amount of time solving many of these issues with WarpBuild[1]. Having unlimited cache sizes, remote multi-arch docker builders with automatic caching, and ability to self-host runners in your aws/gcp/azure account are valuable to minimize cost and optimize performance.
[1] https://warpbuild.com
lars512
At Our World In Data we ended up using Buildkite to run custom CI jobs, integrated with GitHub, but on cheap, massive Hetzner machines. I can really recommend the experience!