Kurt Mackey (@mrkurt)

We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. A progress report: GPUs aren’t going anywhere. But: GPUs aren’t going anywhere.
A couple years back, we put a bunch of chips down on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created Fly GPU Machines.
A Fly Machine is a Docker/OCI container running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It’s a Fly Machine that can do fast CUDA.
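To make “hardware-mapped” concrete: inside the guest, the GPU looks the way it would on bare metal, so stock Nvidia tooling just works. Here’s a minimal sanity check (a sketch, not Fly’s actual tooling), assuming the Nvidia driver and `nvidia-smi` are installed in the image:

```python
# Hypothetical sanity check for a GPU Machine: confirm the
# hardware-mapped GPU is visible from inside the guest.
import subprocess

def visible_gpus() -> list[str]:
    # A stock nvidia-smi query; it prints one CSV line per visible GPU.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip().splitlines()

if __name__ == "__main__":
    for line in visible_gpus():
        print(line)  # e.g. "0, NVIDIA A100-SXM4-80GB, 81920 MiB"
```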
Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn’t fit the moment. It’s a bet that doesn’t feel like it’s paying off.
If you’re using Fly GPU Machines, don’t freak out; we’re not getting rid of them. But if you’re waiting for us to do something bigger with them, a v2 of the product, you’ll probably be waiting awhile.
What It Took
GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines Intel’s Cloud Hypervisor, a very similar Rust codebase that supports PCI passthrough). The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.
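For a sense of what PCI passthrough looks like at the hypervisor boundary, here’s a hedged sketch (not our actual plumbing) of launching Cloud Hypervisor with a VFIO device. It assumes a guest kernel, a root filesystem image, and a GPU already bound to the host’s vfio-pci driver; `--device path=` is Cloud Hypervisor’s flag for handing a VFIO device to the guest:

```python
# Hypothetical launcher: boot a Cloud Hypervisor guest with an Nvidia
# GPU passed through via VFIO. Paths and the BDF are illustrative.
import subprocess

GPU_BDF = "0000:01:00.0"  # example Bus:Device.Function; varies per host

cmd = [
    "cloud-hypervisor",
    "--kernel", "/var/lib/vm/vmlinux",        # uncompressed guest kernel
    "--disk", "path=/var/lib/vm/rootfs.img",  # guest root filesystem
    "--cpus", "boot=4",
    "--memory", "size=8G",
    # Hand the vfio-pci-bound GPU through to the guest:
    "--device", f"path=/sys/bus/pci/devices/{GPU_BDF}/",
    "--cmdline", "console=hvc0 root=/dev/vda rw",
]
subprocess.run(cmd, check=True)
```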
GPUs terrified our security team. A GPU is just about the worst-case hardware peripheral: intense multi-directional direct memory transfers (not even just bidirectional: in common configurations, GPUs talk to each other), with arbitrary, end-user-controlled computation, all operating outside our normal security boundary.
We did a couple of expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU and non-GPU workloads weren’t mixed. Because of that, the only reason for a Fly Machine to be scheduled onto a GPU server was that it needed a PCI BDF for an Nvidia GPU, and there’s a limited number of those available on any box. Those GPU servers were drastically less utilized, and thus less cost-effective, than our ordinary servers.
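(A PCI BDF is a Bus:Device.Function address, like 0000:01:00.0.) To make that scheduling constraint concrete, here’s an illustrative sketch, not our scheduler’s actual code: counting the Nvidia PCI functions a host exposes by walking sysfs, keyed on Nvidia’s PCI vendor ID, 0x10de:

```python
# Illustrative only: enumerate Nvidia PCI functions on a host by BDF.
# A GPU server exposes only a handful of these, which caps how many
# GPU Machines can land on it.
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # Nvidia's PCI vendor ID

def nvidia_bdfs() -> list[str]:
    bdfs = []
    for dev in Path("/sys/bus/pci/devices").iterdir():
        vendor = (dev / "vendor").read_text().strip()
        if vendor == NVIDIA_VENDOR_ID:
            bdfs.append(dev.name)  # e.g. "0000:01:00.0"
    return sorted(bdfs)

if __name__ == "__main__":
    bdfs = nvidia_bdfs()
    print(f"{len(bdfs)} Nvidia PCI function(s): {bdfs}")
```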
We funded two very large security assessments, from Atredis and Tetrel, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.
Security wasn’t directly the biggest cost we had to deal with, but it was an indirect cause of it, for a subtle reason.
We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we’d have been relying on that shared kernel as the security boundary between customers.
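For contrast, the road not taken looks roughly like this: a sketch of a standard Kubernetes GPU job using the nvidia.com/gpu resource that Nvidia’s device plugin advertises. The image and names are illustrative; the point is that every pod scheduled this way shares the node’s kernel:

```python
# The road not taken, sketched: a standard Kubernetes pod that asks
# the Nvidia device plugin for one GPU. Every pod scheduled this way
# shares the node's Linux kernel. Names here are illustrative.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "cuda-job"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "cuda-job",
            "image": "nvidia/cuda:12.4.1-base-ubuntu22.04",
            "command": ["nvidia-smi"],
            # The Nvidia device plugin exposes GPUs as this resource:
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }],
    },
}

# kubectl accepts JSON manifests as well as YAML:
print(json.dumps(pod, indent=2))  # pipe to `kubectl apply -f -`
```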