Here we go again. I started another company. The money is in the bank.
What is the tiny corp?
The tiny corp is a computer company. We sell computers for more than they cost to make; I’ve been thinking about this one for a while. In the limit, it’s a chip company, but there are a lot of intermediates along the way.
The human brain has about 20 PFLOPS of compute. I’ve written various blog posts about this. Sadly, 20 PFLOPS of compute is not accessible to most people, costing about $1M to buy or $100/hr to rent.
With the way AI is going, we risk large entities controlling the majority of the compute in the world. I do not want “I think there’s a world market for maybe five computers.” to ever be the world we live in.
The goal of the tiny corp is:
“to commoditize the petaflop”
What is tinygrad?
I started tinygrad in Oct 2020. It started as a toy project to teach me about neural networks; it has now carved out a good niche in the inference space, running the model in openpilot, and it will soon be a serious competitor to PyTorch in many places.
The main advantage is in the tinygrad IR. It has 12 operations, all of which are ADD/MUL only. x[3] is supported, x[y] is not. Matrix multiplies and convolutions are just multiplies and sums, surrounded by a bunch of zero-cost movement operations (like reshape, permute, expand).
# a fast matmul in tinygrad (a@b works also of course)
from tinygrad.tensor import Tensor
N = 2048; a, b = Tensor.randn(N,N), Tensor.randn(N,N)
# broadcast to (N,N,N) and reduce: c[i,j] = sum over k of a[i,k] * b[k,j]
c = (a.reshape(N,1,N) * b.permute(1,0).reshape(1,N,N)).sum(axis=2)
tinygrad is lazy, like Haskell, to allow op fusion without the user ever having to think about it.
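A minimal sketch of what that laziness looks like in practice (using tinygrad’s Tensor.realize(), which forces evaluation):

from tinygrad.tensor import Tensor
x = Tensor.randn(64, 64)
y = ((x * 2) + 1).relu()  # nothing has executed yet; y is just a graph of ops
y.realize()               # now the mul/add/relu can be fused into one kernel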
Ok, so?
The current crop of AI chip companies failed. Many of them managed to tape out chips, some of those chips even worked. But not a single one wrote a decent framework to use those chips. They had similar performance/$ to NVIDIA, and way worse software. Of course they failed. Everyone just bought stuff from NVIDIA.
I think the only way to start an AI chip company is to start with the software. The computing in ML is not general-purpose computing. 95% of models in use today (including LLMs and image generation) have all their compute and memory accesses statically computable.
Unfortunately, this advantage is thrown away the minute you have something like CUDA in your stack. Once you are calling into Turing-complete kernels, you can no longer reason about their behavior. You fall back to caching, warp scheduling, and branch prediction.
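To make “statically computable” concrete, here is a toy sketch in plain Python (illustrative names only, not tinygrad code). Every address a matmul touches is pure index math on the shapes, while a gather’s addresses depend on runtime values:

# every address a matmul reads is known from the shapes alone, before running
def matmul_reads(M, K, N):
    for i in range(M):
        for j in range(N):
            for k in range(K):
                yield i*K + k, k*N + j  # offsets into a and b: pure index math

# a data-dependent gather, out[i] = table[idx[i]]: the addresses depend on the
# values in idx, so you are back to caches and schedulers to keep it fast
def gather_reads(idx):
    for i in range(len(idx)):
        yield idx[i]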
tinygrad is a simple framework with a PyTorch-like frontend that will take you all the way to the hardware, without allowing terrible Turing completeness to creep in.
The Red Team (AMD)
10 or so companies thought it was a good i