TL;DR I’ve been working on a WebGPU-optimized inference and autograd library called webgpu-torch, with an API that matches PyTorch. The goal is to run neural networks in the browser at speeds comparable to a Linux workstation. Many kernels have been implemented and its design is easily extensible. It’s available on NPM now and works in both the browser and Node.js!
Neural Networks in the Browser
Nine months ago, I got Hugging Face Transformers (large language models like GPT, but a wee bit smaller) working in the browser thanks to the ONNX web runtime and some painfully hand-coded tokenizers.
It’s quite liberating running these nets in the browser since the web is the best software distribution platform ever created. You can just send someone a link and they can run your code. No need to install anything. No need to worry about what OS they’re running. No need to worry about what hardware they have. It’s all just there.
The only problem is that ONNX is a wee bit, shall we say, slow.
Thankfully, WebGPU has arrived in browsers, and we can now properly access the GPU to write optimized kernels for neural network operations. This is a huge deal: it means we can now run neural networks in the browser at speeds comparable to native NVIDIA/CUDA code.
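To give a sense of what “writing a kernel” means here, below is a minimal, self-contained sketch of an element-wise add as a raw WebGPU compute shader (WGSL) driven from TypeScript. This is generic WebGPU boilerplate for illustration (types come from @webgpu/types), not webgpu-torch’s actual internals.

```ts
// Minimal sketch: an element-wise add kernel in WGSL, dispatched via WebGPU.
// Illustrative boilerplate only -- not webgpu-torch's internals.
const shader = /* wgsl */ `
@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    let i = id.x;
    if (i < arrayLength(&out)) {
        out[i] = a[i] + b[i];
    }
}`;

async function addOnGpu(a: Float32Array, b: Float32Array): Promise<Float32Array> {
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) throw new Error("WebGPU not supported");
    const device = await adapter.requestDevice();

    // Upload inputs into storage buffers.
    const makeBuffer = (data: Float32Array, usage: number) => {
        const buf = device.createBuffer({ size: data.byteLength, usage });
        device.queue.writeBuffer(buf, 0, data);
        return buf;
    };
    const inputUsage = GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST;
    const bufA = makeBuffer(a, inputUsage);
    const bufB = makeBuffer(b, inputUsage);
    const bufOut = device.createBuffer({
        size: a.byteLength,
        usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    });
    const readback = device.createBuffer({
        size: a.byteLength,
        usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
    });

    const pipeline = device.createComputePipeline({
        layout: "auto",
        compute: { module: device.createShaderModule({ code: shader }), entryPoint: "main" },
    });
    const bindGroup = device.createBindGroup({
        layout: pipeline.getBindGroupLayout(0),
        entries: [
            { binding: 0, resource: { buffer: bufA } },
            { binding: 1, resource: { buffer: bufB } },
            { binding: 2, resource: { buffer: bufOut } },
        ],
    });

    // Encode one compute pass and copy the result to a mappable buffer.
    const encoder = device.createCommandEncoder();
    const pass = encoder.beginComputePass();
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.dispatchWorkgroups(Math.ceil(a.length / 64));
    pass.end();
    encoder.copyBufferToBuffer(bufOut, 0, readback, 0, a.byteLength);
    device.queue.submit([encoder.finish()]);

    await readback.mapAsync(GPUMapMode.READ);
    const result = new Float32Array(readback.getMappedRange().slice(0));
    readback.unmap();
    return result;
}
```

Multiply that boilerplate by every operation a neural network needs, plus shape handling, broadcasting, and memory management, and you can see why the hard part is the sheer volume of kernels.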
Someone just needs to, you know, do the hard work of implementing all those operations for the GPU.
Well, that’s what I’m very pleased to announce I’ve been working on for the past few months. I’ve been re-implementing PyTorch in TypeScript for WebGPU.
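Since the API matches PyTorch, basic usage should look something like the sketch below. Treat the specifics as assumptions about the API rather than documentation: the tensor and op names are modeled on PyTorch’s, and the init and readback calls are my guesses at how the async WebGPU setup surfaces.

```ts
// Hypothetical usage sketch -- names assume the API mirrors PyTorch's;
// the exact init and readback calls are assumptions, not docs.
import * as torch from "webgpu-torch";

// WebGPU devices are acquired asynchronously, so an async init step
// has to run before any kernels can be dispatched.
await torch.initWebGPUAsync();

const a = torch.tensor([[1, 2, 3], [4, 5, 6]]);
const b = torch.tensor([[7, 8, 9], [10, 11, 12]]);

// Each op dispatches a WebGPU compute kernel; results stay on the GPU
// until explicitly read back.
const c = a.add(b);
console.log(await c.toArrayAsync());
```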
What is a PyTorch?
PyTorch is a wrapper over the Torch runtime (which I first used from Lua) for performing neural network operations. It’s a very popular library for doing AI work and seems to have won the arms race for now.
The library is broken up into three parts (sketched together in the example after the list):

- An optimized (for GPU) math library supporting element-wise operations, matrix multiplication, convolutions, reductions, etc. over tensors.
- An automatic differentiation library (autograd) that is just a lot of bookkeeping to keep track of the operations performed on tensors so that gradients can be calculated.
- A neural network library that is just a bunch of layers (modules) built on top of the math and autograd libraries.
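Here is how those three parts fit together, sketched in webgpu-torch-flavored TypeScript. Every name below is an assumption modeled on PyTorch’s conventions (tensor options, `backward()`, `nn.Linear`); the real API may differ in the details.

```ts
// Sketch of the three parts working together. All names are assumptions
// modeled on PyTorch; the real webgpu-torch API may differ in details.
import * as torch from "webgpu-torch";

await torch.initWebGPUAsync();

// 1. Math library: plain tensor ops compiled to GPU kernels.
const x = torch.tensor({ data: [[1, 2], [3, 4]], requiresGrad: true });
const y = x.mul(x).sum(); // y = sum(x^2)

// 2. Autograd: bookkeeping over those ops lets us ask for dy/dx.
y.backward(); // x.grad should now hold 2*x

// 3. NN library: layers (modules) built on top of the other two.
const linear = new torch.nn.Linear(2, 1);
const out = linear.forward(torch.tensor([[0.5, -0.5]]));
```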