The open release of the Stable Diffusion image generation model back in August 2022 was a key moment. I wrote at the time about how Stable Diffusion is a really big deal.
People could now generate images from text on their own hardware!
More importantly, developers could mess around with the guts of what was going on.
The resulting explosion in innovation is still going on today. Most recently, ControlNet appears to have leapt Stable Diffusion ahead of Midjourney and DALL-E in terms of its capabilities.
It feels to me like that Stable Diffusion moment back in August kick-started the entire new wave of interest in generative AI, which was then pushed into overdrive by the release of ChatGPT at the end of November.
That Stable Diffusion moment is happening again right now, for large language models—the technology behind ChatGPT itself.
This morning I ran a GPT-3 class language model on my own personal laptop for the first time!
AI stuff was weird already. It’s about to get a whole lot weirder.
LLaMA
Somewhat surprisingly, language models like GPT-3 that power tools like ChatGPT are a lot larger and more expensive to build and operate than image generation models.
The best of these models have mostly been built by private organizations such as OpenAI, and have been kept tightly controlled—accessible via their API and web interfaces, but not released for anyone to run on their own machines.
These models are also BIG. Even if you could obtain the GPT-3 model, you would not be able to run it on commodity hardware: these things usually require several A100-class GPUs, each of which retails for $8,000+.
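To put rough numbers on that (my own back-of-envelope arithmetic, not figures from the GPT-3 paper): just storing 175 billion parameters as 16-bit floats takes hundreds of gigabytes, far more than any single consumer GPU can hold.

```python
# Back-of-envelope memory arithmetic for hosting a GPT-3 scale model.
# Assumption: weights stored as 16-bit floats (2 bytes per parameter),
# ignoring activations, KV cache and other runtime overhead.
params = 175e9          # GPT-3 has roughly 175 billion parameters
bytes_per_param = 2     # fp16 / bf16
weights_gb = params * bytes_per_param / 1e9

print(f"Weights alone: ~{weights_gb:.0f} GB")                          # ~350 GB
print(f"80 GB A100s needed just to hold them: ~{weights_gb / 80:.1f}") # ~4.4
```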
This technology is clearly too important to be entirely controlled by a small group of companies.
There have been dozens of open large language models released over the past few years, but none of them have quite hit the sweet spot for me in terms of the following:
- Easy to run on my own hardware
- Large enough to be useful—ideally equivalent in capabilities to GPT-3
- Open source enough that they can be tinkered with
This all changed yesterday, thanks to the combination of Facebook’s LLaMA model and llama.cpp by Georgi Gerganov.
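A big part of why llama.cpp can run these models on a laptop is that it quantizes the weights down to 4 bits per parameter (a detail from the llama.cpp project rather than something spelled out above). Repeating the same back-of-envelope arithmetic for the smaller LLaMA models, as a rough sketch:

```python
# Rough sketch: approximate size of LLaMA weights after 4-bit quantization,
# the trick llama.cpp uses to fit these models in laptop RAM.
# Real file sizes differ a little, since some tensors keep higher precision.
BITS_PER_PARAM = 4

for name, params in [("LLaMA-7B", 7e9), ("LLaMA-13B", 13e9)]:
    gb = params * BITS_PER_PARAM / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB of weights")   # ~3.5 GB and ~6.5 GB
```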
Here’s the abstract from the LLaMA paper:
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
It’s important to note that LLaMA isn’t fully “open”. You have to agree to some strict terms to access the model. It’s intended as a research preview, and isn’t something which can be used for commercial purposes.
In a totally cyberpunk move, within a few days of the release, someone submitted this PR to the LLaMA repository linking to an unofficial BitTorrent download link for the model files!
So they’re in the wild now. You may not be legally able to build commercial products on top of them, but the genie is out of the bottle.