Low-rank adaptation (LoRA) is a technique for fine-tuning models that has some advantages over previous methods:
- It is faster and uses less memory, which means it can run on consumer hardware.
- The output is much smaller (megabytes, not gigabytes).
- You can combine multiple fine-tuned models together at runtime, as the sketch after this list illustrates.
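To make those points concrete, here’s a minimal sketch of the core LoRA idea: the pretrained weight matrix stays frozen, and training only learns a small low-rank update. The dimensions and rank below are made-up numbers for illustration, not values from any particular model.

```python
# Minimal sketch of the LoRA idea (illustrative; not the PEFT implementation).
# A frozen weight W gets a trainable low-rank update B @ A, so only the two
# small factors need to be trained, saved, and shared.
import torch

d, k, r = 4096, 4096, 8          # hypothetical layer size and LoRA rank

W = torch.randn(d, k)            # pretrained weight, kept frozen
A = torch.randn(r, k)            # trainable low-rank factor
B = torch.zeros(d, r)            # starts at zero, so the update begins as a no-op

W_adapted = W + B @ A            # effective weight after fine-tuning
# Adapters from different fine-tunes can be added to the same frozen W,
# which is what makes them composable at runtime.

print(W.numel())                 # 16,777,216 values in the full matrix
print(A.numel() + B.numel())     # 65,536 values in the adapter
```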
Last month we blogged about faster fine-tuning of Stable Diffusion with LoRA. Our friend Simon Ryu (aka @cloneofsimo) applied the LoRA technique to Stable Diffusion, allowing people to create custom trained styles from just a handful of training images, then mix and match those styles at prediction time to create highly customized images.
Fast-forward one month, and we’re seeing LoRA being applied elsewhere. Now it’s being used to fine-tune large language models like LLaMA. Earlier this month, Eric J. Wang released Alpaca-LoRA, a project which contains code for reproducing the Stanford Alpaca results using PEFT, a library that lets you take various transformers-based language models and fine-tune them using LoRA. What’s neat about this is that it allows you to fine-tune models cheaply and efficiently on modest hardware, with smaller (and perhaps composable) outputs.
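As a rough illustration of what that looks like with PEFT, the snippet below attaches LoRA adapters to a causal language model. The model path, rank, and target modules here are placeholder assumptions for the sketch, not Alpaca-LoRA’s exact settings.

```python
# Sketch of wrapping a transformers model with LoRA adapters via PEFT.
# "path/to/llama-7b-hf" is a placeholder for your converted LLaMA weights.
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

base_model = LlamaForCausalLM.from_pretrained("path/to/llama-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

From here, the wrapped model can be trained with a normal transformers training loop, and only the adapter weights (a few megabytes) need to be saved.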
In this blog post, we’ll show you how to use LoRA to fine-tune LLaMA using Alpaca training data.
Prerequisites
- GPU machine. Thanks to LoRA you can do this on low-spec GPUs like an NVIDIA T4 or consumer GPUs like a 4090. If you don’t already have access to a machine with a GPU, check out our guide to getting a GPU machine.
- LLaMA weights. The weights for LLaMA have not yet been released publicly. To apply for access, fill out Meta’s research application form.