This guide is for you if you are new to Llama, a free and open-source large language model. It covers basic information and answers to common questions.
What is Llama?
Llama (Large Language Model Meta AI) is a family of large language models (LLM). It is Meta (Facebook)’s answer to ChatGPT.
But the two company takes different paths. ChatGPT is proprietary. You don’t know the code of the model, the training data, and the training method. Llama is an open-source software. The code, training data, and the training code are out there in the public.
Llama is the first major open-source large language model, and it gained instant popularity upon release. In addition to being free and open-source, it is relatively small and can run on a personal computer. The 7-billion and 13-billion parameter models are very usable on a good consumer-grade PC.
How does Llama work?
Llama is an AI model designed to predict the next word. You can think of it as a glorified autocomplete. It is trained on text from the internet and other public datasets. Llama 2 was trained on about 2 trillion tokens.
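To make "predict the next word" concrete, here is a toy sketch that predicts the next word by counting which word most often follows the previous one. This is not how Llama works internally (Llama is a transformer neural network operating on subword tokens), but the training objective is the same idea: predict the next token.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count word -> next-word frequencies from a training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent next word seen in training, or None."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # -> "cat" ("cat" follows "the" twice, "mat" once)
```

A real LLM replaces the frequency table with billions of learned parameters, so it can generalize to sentences it has never seen, but it is still choosing a probable next token.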
You may wonder why the Llama model seems to be intelligent: It gives you sensible answers to difficult questions. It can rewrite your essay. It can give you pros and cons of certain things.
The training text was written by humans. In some sense, they are a slice of human thoughts projected on a medium. By learning how to complete a sentence, the model also learns an aspect of being human.
Does the Llama model know logic? There are two opposing views. One view is no, because what the model is designed to learn is correlation. It just predicts the next most probable word, nothing more. The other view is yes. Suppose the training text is a murder story. The model must learn to complete the last sentence, "The murderer is". To predict the next word accurately, it has no choice but to learn logical deduction.
Why use Llama instead of ChatGPT?
ChatGPT requires zero setup, and a free version is available. It is indeed highly accessible. So why use Llama? Here are the reasons:
- Privacy. You can use Llama locally on your own computer. You don’t need to worry about the questions you asked being stored in a company’s server indefinitely.
- Confidentiality. You may not be able to use ChatGPT for work-related queries because you are bound by a non-disclosure agreement. You don't have an NDA with OpenAI, after all.
- Customization. There are many locally finetuned models you can choose from. If you don’t like the answers of a model, you can switch to another one.
- Train your model. Finally, you have an opportunity to train your own model using techniques such as LoRA.
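To illustrate the last point, here is a minimal NumPy sketch of the core idea behind LoRA (illustrative only, not a training loop): instead of updating a large frozen weight matrix, you learn two small low-rank matrices whose product is the update. The dimensions below are chosen for illustration.

```python
import numpy as np

# LoRA idea: freeze the pretrained weight W (d x d) and learn two small
# matrices B (d x r) and A (r x d), with rank r << d. The effective
# weight is W + (alpha / r) * B @ A, so only ~2*d*r parameters train.
d, r, alpha = 4096, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weights
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B starts at zero: no change at step 0

W_effective = W + (alpha / r) * B @ A

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params:,} vs {full_params:,}")
```

With these dimensions, LoRA trains about 0.4% of the parameters of the full matrix, which is why fine-tuning becomes feasible on consumer hardware.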
What can you do with Llama models?
You can use Llama models the same ways you use ChatGPT.
- Chat. Just ask questions about things you want to know.
- Coding. Ask for a short program to do something in a specific computer language.
- Outlines. Ask for an outline of a technical topic.
- Creative writing. Let the model write a story for you.
- Information extraction. Summarize an essay. Ask specific questions about an essay.
- Rewrite. Write your paragraph in a different tone and style.

What languages does Llama support?
Mostly English: about 90% of the training data is English.
Other languages, including German, French, Chinese, Spanish, Dutch, Italian, Japanese, Polish, and Portuguese, appear in the training data, but don't count on them.
This means you shouldn't use Llama for translation tasks.
What computer hardware do I need?
It depends on the model size. The following table shows the VRAM needed to run a GPTQ model on a GPU card.
Model | 8-bit | 4-bit |
---|---|---|
7B | 10 GB | 6 GB |
13B | 20 GB | 10 GB |
30B | 40 GB | 20 GB |
70B | 80 GB | 40 GB |
And the following are the RAM requirements for GGML models (for Mac, or CPU on Windows or Linux).
Model | 4-bit quantized |
---|---|
7B | 4 GB |
13B | 8 GB |
30B | 20 GB |
70B | 39 GB |
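The numbers in these tables follow a simple pattern you can estimate yourself. Here is a back-of-the-envelope calculation (an approximation I'm assuming here, not from any official Llama documentation): the weights take roughly parameters × bits ÷ 8 bytes, plus some overhead for activations and buffers.

```python
def estimate_gb(params_billions, bits, overhead=1.2):
    """Rough memory estimate (GB) for holding quantized model weights.

    The 1.2 overhead factor is an assumption to account for activations
    and runtime buffers; real usage varies by backend.
    """
    bytes_needed = params_billions * 1e9 * bits / 8
    return bytes_needed * overhead / 1e9

for size in (7, 13, 70):
    print(f"{size}B at 4-bit: ~{estimate_gb(size, 4):.1f} GB")
```

For example, a 7B model at 4 bits is about 3.5 GB of raw weights, which lands near the 4 GB shown in the GGML table once overhead is included.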

What are quantized models?
Quantization is a method to reduce a model's size while preserving most of its quality. The benefit to you is a smaller file on your hard drive and a lower memory requirement to run the model.
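Here is a minimal sketch of symmetric 8-bit quantization, the general idea behind schemes like GPTQ and GGML (not their exact algorithms, which are more sophisticated): map floating-point weights to small integers plus a shared scale factor.

```python
import numpy as np

def quantize_int8(weights):
    """Map float weights to int8 plus one shared scale per tensor."""
    scale = np.abs(weights).max() / 127           # largest value maps to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integers."""
    return q.astype(np.float32) * scale

w = np.array([0.51, -0.83, 0.02, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_restored = dequantize(q, s)
print(q)           # int8 values: 1 byte each instead of 4
print(w_restored)  # close to the original, with small rounding error
```

Each weight shrinks from 4 bytes to 1 (or half a byte at 4-bit), which is why a 13B model can fit in 8 GB instead of 52 GB, at the cost of a small accuracy loss.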