Llama 2 7B/13B are now available in Web LLM!! Try it out in our chat demo.
Llama 2 70B is also supported. If you have an Apple Silicon Mac with 64 GB or more of memory, you can follow the instructions below to download and launch Chrome Canary and try out the 70B model in Web LLM.
This project brings large language models and LLM-based chatbots to web browsers. Everything runs inside the browser with no server support and is accelerated with WebGPU. This opens up a lot of fun opportunities to build AI assistants for everyone, with privacy preserved while still enjoying GPU acceleration. Please check out our GitHub repo to see how we did it.
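Since everything here hinges on WebGPU being available in the browser, a page would typically feature-detect it before trying to load a model. A minimal sketch (the `hasWebGPU` helper is hypothetical; `navigator.gpu` and `requestAdapter()` are the standard WebGPU entry points):

```typescript
// Sketch: check whether this environment exposes a usable WebGPU adapter.
// In a browser without WebGPU (or in Node), navigator.gpu is undefined
// and this resolves to false instead of throwing.
async function hasWebGPU(): Promise<boolean> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return false;
  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = await gpu.requestAdapter();
  return adapter !== null;
}
```

A chat page could call this on load and show a "please enable WebGPU" message instead of silently failing when the model cannot be accelerated.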
We have been seeing amazing progress in generative AI and LLMs recently. Thanks to open-source efforts like LLaMA, Alpaca, Vicuna, and Dolly, we are starting to see an exciting future of building our own open-source language models and personal AI assistants.
These models are usually big and compute-heavy. To build a chat service, we would need a large cluster to run an inference server, while clients send requests to the server and retrieve the inference results.