We’re excited that members of our community are already building applications with the WebAssembly (Wasm) version of Kuzu, which was released only a few weeks ago! Early adopters integrating Kuzu-Wasm include Alibaba GraphScope and Kineviz, whose project will launch soon.
In this post, we’ll showcase the potential of Kuzu-Wasm by building a fully in-browser chatbot that answers questions over LinkedIn data using an advanced retrieval technique: Graph Retrieval-Augmented Generation (Graph RAG). We build it using Kuzu-Wasm alongside WebLLM, a popular in-browser LLM inference engine.
A quick introduction to WebAssembly
WebAssembly (Wasm) has transformed browsers into general-purpose computing platforms.
Many fundamental software components, such as full-fledged databases, machine learning
libraries, data visualization tools, and encryption/decryption libraries, now have Wasm versions.
This enables developers to build advanced applications that run entirely in users’
browsers—without requiring backend servers. There are several benefits to building fully in-browser applications:
- Privacy: Users’ data never leaves their devices, ensuring complete privacy and confidentiality.
- Ease of Deployment: An in-browser application that uses Wasm-based components can run in any browser in a completely serverless manner.
- Speed: Eliminating frontend-server communication can lead to a significantly faster and more interactive user experience.
With this in mind, let’s now demonstrate how to develop a relatively complex AI application completely in the browser! We’ll build a fully in-browser chatbot that uses graph retrieval-augmented generation (Graph RAG) to answer natural language questions, using Kuzu-Wasm and WebLLM.
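Concretely, the setup can be sketched as below. The WebLLM calls follow that library’s documented API (`CreateMLCEngine` and an OpenAI-style `chat.completions.create`); the `kuzu-wasm` initialization (`Database`, `Connection`) is an assumption modeled on Kuzu’s other language bindings, so check the Kuzu-Wasm docs for the exact steps. The model ID is one of WebLLM’s prebuilt options and can be swapped for a smaller model on memory-constrained devices.

```ts
// Sketch: initialize WebLLM and Kuzu-Wasm side by side in the browser.
import { CreateMLCEngine } from "@mlc-ai/web-llm";
import kuzu from "kuzu-wasm"; // assumed default export; see the Kuzu-Wasm docs

// Download the model weights and compile them for WebGPU, reporting progress.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => console.log(report.text),
});

// Create an in-browser Kuzu database and a connection to it.
// (Constructor shapes assumed from Kuzu's Node.js API.)
const db = new kuzu.Database();
const conn = new kuzu.Connection(db);
```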
Architecture
The high-level architecture of the application looks as follows:
The term “Graph RAG” is used to refer to several techniques, but in its simplest form it refers to a 3-step retrieval approach whose goal is to retrieve useful context from a graph DBMS (GDBMS) to help an LLM answer natural language questions.
In our application, this additional context comes from a user’s LinkedIn data: their contacts, messages, and the companies the user or their contacts have worked for. Yes, you can download your own LinkedIn data (and you should, if for nothing else than to see how much of your data they have!).
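To make the rest of the post concrete, here is a hedged sketch of how such a schema could be declared and loaded through Kuzu-Wasm. The table and property names (`Contact`, `Company`, `WorksAt`, `Messaged`) are illustrative choices that match the query shown later, not necessarily the app’s exact schema, and the CSV file names are hypothetical stand-ins for the LinkedIn export files.

```ts
// Sketch: a hypothetical graph schema for a LinkedIn export, using the
// `conn` created during setup. Table and property names are illustrative.
await conn.query(`CREATE NODE TABLE Contact(name STRING, PRIMARY KEY (name))`);
await conn.query(`CREATE NODE TABLE Company(name STRING, PRIMARY KEY (name))`);
await conn.query(`CREATE REL TABLE WorksAt(FROM Contact TO Company)`);
await conn.query(
  `CREATE REL TABLE Messaged(FROM Contact TO Contact, text STRING)`
);

// Bulk-load the export. In the Wasm build, the CSV files generally need to
// be staged into the browser's virtual filesystem first; the file names and
// column layout here are hypothetical.
await conn.query(`COPY Contact FROM "Connections.csv"`);
await conn.query(`COPY WorksAt FROM "Positions.csv"`);
```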
With the schema sketched, let’s go over the 3 steps of Graph RAG (a code sketch of the full loop follows the list):
- Q→Cypher: A user asks a natural language question Q, such as “Which of my contacts work at Google?”. Using an LLM, the question is converted into a Cypher query, e.g., `MATCH (a:Company)<-[:WorksAt]-(b:Contact) WHERE a.name = "Google" RETURN b`.
- Query execution: The generated Cypher query is run against the GDBMS, and the matching records are retrieved.
- Answer generation: The retrieved records, together with the original question, are given back to the LLM, which generates the final natural language answer.
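Putting the three steps together, the whole loop can be sketched as a single function that reuses the `engine` and `conn` objects from the setup sketch. The prompts are simplified illustrations, and the result-serialization call (`getAll`) is an assumption carried over from Kuzu’s Node.js API.

```ts
// Sketch: the 3-step Graph RAG loop. Prompts are simplified; `getAll` is
// assumed from Kuzu's Node.js bindings.
const SCHEMA_HINT =
  "Node tables: Contact(name), Company(name). " +
  "Rel tables: WorksAt(FROM Contact TO Company).";

async function answerWithGraphRAG(question: string): Promise<string> {
  // Step 1 (Q -> Cypher): have the LLM translate the question into Cypher,
  // giving it the schema so it uses the right table names.
  const cypherResp = await engine.chat.completions.create({
    messages: [
      {
        role: "system",
        content:
          "Translate the user's question into one Kuzu Cypher query. " +
          SCHEMA_HINT +
          " Output only the query.",
      },
      { role: "user", content: question },
    ],
  });
  const cypher = cypherResp.choices[0].message.content ?? "";

  // Step 2: run the generated query against the in-browser database.
  const result = await conn.query(cypher);
  const rows = await result.getAll(); // assumed API

  // Step 3: hand the retrieved rows back to the LLM as grounding context.
  const answerResp = await engine.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Answer the question using only the provided query results.",
      },
      {
        role: "user",
        content: `Question: ${question}\nResults: ${JSON.stringify(rows)}`,
      },
    ],
  });
  return answerResp.choices[0].message.content ?? "";
}
```

For the running example, `await answerWithGraphRAG("Which of my contacts work at Google?")` would generate the `MATCH` query above, execute it against Kuzu-Wasm, and phrase the returned contacts as a natural language answer.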
9 Comments
esafak
The example is not ideal for showcasing a graph analytics database, because a traditional relational database could have answered the same query: "Which of my contacts work at Google?"
srameshc
This is the first time I'm hearing of Kuzu, an embeddable graph database, and even better, the Wasm-plus-LLM mix.
nattaylor
This is very cool. Kuzu has a ton of great blog content on all the ways they make Kuzu light and fast. WebLLM (or, in the future, chrome.ai.* etc.) + an embedded graph could make for some great UXes.
At one time I thought I read that there was a project to embed Kuzu into DuckDB, but bringing a vector store natively into Kuzu sounds even better.
jasonthorsness
Don't the resource requirements of even small LLMs exclude most devices/users from being able to use stuff like this?
mentalgear
Nice! You might also want to check out Orama, an open-source hybrid vector/full-text search engine for any JS runtime.
willguest
I absolutely love this. I make VR experiences that run on the ICP, which delivers Wasm modules as smart contracts. I've been waiting for a combo of node-friendly, Wasm-deployable tools and WebLLM. The ICP essentially facilitates self-hosting of data and provides consensus protocols for secure messaging and transactions.
This will make it super easy for me to add LLM functionality to existing WebXR spaces, and I'm excited to see how an intelligent avatar, or a convo between them, will play out. This is, very likely, the thing that will make that possible :)
If anyone wants to collab or contribute in some way, I'm open to ideas and support. Search for 'exeud' to find more info.
itissid
Since I already have a browser connected to the Internet where this would execute, could one have the option of transparently executing the WebGPU + LLM work in a cloud container that communicates with the browser process?
nsonha
Could someone please explain in-browser inference to me? In the context of OpenAI usage (the WebLLM GitHub), does this mean I will send binary to OpenAI instead of text? And will it lower the cost and run faster?
DavidPP
I'm new to the world of graph databases, and I just started building with SurrealDB in embedded mode.
If you don't mind taking a few minutes, what are the main reasons to use Kuzu instead?