Partial replication sounds easy: just sync the data your app needs, right? But choosing an approach means picking a tradeoff. Logical replication ships only the changes that matter, which is efficient but complicates strong consistency; physical replication avoids that complexity but requires syncing every change, even ones that are later discarded. What if your app could combine the simplicity of physical replication with the efficiency of logical replication? That's the key idea behind Graft, the open-source transactional storage engine I'm launching today. It's designed specifically for lazy, partial replication with strong consistency, horizontal scalability, and object storage durability.
Graft is designed with the following use cases in mind:
- Offline-first & mobile apps: Simplify development and improve reliability by offloading replication and storage to Graft.
- Cross-platform sync: Share data smoothly across devices, browsers, and platforms without vendor lock-in.
- Stateless multi-writer replicas: Deploy replicas anywhere, including serverless and embedded environments.
- Any data type: Replicate databases, files, or custom formats—all with strong consistency.
I first discovered the need for Graft while building SQLSync. SQLSync is a frontend-optimized database stack built on top of SQLite, with a synchronization engine powered by ideas from Git and distributed systems. SQLSync makes multiplayer SQLite databases a reality, powering interactive apps that run directly in your browser.
However, SQLSync replicates the entire log of changes to every client—similar to how some databases implement physical replication. While this approach works fine on servers, it’s poorly suited to the constraints of edge and browser environments.
After shipping SQLSync, I decided to find a replication solution more suited to the edge. I needed something that could:
- Let clients sync at their own pace
- Sync only what they need
- Sync from anywhere, including the edge and offline devices
- Replicate arbitrary data
- All while providing strong consistency guarantees.
That didn’t exist. So I built it.
# A different approach to edge replication
If you’ve ever tried to keep data in sync across clients and servers, you know it’s harder than it sounds. Most existing solutions fall into one of two camps:
- Full replication, which syncs the entire dataset to each client—not practical for constrained environments like serverless functions or web apps.
- Schema-aware diffs, like Change Data Capture (CDC) or Conflict-free Replicated Data Types (CRDTs), which track logical changes at the row or field level—but require deep application integration and don’t generalize to arbitrary data.
Graft takes a different path.
Like full replication, Graft is schema-agnostic. It doesn't know or care what kind of data you're storing—it just replicates bytes. But instead of sending all the data, it behaves more like logical replication: clients receive a compact description of what's changed since their last sync.
At the core of this model is the Volume: a sparse, ordered collection of fixed-size Pages. Clients interact with Volumes through a transactional API, reading and writing at specific Snapshots. Under the hood, Graft persists and replicates only what’s necessary—using object storage as a durable, scalable backend.
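To make that model concrete, here is a minimal in-memory sketch of a Volume in Rust. Every name here (`Volume`, `Snapshot`, `PAGE_SIZE`, the `commit`/`read` methods) is an illustrative assumption, not Graft's actual API, and real snapshots live in object storage rather than a `Vec`:

```rust
// Hypothetical sketch of the data model: a Volume is a sparse, ordered
// collection of fixed-size pages, and a Snapshot is an immutable view
// of the Volume at one point in its history.
use std::collections::BTreeMap;

const PAGE_SIZE: usize = 4096;

type PageIdx = u64;
type Page = Vec<u8>; // fixed at PAGE_SIZE bytes

#[derive(Clone)]
struct Snapshot {
    lsn: u64,                       // log sequence number of this snapshot
    pages: BTreeMap<PageIdx, Page>, // sparse: absent pages read as zeroes
}

struct Volume {
    history: Vec<Snapshot>, // committed snapshots, in LSN order
}

impl Volume {
    fn new() -> Self {
        Volume { history: vec![Snapshot { lsn: 0, pages: BTreeMap::new() }] }
    }

    /// Commit a set of page writes on top of the latest snapshot,
    /// returning the new snapshot's LSN.
    fn commit(&mut self, writes: BTreeMap<PageIdx, Page>) -> u64 {
        let mut next = self.history.last().unwrap().clone();
        next.lsn += 1;
        next.pages.extend(writes);
        let lsn = next.lsn;
        self.history.push(next);
        lsn
    }

    /// Read a page at a given snapshot; missing pages are all zeroes.
    fn read(&self, lsn: u64, idx: PageIdx) -> Page {
        let snap = &self.history[lsn as usize];
        snap.pages.get(&idx).cloned().unwrap_or_else(|| vec![0; PAGE_SIZE])
    }
}
```

Because every snapshot is immutable, readers at an old LSN are never disturbed by new commits, which is what lets clients sync lazily.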
The result is a system that’s lazy, partial, edge-capable, and consistent.
Each of these properties deserves a closer look—let’s unpack them one by one.
# Lazy: Sync at your own pace
Graft is designed for the real world—where edge clients wake up occasionally, face unreliable networks, and run in short-lived, resource-constrained environments. Instead of relying on continuous replication, clients choose when to sync, and Graft makes it easy to fast forward to the latest snapshot.
That sync starts with a simple question: what changed since my last snapshot?

The server responds with a graft: a compact bitset of the page indexes that have changed across all commits since that snapshot. This is where the project gets its name: a graft attaches new changes to an existing snapshot, like grafting a branch onto a tree. It acts as a guide, telling the client which pages it can reuse and which it needs to fetch.

Critically, when a client pulls a graft from the server, it doesn't receive any actual data—only metadata about what changed. This gives the client full control over what to fetch and when, laying the foundation for partial replication.
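A graft can be thought of as the union of the pages touched by every commit the client hasn't seen. Here is a toy sketch in Rust, assuming a plain `BTreeSet` where Graft would use a compact bitset, and a hypothetical `graft_since` helper:

```rust
// Toy graft computation: given the pages written by each commit,
// collect every page index touched since the client's snapshot.
use std::collections::BTreeSet;

/// `commits[i]` holds the page indexes written by the commit that
/// produced LSN i + 1; `client_lsn` is the client's last snapshot.
fn graft_since(commits: &[BTreeSet<u64>], client_lsn: u64) -> BTreeSet<u64> {
    commits
        .iter()
        .skip(client_lsn as usize) // commits the client has not seen
        .flat_map(|pages| pages.iter().copied())
        .collect()
}
```

Note that the graft only names pages; the client still decides which of them to actually download.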
# Partial: Sync only what's needed
When you’re building for edge environments—browser tabs, mobile apps, serverless functions—you can’t afford to download the entire dataset just to serve a handful of queries. That’s where partial replication comes in.
After a client pulls a graft, it knows exactly what's changed, and can determine precisely which pages are still valid and which need to be fetched. Instead of pulling everything, clients selectively retrieve only the pages they'll actually use—nothing more, nothing less.
To keep things snappy, Graft supports several ways to prefetch pages:
- General-purpose prefetching: Graft includes a built-in prefetcher based on the Leap algorithm, which predicts future page accesses by identifying patterns.
- Domain-specific prefetching: Applications can leverage domain knowledge to preemptively fetch relevant pages. For instance, if your app frequently queries a user’s profile, Graft can prefetch pages related to that profile before the data is needed.
- Proactive fetching: Clients can always fall back to pulling all changes if needed, essentially reverting to full replication. This is particularly useful for Graft workloads running on the server side.
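As an illustration of the pattern-detection idea behind Leap, here is a toy stride predictor in Rust: it finds the majority delta between recent page accesses (via a Boyer-Moore majority vote) and prefetches along that stride. This sketches the concept only; it is not Graft's actual prefetcher.

```rust
// Find the majority delta among recent page accesses, if one exists.
fn majority_delta(accesses: &[i64]) -> Option<i64> {
    let deltas: Vec<i64> = accesses.windows(2).map(|w| w[1] - w[0]).collect();
    // Boyer-Moore majority vote over the deltas
    let mut candidate = None;
    let mut count = 0i64;
    for &d in &deltas {
        if count == 0 {
            candidate = Some(d);
            count = 1;
        } else if candidate == Some(d) {
            count += 1;
        } else {
            count -= 1;
        }
    }
    // verify the candidate really occurs in a majority of deltas
    let c = candidate?;
    let occurrences = deltas.iter().filter(|&&d| d == c).count();
    if occurrences * 2 > deltas.len() { Some(c) } else { None }
}

/// Predict the next `n` page indexes to prefetch after the last access.
fn prefetch(accesses: &[i64], n: usize) -> Vec<i64> {
    match (majority_delta(accesses), accesses.last()) {
        (Some(d), Some(&last)) => (1..=n as i64).map(|k| last + k * d).collect(),
        _ => Vec::new(),
    }
}
```

If no clear stride emerges, the predictor stays quiet, which is usually better than prefetching pages that will never be read.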
And because Graft hosts pages directly on object storage, they’re naturally durable and scalable, creating a strong foundation for edge-native replication.
# Edge: Sync close to the action
Edge replication isn’t just about choosing what data to sync—it’s about making sure that data is available where it’s actually needed. Graft does this in two key ways.
First, pages are served from object storage through a global fleet of edge servers, allowing frequently accessed ("hot") pages to be cached near clients. This keeps latency low and responsiveness high, no matter where in the world your users happen to be.
Second, the Graft client itself is lightweight and designed specifically to be embedded. With minimal dependencies and a tiny runtime, it integrates into constrained environments like browsers, devices, mobile apps, and serverless functions.
The result? Your data is always cached exactly where it’s most valuable—right at the edge and embedded in your application.
But caching data on the edge brings new challenges, particularly around maintaining consistency and safely handling conflicts. That’s where Graft’s robust consistency model comes in.
# Consistency: Sync safely
Strong consistency is critical—especially when syncing data between clients that might occasionally conflict. Graft addresses this by providing a clear and robust consistency model: Serializable Snapshot Isolation.
This model gives clients isolated, consistent views of data at specific snapshots, allowing reads to proceed concurrently without interference. At the same time, it ensures that writes are strictly serialized, so there’s always a clear, globally consistent order for every transaction.
However, because Graft is designed for offline-first, lazy replication, clients sometimes attempt to commit changes based on an outdated snapshot. Accepting these commits blindly would violate strict serializability. Instead, Graft safely rejects the commit and lets the client choose how to resolve the situation. Typically, clients will:
- Reset and replay, by pulling the latest snapshot, reapplying local transactions, and trying again.

Under this model:

- Globally, the data remains strictly serializable.
- Locally, the
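The reject-then-replay flow can be sketched as an optimistic commit loop. Everything here (`Server`, `commit_with_replay`, using a bare LSN to stand in for a snapshot) is a hypothetical simplification of the protocol described above:

```rust
// Minimal optimistic-commit sketch: the server accepts a commit only
// if it was based on the latest snapshot; otherwise the client pulls,
// replays its local transactions on top, and retries.
struct Server {
    lsn: u64, // LSN of the latest committed snapshot
}

impl Server {
    /// Accept the commit only if it was based on the current snapshot.
    /// On rejection, return the latest LSN so the client can catch up.
    fn commit(&mut self, based_on_lsn: u64) -> Result<u64, u64> {
        if based_on_lsn == self.lsn {
            self.lsn += 1;
            Ok(self.lsn)
        } else {
            Err(self.lsn)
        }
    }
}

/// Client-side reset-and-replay loop.
fn commit_with_replay(server: &mut Server, mut snapshot_lsn: u64) -> u64 {
    loop {
        match server.commit(snapshot_lsn) {
            Ok(new_lsn) => return new_lsn,
            Err(latest) => {
                // pull the latest snapshot, reapply local transactions
                // on top of it (elided here), then try again
                snapshot_lsn = latest;
            }
        }
    }
}
```

Rejection is cheap in this scheme: the client never loses its local writes, it just rebases them onto the newer snapshot before retrying.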
# Comments (11)
smitty1e
"What are you syncing about?" https://youtu.be/0MUsVcYhERY?si=8gEMOo3nkd7SRX-Q
mrbluecoat
> Graft should be considered Alpha quality software. Thus, don't use it for production workloads yet.
Beta ETA?
conradev
> After a client pulls a graft, it knows exactly what’s changed. It can use that information to determine precisely which pages are still valid and which pages need to be fetched
Curious how this compares to Cloud-Backed SQLite’s manifest: https://sqlite.org/cloudsqlite/doc/trunk/www/index.wiki
It’s similar to your design (sending changed pages), but doesn’t need any compute on the server, which I think is a huge win.
matlin
My ideal version of this is simple: just define the queries you want (no matter how complex) and then you'll get exactly the data you need to fulfill those queries, no more, no less. And the cherry on top would be to have your queries update automatically with changes, both local and remote, in close to real time.
That's basically what we're doing with Triplit (https://triplit.dev), albeit not with SQL, which is a plus for most developers.
canadiantim
How does this compare with Turso? I know it's mentioned in the article (mainly better support for partial replication and arbitrary schemas), but is there also a deeper architectural departure between the two projects?
Looks really good, great work!
wg0
Seems interesting. A very challenging problem to wrap your head around. Anyone working on this is exactly pushing the field forward.
I'm thinking to give it a try in one of my React Native apps that face very uncertain connectivity.
snickell
This is a really interesting project, and a great read. I learned a lot. I'm falling down the rabbit hole pretty hard reading about the "Leap" algorithm (https://www.usenix.org/system/files/atc20-maruf.pdf) it uses to predict remote memory prefetches.
It's easy to focus on libgraft's SQLite integration (comparing to turso, etc), but I appreciate that the author approached this as a more general and lower-level distributed storage problem. If it proves robust in practice, I could see this being used for a lot more than just sqlite.
At the same time, I think "low level general solutions" are often unhinged when they're not guided by concrete experience. The author's experience with sqlsync, and applying graft to sqlite on day one, feels like it gives them standing to take a stab at a general solution. I like the approach they came up with, particularly shifting responsibility for reconciliation to the application/client layer. Because reconciliation lives heavily in tradeoff space, it feels right to require the application to think closely about how they want to do it.
A lot of the questions here request comparisons with existing SQLite replication systems; the article actually has a great section on this topic at the bottom: https://sqlsync.dev/posts/stop-syncing-everything/#compariso…
kiitos
The consistency model doesn't seem to make sense.
https://github.com/orbitinghail/graft/blob/main/docs/design….
> Graft clients commit locally and then asynchronously attempt to commit remotely. Because Graft enforces Strict Serializability globally, when two clients concurrently commit based on the same snapshot, one commit will succeed and the other will fail.
OK, but, the API provides only a single commit operation:
> commit(VolumeId, ClientId, Snapshot LSN, page_count, segments) Commit changes to a Volume if it is safe to do so. The provided Snapshot LSN is the snapshot the commit was based on. Returns the newly committed Snapshot on success.
So if a client commits something, and it succeeds, presumably locally, then how should that client discover that the "async" propagation of that commit has failed, and therefore everything it's done on top of that successful local commit needs to be rolled-back?
This model is kind of conflating multiple, very different, notions of "commit" with each other. Usually "commit" means the committed transaction/state/whatever is guaranteed to be valid. But here it seems like a "local commit" can be invalidated at some arbitrary point in the future, and is something totally different than an "async-validated commit"?
mhahn
I looked at using turso embedded replicas for a realtime collaboration project and one downside was that each sync operation was fairly expensive. The minimum payload size is 4KB IIRC because it needs to sync the sqlite frame. Then they charge based on the number of sync operations so it wasn't a good fit for this particular use case.
I'm curious if the graft solution helps with this. The idea of just being able to ship a sqlite db to a mobile client that you can also mutate from a server is really powerful. I ended up basically building my own syncing engine to sync changes between clients and servers.