Sharding is a technique that distributes data and load across several standalone database instances. It achieves horizontal scalability by splitting the original dataset into shards and assigning those shards to multiple database instances.
However, even though the verb “distributes” appears in the definition of sharding, a sharded database is not a distributed one.
Every sharding solution has one critical component in its architecture. This component can go by various names, including coordinator, router, or director:
The coordinator is the sole component aware of data distribution. It maps client requests to specific shards and then to the corresponding database instance. This is why clients must always route their requests through the coordinator.
For example, if a client wants to insert a new record into the Car table, the request first goes to the coordinator. The coordinator maps the record’s primary key to one of the shards and then forwards the request to the database instance responsible for that shard.
In the schema above, the coordinator first maps key 121 to shard 10 and then inserts the record into table car_10, which is stored on the database instance owning shard 10.
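The two-step routing described here (key to shard, then shard to instance) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism of any particular sharding product: the modulo mapping, the shard count of 37 (picked solely so that key 121 lands on shard 10, as in the example), the three instances, the round-robin shard ownership, and the `Coordinator` class with its method names are all hypothetical. In-memory SQLite databases stand in for the standalone database instances.

```python
import sqlite3

NUM_SHARDS = 37      # hypothetical; chosen so that key 121 lands on shard 10
NUM_INSTANCES = 3    # hypothetical number of standalone database instances


class Coordinator:
    """Toy coordinator: the only component that knows where each shard lives."""

    def __init__(self):
        # Each in-memory SQLite database stands in for one standalone instance.
        self.instances = [sqlite3.connect(":memory:") for _ in range(NUM_INSTANCES)]
        # Shard-to-instance map, assumed here to be a round-robin assignment.
        self.shard_owner = {s: s % NUM_INSTANCES for s in range(NUM_SHARDS)}
        # Create one table per shard (car_0, car_1, ...) on the owning instance.
        for shard, owner in self.shard_owner.items():
            self.instances[owner].execute(
                f"CREATE TABLE car_{shard} (id INTEGER PRIMARY KEY, model TEXT)"
            )

    def shard_for(self, key: int) -> int:
        # Step 1 (assumed mapping function): primary key -> shard via modulo.
        return key % NUM_SHARDS

    def insert_car(self, key: int, model: str):
        shard = self.shard_for(key)
        # Step 2: shard -> database instance that owns it.
        owner = self.shard_owner[shard]
        self.instances[owner].execute(
            f"INSERT INTO car_{shard} (id, model) VALUES (?, ?)", (key, model)
        )
        return shard, owner

    def get_car(self, key: int):
        # Reads follow the same route as writes.
        shard = self.shard_for(key)
        owner = self.shard_owner[shard]
        row = self.instances[owner].execute(
            f"SELECT model FROM car_{shard} WHERE id = ?", (key,)
        ).fetchone()
        return row[0] if row else None
```

Calling `Coordinator().insert_car(121, "Roadster")` writes the row into `car_10` on whichever stand-in instance owns shard 10. The instances never talk to each other; all knowledge of the data layout sits in the coordinator's two lookups.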
However, one question remains: Why is the coordinator even needed in sharding solutions? The answer is straightforward. The shards are stored on database instances designed for single-server deployments.
These database instances do not communicate with each other, nor do they support any protocols that would facilitate such communication. Unaware of each other, they exist in their own isolated environments, oblivious to the f