Paper Notes: Bitcask – A Log-Structured Hash Table for Fast Key/Value Data by greghn

In this post, I cover the Bitcask paper and walk through an implementation that I wrote in Java. Papers like this one are concise enough to give a high-level idea of the technology while still providing the pieces needed to build a working implementation. This approach has helped me go one layer deeper in my understanding, and I hope you too will learn something useful about the internal workings of a storage engine.

Bitcask originated as a storage backend for Riak, a NoSQL key-value database. Each node in a Riak cluster uses a pluggable local key-value store. A few of the goals that this storage layer aims to achieve are:

Low latency for read and write operations
Ability to recover from a crash with minimal delay
Easy-to-understand data format
Easy backup of the data held in storage

Bitcask meets these requirements with a design that is easy to understand and implement. Let us dive into the core design of Bitcask. Along the way, we will also look at the code for the major components that make up Bitcask.

Key-Value Store Operations

From the data format’s perspective, Bitcask is very easy to understand. Think of a Bitcask instance as a directory in your file system. All the data resides in this directory, and only one process is allowed to write to it at a time. At any moment there is exactly one active file, to which data is appended. Processing a write request is therefore as simple as appending a record to this file. Because existing file contents are never updated in place and records are only appended, writes are sequential and complete with minimal latency.
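To make the append-only write path concrete, here is a minimal sketch in Java. It is not the implementation discussed in this post: the class name AppendOnlyDataFile, the file name, and the simplified record layout (key length, value length, key bytes, value bytes) are assumptions made for illustration. The record format in the paper also carries a CRC and a timestamp, which are left out here to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: names and record layout are assumptions, not the post's code.
final class AppendOnlyDataFile implements AutoCloseable {
    private final FileChannel channel;

    AppendOnlyDataFile(Path directory, String fileName) {
        try {
            Files.createDirectories(directory);
            // APPEND makes every write land at the current end of the file.
            channel = FileChannel.open(directory.resolve(fileName),
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one record and returns the file offset at which it starts. */
    long append(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        // Assumed layout: [key length][value length][key bytes][value bytes]
        ByteBuffer record = ByteBuffer.allocate(2 * Integer.BYTES + k.length + v.length);
        record.putInt(k.length).putInt(v.length).put(k).put(v);
        record.flip();
        try {
            long offset = channel.size();
            while (record.hasRemaining()) {
                channel.write(record);
            }
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Current size of the file in bytes. */
    long size() {
        try {
            return channel.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling append("a", "1") on an empty file returns offset 0, and the next record starts where the first one ended. Returning the offset is a deliberate choice in this sketch: a reader can later seek straight to a record instead of scanning the file.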

Once the active file reaches a certain size threshold, a new active file is created in the same directory and the previous active file becomes immutable. Its contents can still be read, but it is never modified again.
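The rollover described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code from the implementation covered in this post: the class name RollingDataDirectory, the numeric file naming scheme (1.data, 2.data, and so on), and the choice to check the threshold before each write are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: names, file naming, and rollover policy are assumptions.
final class RollingDataDirectory implements AutoCloseable {
    private final Path directory;
    private final long maxFileSizeBytes;
    private int activeFileId = 0;
    private FileChannel active;

    RollingDataDirectory(Path directory, long maxFileSizeBytes) {
        this.directory = directory;
        this.maxFileSizeBytes = maxFileSizeBytes;
        try {
            Files.createDirectories(directory);
            openNextActiveFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends an already-encoded record and returns the id of the file that received it. */
    int append(byte[] record) {
        try {
            // Once the active file has reached the threshold, retire it and start a new one.
            if (active.size() >= maxFileSizeBytes) {
                openNextActiveFile();
            }
            ByteBuffer buffer = ByteBuffer.wrap(record);
            while (buffer.hasRemaining()) {
                active.write(buffer);
            }
            return activeFileId;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void openNextActiveFile() throws IOException {
        if (active != null) {
            active.force(true); // flush to disk before the file is treated as immutable
            active.close();     // this sketch never opens the file for writing again
        }
        activeFileId++;
        active = FileChannel.open(directory.resolve(activeFileId + ".data"),
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    @Override
    public void close() {
        try {
            active.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the size check happens before the write, a file can exceed the threshold by at most one record. Checking whether the next record would fit is an equally reasonable policy; the text above does not prescribe either one.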

A new file is also created when the database server is shut down and restarted. In other words, as soon as the server goes down, the active file becomes immutable, and when the server comes back up, it starts working with a new file marked as active.
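One way to picture the restart behaviour is a startup routine that never reopens an existing file for writing. The sketch below assumes data files are named with increasing numeric ids and a .data suffix; that naming scheme and the class name StartupFileSelector are hypothetical, chosen only to keep the example small.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the "<id>.data" naming scheme is an assumption.
final class StartupFileSelector {
    private StartupFileSelector() {}

    /** Creates and returns a fresh active file; files from earlier runs are left untouched. */
    static Path openNewActiveFile(Path directory) {
        try {
            Files.createDirectories(directory);
            long highestId = 0;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(directory, "*.data")) {
                for (Path file : files) {
                    String name = file.getFileName().toString();
                    String stem = name.substring(0, name.length() - ".data".length());
                    try {
                        highestId = Math.max(highestId, Long.parseLong(stem));
                    } catch (NumberFormatException ignored) {
                        // Not a file written by this sketch; skip it.
                    }
                }
            }
            // Every existing file is now read-only by convention; new writes go here.
            return Files.createFile(directory.resolve((highestId + 1) + ".data"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling openNewActiveFile twice on an empty directory yields 1.data and then 2.data, which mirrors two consecutive server starts: the file from the first run is never appended to again.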

The FileRecord that we ar
