The in-memory nature of Memgraph gives you a plethora of performance advantages over other database management systems. You do not need to issue expensive I/O operations to copy your data into main memory before you can operate on it. The data you are interested in querying is already in your server’s main memory. On the flip side, memory is a form of volatile storage. All it takes is an outage, and any data stored solely in memory is wiped. You would end up with lost data upon a system crash. To guard against data loss like this, Memgraph provides you with the well-described, battle-tested process of database recovery.
What is recovery?
Recovery is the action or process of regaining possession or control over something stolen or lost. In the specific case of database recovery, it means restoring your database to the correct and consistent state it was in before the system crash. In other words, recovery restores the data that was wiped from memory when the process was killed. This raises the question of how one restores something that was lost. To put it simply, it was never lost in the first place.
Memgraph takes periodic snapshots of your data and saves them to non-volatile storage. If the process stops for any reason, you still have all your data saved in the snapshots folder, which is located at var/lib/memgraph/snapshots by default; you can change that by using the appropriate flag, --data-directory. When you start the Memgraph instance again, it looks for valid snapshots in that folder and starts recreating the database in memory, based on the latest valid snapshot it can find. It is important to mention here that snapshot creation and recovery are configurable through the following flags:
– --storage-recover-on-startup
– --storage-snapshot-interval-sec
– --storage-snapshot-on-exit
– --storage-snapshot-retention-count
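To make the snapshot mechanism more concrete, here is a minimal Python sketch of the idea: write snapshots periodically, keep only the newest few, and on startup recover from the latest one that is still valid. This is a toy model, not Memgraph's implementation. The function names, the JSON file format, and the file naming scheme are all assumptions made for illustration; the retention_count parameter only loosely mirrors what --storage-snapshot-retention-count controls.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_snapshot(directory: Path, sequence: int, data: dict, retention_count: int) -> Path:
    """Persist `data` as a snapshot file, then prune the oldest snapshots.

    Hypothetical helper: the file name and JSON format are illustrative only.
    """
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"snapshot_{sequence:08d}.json"
    path.write_text(json.dumps(data))

    # Zero-padded sequence numbers make lexicographic order match age order.
    snapshots = sorted(directory.glob("snapshot_*.json"))
    excess = max(0, len(snapshots) - retention_count)
    for old in snapshots[:excess]:
        old.unlink()
    return path


def recover_latest(directory: Path) -> Optional[dict]:
    """Return the contents of the newest snapshot that parses, or None."""
    for path in sorted(directory.glob("snapshot_*.json"), reverse=True):
        try:
            return json.loads(path.read_text())
        except json.JSONDecodeError:
            continue  # Invalid snapshot: fall back to the next older one.
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot_dir = Path(tmp) / "snapshots"
        for seq in range(5):
            write_snapshot(snapshot_dir, seq, {"nodes": seq}, retention_count=3)
        print(recover_latest(snapshot_dir))
```

In this sketch an unreadable newest snapshot does not block recovery, because the loop simply falls back to the next older file. That is the sense in which the "latest valid snapshot" is used.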
You can find an extended description here. As you can see, the user has plenty of choices when configuring the recovery-related options. However, there is one slight issue.
What is the issue?
If you have a huge dataset you would like to recover after a system crash, it might take a long time to get your data back into its original form, which is a bit inconvenient. You already had to go through the unpleasantness of restarting your instance while crossing your fingers and hoping that the latest snapshot includes the previously bulk-imported updates to your graph, and that Atropos did not cut your update’s thread of life too short. Speaking of threads, they could be put to better use to speed up the database recovery process.
Multithreading is a powerful tool made possible by modern multi-core processors. It allows you to run separate threads of execution that operate independently of one another, which enables parallel processing. While it sounds like a great tool, and it is, it really is a double-edged sword.
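As a rough illustration of threads operating independently, the following Python sketch splits a list of record IDs into batches and rebuilds each batch on a worker thread. Every name here (rebuild_batch, rebuild_in_parallel, batch_size, thread_count) is hypothetical and chosen for the example; it is not how Memgraph organizes its recovery. Note also that CPython's global interpreter lock limits true CPU parallelism for pure-Python work, so the sketch shows the structure of the approach rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def rebuild_batch(batch):
    """Rebuild one batch of records; touches no state shared with other batches."""
    return {record_id: {"recovered": True} for record_id in batch}


def rebuild_in_parallel(record_ids, batch_size, thread_count):
    """Split the records into batches and rebuild the batches on worker threads."""
    batches = [
        record_ids[start:start + batch_size]
        for start in range(0, len(record_ids), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # Each worker returns its own private dictionary.
        partial_results = list(pool.map(rebuild_batch, batches))

    # Merging happens on the calling thread only, after all workers are done.
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


if __name__ == "__main__":
    recovered = rebuild_in_parallel(list(range(10)), batch_size=3, thread_count=4)
    print(len(recovered))
```

Because every worker builds its own result and the merge runs on a single thread afterwards, no two threads ever write to the same object at the same time. That independence is what keeps this version simple.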
If you want to use multiple threads to update and read the same resource without the proper tools, tha