Almost every company today has some kind of database as a single source of truth. Graph databases have proved useful across many industries. They excel at data modeling because the graph model maps naturally onto business scenarios that involve interconnected data. Elasticsearch is a flexible text-processing tool used primarily for full-text search and indexing.
Although a naive text-search solution could be built with a graph database like Memgraph, Elasticsearch offers finer-grained search capabilities specific to text processing. Many people have decided to use both systems, but the problem is keeping them in sync. In MAGE 1.6 we introduce a module that lets developers serialize Memgraph into an Elasticsearch instance using basic authentication. This blog post shows how the module was built and how to use it.
Approaches to syncing
Running Elasticsearch and Memgraph as two completely separate systems that store the same data independently is complicated, because every process and operation has to be duplicated to keep them synchronized. It is also cumbersome, because if an update succeeds on one platform but fails on the other, the system is left in an inconsistent state.
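To make that failure mode concrete, here is a minimal sketch in Python. A plain dictionary stands in for Memgraph, and `FlakyIndex` is a hypothetical class invented for this illustration to stand in for Elasticsearch; neither is part of either product.

```python
class FlakyIndex:
    """Hypothetical stand-in for a search index that can reject writes."""

    def __init__(self, fail_on=()):
        self.docs = {}
        self.fail_on = set(fail_on)

    def index(self, doc_id, doc):
        if doc_id in self.fail_on:
            raise ConnectionError(f"indexing {doc_id} failed")
        self.docs[doc_id] = doc


def dual_write(graph_store, search_index, doc_id, doc):
    """Write to both systems independently, with no shared transaction."""
    graph_store[doc_id] = doc        # first write succeeds
    search_index.index(doc_id, doc)  # second write may raise


graph_store = {}
search_index = FlakyIndex(fail_on={2})

for doc_id, doc in [(1, {"name": "Ana"}), (2, {"name": "Ivan"})]:
    try:
        dual_write(graph_store, search_index, doc_id, doc)
    except ConnectionError:
        pass  # the graph write has already happened and is not rolled back

# Ids present in the graph store but missing from the search index.
out_of_sync = sorted(set(graph_store) - set(search_index.docs))
print(out_of_sync)
```

The second record ends up in the graph store but not in the index, and nothing in the application notices unless it compares the two systems explicitly.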
For Memgraph and Elasticsearch to be considered in sync, two requirements need to be met: all existing data in the database must be indexed, and new data must be indexed incrementally the moment it is inserted into the database.
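The two requirements can be sketched as two small helpers: one builds a document for every existing node, the other builds a document for a single newly created node. The action dictionaries use the `_index`, `_id`, and `_source` keys accepted by the bulk helper of the official Elasticsearch Python client. The index name `nodes`, the shape of the node dictionaries, and the function names are assumptions made for this example, not the module's actual interface.

```python
def node_to_action(node, index_name="nodes"):
    """Turn one node (a plain dict here) into a bulk-style index action."""
    return {
        "_index": index_name,
        "_id": node["id"],
        "_source": {"labels": node["labels"], **node["properties"]},
    }


def initial_actions(nodes, index_name="nodes"):
    """Requirement 1: index everything that already exists in the database."""
    return [node_to_action(node, index_name) for node in nodes]


def incremental_action(new_node, index_name="nodes"):
    """Requirement 2: index a node as soon as it is created."""
    return node_to_action(new_node, index_name)


existing = [
    {"id": 1, "labels": ["Person"], "properties": {"name": "Ana"}},
    {"id": 2, "labels": ["Person"], "properties": {"name": "Ivan"}},
]
actions = initial_actions(existing)

created = {"id": 3, "labels": ["City"], "properties": {"name": "Zagreb"}}
actions.append(incremental_action(created))

for action in actions:
    print(action)
```

With a live cluster, a list like `actions` could be passed to `elasticsearch.helpers.bulk(client, actions)`; that call is left out so the sketch runs without a server.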
Several options could be used to meet these requirements:
- Logstash is a server-side data processing system that can parse data from more than 200 sources and send it to Elasticsearch. It also comes with an API for developing a plug-in that parses data from any custom application you can think of.
- Change Data Capture (CDC), although not yet supported within Memgraph, would make it possible to capture changes made to a graph in a transaction and send them to the Elasticsearch index.
However, there is one more thing to consider when syncing Memgraph and Elasticsearch: extensibility. The challenge is to build a solution that needs minimal change when a new method is required. That is why we decided to use Memgraph’s Python support and create a new query module, part of Memgraph’s graph library MAGE, that uses the Elasticsearch API.
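To give a feel for what a Python query module looks like, here is a minimal sketch. It is not the MAGE module itself: the procedure name `preview_documents` and the helper `to_document` are hypothetical, and the procedure only builds documents and counts them instead of sending anything to Elasticsearch. The `mgp` import is guarded because that package is only available inside Memgraph.

```python
try:
    import mgp  # Memgraph's Python API; only importable inside Memgraph
except ImportError:
    mgp = None  # lets the plain-Python part run outside Memgraph


def to_document(labels, properties):
    """Custom processing logic: shape one node into an indexable document."""
    return {"labels": sorted(labels), **properties}


if mgp is not None:

    @mgp.read_proc
    def preview_documents(context: mgp.ProcCtx) -> mgp.Record(count=int):
        """Hypothetical procedure: builds documents and reports how many."""
        documents = [
            to_document(
                [label.name for label in vertex.labels],
                {prop.name: prop.value for prop in vertex.properties.items()},
            )
            for vertex in context.graph.vertices
        ]
        # A real module would send `documents` to Elasticsearch here.
        return mgp.Record(count=len(documents))


sample = to_document(["Person", "Author"], {"name": "Ana"})
print(sample)
```

Keeping the processing logic in an ordinary function such as `to_document` means it can be changed or tested on its own, while the decorated procedure stays a thin wrapper around it.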
If a new method is ever needed, a few lines of custom processing logic can easily be added in Python without (re)starting any processes. A subtle perf