Authors: Mostafa Abdelraouf, Zain Kabani, Andrew Tanner
In this post, we’ll be talking about PgCat, an open-source Postgresql Proxy that we have been using in production and contributing to. It provides connection pooling, load-balancing, and replica failover for our Postgresql clusters.
At Instacart, we use Postgresql for the vast majority of our database needs. We can squeeze a lot of performance out of a single instance of Postgres with effective caching, indexing, query optimization and vertically scaling instances. All these strategies are great but they can only go so far. We can scale beyond one instance by adding more read replicas or horizontally sharding the database. Both techniques, however, come at the cost of added complexity.
For example, when using read replicas, the application needs to know how many replicas there are, it needs to be able to effectively load-balance between them, route traffic away from a degraded replica, and understand the concept of replication lag.
Historically, we have solved these problems on the application side using libraries such as Makara, Makara is a Ruby gem that operates as an ActiveRecord connection adapter that handles load balancing and failover. However, Makara comes with its own set of shortcomings, and more importantly, it is a Ruby-only solution. Ideally, we want to be able to solve this problem for all clients regardless of language.
The proxy layer that sits between our clients and the database is the ideal position to implement these functions because communication to and from the proxy layer is done using the Postgres protocol which is language-agnostic. At Instacart, this proxy layer has been occupied by the venerable PgBouncer. Pgbouncer does connection pooling very well but does not support replica failover and has limited support for load balancing.
Because of these limitations with PgBouncer, we decided to explore other options and concluded that PgCat is well-suited to meet our needs into the future. We’ll explain why we decided to use PgCat, how it compares to PgBouncer, and some of the features we developed for it.
As we said, we want a Postgresql proxy that goes beyond connection pooling. So in addition to being at feature parity with PgBouncer, we would like the proxy to handle load-balancing and replica failover. We also want to be able to safely extend the proxy with more features in the future.
We evaluated a few options. Below is a summary of the features provided by each of these options (Pgpool was eliminated from contention for its lack of support for transaction mode):
During the course of our investigation, we learned from our friends at PostgresML about the new Postgres proxy they are working on called PgCat. It supported the majority of the features we want in addition to being written in Rust. Being written in Rust is an upside because the memory safety guarantees that Rust provides allow us to build concurrent features without sacrificing safety or speed.
When we were introduced to PgCat it was in beta, and it was missing some features we care about such as multiple pools per instance and graceful shutdown. We contributed these features to PgCat and started testing the proxy in our sandbox environments and later in our production environments. Maintaining parity with Pgbouncer in terms of latency and correctness was especially important for us.
Latency
We tested the latency by configuring a service to use PgCat as one replica, PgBouncer as another replica, and relied on a client-side library to balance the load between the two. This allowed us to perform an apples-to-apples, end-to-end latency comparison.
Under our production workloads, we saw that PgCat’s latency profile closely tracked PgBouncer’s. Below we show the end-to-end query latency percentiles on both proxies. The sample size for the data below is ~1.5M production queries performed over the span of 12 hours.
+------+-----------+---------+------------+