TLDR:
We spent a few months optimizing PostgreSQL and made it to the Top 10 on ClickBench,
a benchmark typically dominated by specialized analytics databases.
What’s more, all compute runs within Postgres, and all tables are managed directly by PostgreSQL—it’s not a simple wrapper. This is the story of pg_mooncake.
What Is ClickBench?
ClickBench is the definitive benchmark for real-time analytics databases, originally designed to showcase the performance of ClickHouse. It evaluates databases on their ability to handle real-world analytics workloads, including high-volume table scans and complex aggregations.
Historically, ClickHouse and other purpose-built analytics databases have led this benchmark, while general-purpose databases like Postgres and MySQL have lagged behind, often by 100x. But we wanted to challenge that perception—and Postgres delivered.
How to Build Analytics in Postgres?
When most people think of PostgreSQL, they think of a rock-solid OLTP database, not a real-time analytics powerhouse. However, PostgreSQL’s extensibility makes it uniquely capable of punching above its weight class.
Here’s how we approached the challenge:
1. Build a PG Extension
We leveraged PG’s extensibility to build pg_mooncake as a native PG extension.
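For context, once the extension’s shared library is installed on the server, enabling it is a single statement. The sketch below is just standard Postgres extension usage, not anything specific to our build process:

```sql
-- Standard Postgres extension setup; assumes the pg_mooncake
-- library is already installed on the server.
CREATE EXTENSION IF NOT EXISTS pg_mooncake;
```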
2. Storage Format: Columnstore
For analytics workloads, a columnstore format is essential. ClickBench workloads typically involve wide tables, but queries only access a small subset of columns.
- In a row store (like a PostgreSQL heap table), reading a single column means touching every row and skipping past the columns you don’t need.
- In a columnstore, each column is stored contiguously, so reads are sequential and faster (and it also enables better compression and execution directly on compressed data).
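To make the difference concrete, here is a rough sketch of a columnstore table and a column-pruning query. The table and columns are made up for illustration (the real ClickBench hits table is far wider), and the USING columnstore clause follows pg_mooncake’s documented syntax, which may change between versions:

```sql
-- Hypothetical wide events table stored via the columnstore access method.
CREATE TABLE hits (
    event_time  timestamptz,
    user_id     bigint,
    url         text,
    referrer    text,
    duration_ms int
    -- ...many more columns in a real analytics schema
) USING columnstore;

-- Only the referenced columns are read from storage; a heap table
-- would have to fetch entire rows for every matching tuple.
SELECT url, duration_ms
FROM hits
WHERE event_time >= now() - interval '1 day';
```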
3. Vectorized Execution
To speed up query execution, we embedded DuckDB as the execution engine for columnstore queries. This means that, across the execution pipeline, data is processed in batches rather than row by row, enabling SIMD, which is far more efficient for scans, group-bys, and aggregations.
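As an illustration (not a ClickBench query verbatim, and reusing the hypothetical hits table from above), this is the shape of query where batched execution pays off:

```sql
-- Scan + hash aggregation + sort: with vectorized execution these
-- operators work on column batches, so SIMD applies instead of
-- per-row interpretation.
SELECT url,
       count(*)         AS visits,
       avg(duration_ms) AS avg_duration
FROM hits
GROUP BY url
ORDER BY visits DESC
LIMIT 10;
```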
4. Table Metadata & Management Directly in PostgreSQL
Efficient metadata handling is critical for r
11 Comments
nikita
This is an exciting project. A few highlights:
– The query processor is DuckDB – as long as it translates the PG type system to the DuckDB type system well, it will be very fast.
– Data is stored on S3 in Parquet with Delta or Iceberg metadata. This is really cool. You don't need to push analytical data through the WAL – only metadata goes into the WAL. This means fast loading, at least in theory, and compatibility with the whole Delta/Iceberg ecosystem.
– Once they build real-time ingest, you can just push time series into this system and you don't need a second system like ClickHouse.
pier25
They're already working on 0.2
https://www.mooncake.dev/blog/pgmooncake-v02discussion
dleeftink
> To enhance query execution speed, we embedded DuckDB as the execution engine for columnstore queries
So is it Postgres or DuckDB that cracked the analytics top 10?
tarun_anand
Will this work alongside Citus?
If yes, what benefits/disadvantages would it have? I presume that at some point workloads do need a cluster.
bigtones
Looks like they're currently No. 12 on their cited ClickBench benchmark – so not quite in the top 10 three weeks later.
antonmks
It is not really Postgres; the queries are run on DuckDB. Yeah, DuckDB is really fast for analytical queries.
jot
How is this different from Crunchy Warehouse which is also built on Postgres and DuckDB?
https://www.crunchydata.com/products/warehouse
rubenvanwyk
Looking at the effort being put in, it seems there’s a massive market for proper Postgres analytics / columnstore, which is very telling of the pain involved in moving data between systems.
dcreater
All you need is postgres: part 73
saisrirampur
Sai from PeerDB/ClickHouse here. Nice to see the progress on this project! I wanted to leave a few notes:
For analytics on transactional data, it looks like you'd still need to use logical replication (https://github.com/Mooncake-Labs/pg_mooncake/issues/90). Logical replication is somewhat similar to an ETL/CDC experience, though it's more Postgres-native. Managing logical replication at a production grade isn't trivial — it's quite common for customers to use PeerDB for homogeneous replication due to performance, manageability and observability issues with logical replication.
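For readers who haven't set it up before, the vanilla logical replication flow looks roughly like this (table names, connection details, and credentials are placeholders):

```sql
-- On the source (transactional) Postgres: publish the tables to replicate.
CREATE PUBLICATION analytics_pub FOR TABLE orders, line_items;

-- On the target Postgres: subscribe to that publication.
-- Connection string, names, and credentials are placeholders.
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=source-db dbname=app user=replicator password=secret'
    PUBLICATION analytics_pub;
```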
One potential value of extensions is Postgres compatibility for your queries. However, I'm curious about the SQL coverage on columnar tables — for example, advanced constructs like partitioned tables, prepared statements, correlated subqueries, RCTEs, triggers, and more. While it seems there’s ongoing work to improve compatibility, achieving full Postgres compatibility is challenging. In some cases, you might need to fall back to Postgres (instead of DuckDB), sacrificing performance for certain queries.
The biggest challenge we faced at Citus was the constant trade-off between Postgres compatibility and performance — both are complex and constantly moving targets. This was the key reason why Citus couldn't compete with purpose-built databases like Snowflake, ClickHouse, and SingleStore. While DuckDB didn't exist when we built Citus, it's still fundamentally different from Postgres. Even though the DuckDB dialect is somewhat similar to Postgres's (as is Snowflake's), retrofitting two distinct databases — each evolving independently toward the world's best performance — isn't trivial.
In short, relying on ETL (logical replication) without providing full Postgres compatibility raises the question: is it better to lean on a purpose-built database with a laser focus on performance, or adopt a more generalized approach?
Anyway, I come from a different school of thought — using the right tool for the right job. That said, I love seeing all the progress and evolution in the Postgres community — Postgres will always be my first love!
timsuchanek
Congrats on the launch!
Does it support ordinary db migrations with the normal Postgres interface?