TLDR:
We spent a few months optimizing PostgreSQL and made it to the Top 10 on ClickBench,
a benchmark typically dominated by specialized analytics databases.
What’s more, all compute runs within Postgres, and all tables are managed directly by PostgreSQL—it’s not a simple wrapper. This is the story of pg_mooncake.
What Is ClickBench?
ClickBench is the definitive benchmark for real-time analytics databases, originally designed to showcase the performance of ClickHouse. It evaluates databases on their ability to handle real-world analytics workloads, including high-volume table scans and complex aggregations.
Historically, ClickHouse and other purpose-built analytics databases have led this benchmark, while general-purpose databases like Postgres and MySQL have lagged behind, often by 100x. But we wanted to challenge that perception—and Postgres delivered.
How to Build Analytics in Postgres?
When most people think of PostgreSQL, they think of a rock-solid OLTP database, not a real-time analytics powerhouse. However, PostgreSQL’s extensibility makes it uniquely capable of punching above its weight class.
Here’s how we approached the challenge:
1. Build a PG Extension
We leveraged PG’s extensibility to build pg_mooncake as a native PG extension.
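For context, once the extension’s shared library is installed on the server, enabling it is a single statement. The sketch below is just standard Postgres extension usage, not anything specific to our build process:

```sql
-- Standard Postgres extension setup; assumes the pg_mooncake
-- library is already installed on the server.
CREATE EXTENSION IF NOT EXISTS pg_mooncake;
```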
2. Storage Format: Columnstore
For analytics workloads, a columnstore format is essential. ClickBench workloads typically involve wide tables, but queries only access a small subset of columns.
- In a row store (like a PostgreSQL heap table), reading a single column means touching every row and skipping past the columns you don’t need.
- In a columnstore, each column is stored contiguously, so reads are sequential and faster (and it also enables better compression and execution directly on compressed data).
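To make the difference concrete, here is a rough sketch of a columnstore table and a column-pruning query. The table and columns are made up for illustration (the real ClickBench hits table is far wider), and the USING columnstore clause follows pg_mooncake’s documented syntax, which may change between versions:

```sql
-- Hypothetical wide events table stored via the columnstore access method.
CREATE TABLE hits (
    event_time  timestamptz,
    user_id     bigint,
    url         text,
    referrer    text,
    duration_ms int
    -- ...many more columns in a real analytics schema
) USING columnstore;

-- Only the referenced columns are read from storage; a heap table
-- would have to fetch entire rows for every matching tuple.
SELECT url, duration_ms
FROM hits
WHERE event_time >= now() - interval '1 day';
```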
3. Vectorized Execution
To speed up query execution, we embedded DuckDB as the execution engine for columnstore queries. This means that, across the execution pipeline, data is processed in batches rather than row by row, enabling SIMD, which is far more efficient for scans, group-bys, and aggregations.
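As an illustration (not a ClickBench query verbatim, and reusing the hypothetical hits table from above), this is the shape of query where batched execution pays off:

```sql
-- Scan + hash aggregation + sort: with vectorized execution these
-- operators work on column batches, so SIMD applies instead of
-- per-row interpretation.
SELECT url,
       count(*)         AS visits,
       avg(duration_ms) AS avg_duration
FROM hits
GROUP BY url
ORDER BY visits DESC
LIMIT 10;
```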
4. Table Metadata & Management Directly in PostgreSQL
Efficient metadata handling is critical for r
11 Comments
nikita
This is an exciting project. A few highlights:
– The query processor is DuckDB – as long as it translates the PG type system to the DuckDB type system well, it will be very fast.
– Data is stored on S3 in Parquet with Delta or Iceberg metadata. This is really cool. You don't need to push analytical data through the WAL – only metadata goes into the WAL. This means fast loading, at least in theory, and compatibility with the whole Delta/Iceberg ecosystem.
– Once they build real-time ingest, you can just push time series into this system and you don't need a second system like ClickHouse.
pier25
They're already working on 0.2
https://www.mooncake.dev/blog/pgmooncake-v02discussion
dleeftink
> To enhance query execution speed, we embedded DuckDB as the execution engine for columnstore queries
So is it Postgres or DuckDB that cracked the analytics top 10?
tarun_anand
Will this work alongside Citus?
If yes, what benefits/disadvantages would it have? I presume that at some point workloads do need a cluster.
bigtones
Looks like they're currently No. 12 on their cited ClickBench benchmark – so not quite in the top 10 three weeks later.
antonmks
It is not really Postgres; the queries are run on DuckDB. Yeah, DuckDB is really fast for analytical queries.
jot
How is this different from Crunchy Warehouse which is also built on Postgres and DuckDB?
https://www.crunchydata.com/products/warehouse
rubenvanwyk
Looking at the effort being put in, it seems there’s a massive market for proper Postgres analytics / columnstore, which is very telling of the pain involved in moving data between systems.
dcreater
All you need is postgres: part 73
saisrirampur
Sai from PeerDB/ClickHouse here. Nice to see the progress on this project! I wanted to leave a few notes:
For analytics on transactional data, it looks like you'd still need to use logical replication (https://github.com/Mooncake-Labs/pg_mooncake/issues/90). Logical replication is somewhat similar to an ETL/CDC experience, though it's more Postgres-native. Managing logical replication at a production grade isn't trivial — it's quite common for customers to use PeerDB for homogeneous replication due to performance, manageability and observability issues with logical replication.
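For readers who haven't set it up before, the vanilla logical replication flow looks roughly like this (table names, connection details, and credentials are placeholders):

```sql
-- On the source (transactional) Postgres: publish the tables to replicate.
CREATE PUBLICATION analytics_pub FOR TABLE orders, line_items;

-- On the target Postgres: subscribe to that publication.
-- Connection string, names, and credentials are placeholders.
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=source-db dbname=app user=replicator password=secret'
    PUBLICATION analytics_pub;
```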
One potential value of extensions is Postgres compatibility for your queries. However, I'm curious about the SQL coverage on columnar tables — for example, advanced constructs like partitioned tables, prepared statements, correlated subqueries, RCTEs, triggers, and more. While it seems there’s ongoing work to improve compatibility, achieving full Postgres compatibility is challenging. In some cases, you might need to fall back to Postgres (instead of DuckDB), sacrificing performance for certain queries.
The biggest challenge we faced at Citus was the constant trade-off between Postgres compatibility and performance — both are complex and constantly moving targets. This was the key reason why Citus couldn't compete with purpose-built databases like Snowflake, ClickHouse, and SingleStore. While DuckDB didn't exist when we built Citus, it's still fundamentally different from Postgres. Even though the DuckDB dialect is somewhat similar to Postgres's (as is Snowflake's), retrofitting two distinct databases — each evolving independently toward the world's best performance — isn't trivial.
In short, relying on ETL (logical replication) without providing full Postgres compatibility raises the question: is it better to lean on a purpose-built database with a laser focus on performance, or adopt a more generalized approach?
Anyway, I come from a different school of thought — using the right tool for the right job. That said, I love seeing all the progress and evolution in the Postgres community — Postgres will always be my first love!
timsuchanek
Congrats on the launch!
Does it support ordinary db migrations with the normal Postgres interface?