Earlier this year, I was in Canada for PGConf 2023, where I talked about the evolution of PostgreSQL from a Berkeley research project to its current status as the most advanced open-source database, and discussed the various changes introduced for PostgreSQL 16, particularly regarding logical replication.
Now that the PostgreSQL community has released version 16, I thought it would be the perfect opportunity to talk to a wider audience, especially if you did not have the chance to join us in Canada.
In my talk, I focused on the exciting new features introduced in version 16, but I also touched on the past of PostgreSQL – the timeline of its major features in previous versions -, and its future – what the community has been discussing, possibly for implementation in PostgreSQL 17.
The evolution of PostgreSQL – versions and major milestones
To better appreciate the long journey that PostgreSQL has taken since its early days, I think it’s important to revisit its progress, and the impressive list of features it accumulated along the way, thanks to the work of our committed and dedicated community.
Evolution of PostgreSQL with its major milestones across versions
We have started this project in 1997 from the University of California, at Berkeley Project, which had been running since 1986, and from then on, a new version with major features has been released every year.
It is interesting to note how, from inception, PostgreSQL has been geared towards handling large volumes of data, and has been evolving since then in that direction.
For example, version 9.0 brought streaming replication, which helps with failover, where one node can take over in case another goes down. Version 9.1 introduced foreign tables and unlogged tables to its list features, alongside synchronous replication, helping make it more reliable and further enhance its value.
With 9.2, we have added support for JSON data type, where document and its related data could be stored using this data type. This version also brought index-only scans, which improved the speed of various kinds of queries. The next year, version 9.3 added updates to foreign data wrappers and materialized views, and then 9.4 introduced JSONB data type, an improved version of JSON that allows faster processing and indexes on that data type.
Version 9.5 added row-level security and BRIN indexes, which can dramatically speed up queries of certain types. With 9.6, parallelism came into Postgres, with the addition of parallel sequential scans and parallel JOINs, which can be leveraged by OLAP applications. And to further enhance our reliability, we introduced multiple standby servers in synchronous replication.
In version 10, a new landscape has opened for Postgres, with the introduction of logical replication and declarative partitioning. Version 11 introduced partitioning by hash key and SQL stored procedures, widening the appeal for organizations considering migration from other databases. With version 12, we enhanced performance of partitioning and introduced table access methods, where people can write their specialized storage engines, which they can integrate with PostgreSQL. Some of the key features of version 13 are de-duplication in the B-tree index, incremental sorting, and parallel vacuum for indexes.
Version 14 introduced a large number of important features and enhancements. We achieved better read scalability by improving our snapshot mechanism, and allowed logical replication for in-progress transactions, which reduce lag when applying large transactions. We also reduced bloat for B-tree index updates, and allowed parallel foreign table scan using postgres_fdw.
With version 15, we introduced the MERGE command, which was being discussed for a couple of years in the community, and we finally had the chance to implement it. We also introduced shared memory statistics, which improved over the previous statistics mechanism, and improved logical replication further, by introducing row and column filters. Finally, we added server-side compression for faster and smaller backups.
PostgreSQL 16 enhancements and new features
Version 16 introduces a lot of new features, with several improvements to the logical replication mechanism. One of the most important ones in my view is the ability to perform logical replication from the standby node, which I will discuss below.
Improvements to logical replication
- Data can be filtered based on origin during replication.
CREATE PUBLICATION mypub FOR ALL TABLES;
CREATE SUBSCRIPTION mysub CONNECTION ‘dbname=postgres’
PUBLICATION mypub WITH (origin = none);Prior to Postgres 16, setting a bi-directional or logical replication among nodes was difficult, because if we set up replication for a table, it would lead to an infinite loop. By adding the ability to filter data based on origin, we can set up n-way logical replication, and that will prevent loops when performing bi-directional replication.
- Logical decoding can be performed from the standby server.
This requires wal_level = logical on both primary and standby.
This ability can be used for workload distribution, by allowing subscribers to subscribe from the standby when the primary is busy.
- The apply process can be configured to perform operations with the table owner’s privileges instead of subscription owner’s privileges.
CREATE SUBSCRIPTION mysub CONNECTION …
PUBLICATION mypub WITH (run_as_owner = false); - Non-superusers can create subscriptions.
The non-superusers must have been granted pg_create_subscription role, and are required to specify a password for authentication.
Superusers can set password_required = false for non-superusers that own the subscription.
- Large transactions can be applied in parallel.
CREATE SUBSCRIPTION mysub CONNECTION …
PUBLICATION mypub WITH (streaming = parallel);Performance improvement in the range of 25-40% has been observed (for further details, check here).
Each large transaction is assigned to one of the available workers, which improves lag by immediately applying instead of waiting till whole transaction is received by the subscriber. The worker remains assigned until the transaction completes.
max_parallel_apply_workers_per_subscription sets the maximum number of parallel apply workers per subscription.
- Logical replication can copy tables in binary format.
CREATE SUBSCRIPTION mysub CONNECTION …
PUBLICATION mypub WITH (binary = true);Copying ta