SQL:2023 has been wrapped. The final text has been submitted by the
working group to ISO Central Secretariat, and it’s now up to the ISO
gods when it will be published. Based on past experience, it could be
between a few weeks and a few months.
In the meantime, we can look at what is new. The changes can be
grouped into three areas:
- Various smaller changes to the existing SQL language
- New features related to JSON
- A new part for property graph queries
Let’s look at each one.
All new functionality in the SQL standard is in the form of optional
features, so I’m giving the feature codes and names below for
reference. (But this is not an exhaustive list of all new features
codes. I’m omitting some that are mainly technical changes or for
compatibility.)
Various changes
UNIQUE null treatment (F292)
This feature deals with how null values are handled in unique
constraints. Consider the following:
CREATE TABLE t1 (
a int,
b int,
c int,
UNIQUE (a, b, c)
);
and
INSERT INTO t1 VALUES (1, NULL, NULL);
INSERT INTO t1 VALUES (1, NULL, NULL); -- ?
The question is whether the second inserted row should cause a unique
constraint violation.
Apparrently, the old text of the standard was ambiguous about this.
One section seemed to indicate one thing, another section another
thing. Different implementations have done different things.
To consolidate this, there is now an option to explicitly select a
behavior:
CREATE TABLE t2 (
a int,
b int,
c int,
UNIQUE NULLS DISTINCT (a, b, c)
);
and
INSERT INTO t2 VALUES (1, NULL, NULL);
INSERT INTO t2 VALUES (1, NULL, NULL); -- ok
versus
CREATE TABLE t3 (
a int,
b int,
c int,
UNIQUE NULLS NOT DISTINCT (a, b, c)
);
and
INSERT INTO t3 VALUES (1, NULL, NULL);
INSERT INTO t3 VALUES (1, NULL, NULL); -- error
The use of words here means: If nulls are considered distinct, then
having more than one of them won’t cause a unique constraint
violation. If nulls are considered not distinct, then having more
than one violates uniqueness.
The default for this option is implementation-defined, so existing
implementations can keep their current behavior. But at least that’s
explicit now.
ORDER BY in grouped table (F868)
Consider the following tables and query:
CREATE TABLE product (
product_id int PRIMARY KEY,
product_name varchar,
product_code varchar UNIQUE
);
CREATE TABLE product_part (
product_id int,
part_id int,
num int,
PRIMARY KEY (product_id, part_id)
);
SELECT product.product_id, sum(product_part.num)
FROM product JOIN product_part ON product.product_id = product_part.product_id
GROUP BY product.product_id
ORDER BY product.product_code;
This probably works just fine in most SQL implementations beyond the
most simple ones, but it turned out that this was technically not
allowed. In particular, it was not allowed to have a grouped table
ordered by a column that is not exposed by the SELECT
list of the
grouped table. Now, an SQL implementation can claim to support this
feature if it does allow this.
Again, this is probably not something you need to pay attention to in
practice, but this is the sort of thing that sometimes gets fixed.
GREATEST and LEAST (T054)
This adds two new functions GREATEST
and LEAST
. These have
already been present in many implementation.
SELECT greatest(1, 2); --> 2
SELECT least(1, 2); --> 1
Any number of arguments are supported:
SELECT greatest(1, 2, 3); --> 3
SELECT least(1, 2, 3); --> 1
Obviously, this is more interesting with variable data. You can do
“whichever is more” or “whichever is less” calculations like this:
SELECT greatest(base_price * 0.10, fixed_fee) FROM data ...
SELECT least(standard, discount) FROM data ...
To be clear, in most programming languages, these functions would
merely be called max
and min
. But in SQL, those names are already
used for aggregate functions. Since there is no syntactic difference
between normal functions and aggregate functions, you have to pick two
different names.
String padding functions (T055)
This adds two new string functions LPAD
and RPAD
. These are also
already in many implementations.
SELECT lpad(cast(amount as varchar), 12, '*') FROM ...
might result in something like
The padding character is space by default.
Multi-character TRIM functions (T056)
Another set of functions already known from existing implementations:
LTRIM
, RTRIM
, and BTRIM
.
Unlike the existing single-character trim function
(TRIM({LEADING|TRAILING|BOTH} 'x' FROM val)
), which can only trim a
single character, these can trim multiple characters. They also have
a less obscure syntax, so they might be easier to use in general.
Here is an example that is effectively the inverse of the lpad
call
above:
SELECT ltrim(val, '*') ...
Optional string types maximum length (T081)
This allows leaving off the maximum length specification on the
VARCHAR
/CHARACTER VARYING
type. Before, standard SQL required the
length to be specified, so you often see examples with seemingly
arbitrary lengths like
CREATE TABLE t1 (
a VARCHAR(1000),
b VARCHAR(4000),
...
);
Now you can just leave it off:
CREATE TABLE t1 (
a VARCHAR,
b VARCHAR,
...
);
In that case, an implementation-defined default limit will be applied.
Some implementations already had a way to achieve this, but now there
is a standard way to do it.
Enhanced cycle mark values (T133)
The CYCLE
clause is a lesser-known feature of recursive queries. To
detect a cycle in such a query, you can write
WITH RECURSIVE ... (
SELECT ...
UNION ALL
SELECT ...
)
CYCLE id SET is_cycle TO 'Y' DEFAULT 'N' USING path;
This would track cycles based on the id
column and set the
is_cycle
column to the specified values if a cycle had been detected
or not.
When recursive queries were added to SQL, there was no boolean
type,
so the old standard required you to use a character string, like shown
here. (The actual values shown are not required but typical.) By
now, there is a boolean
type, so this can be modernized a bit. The
new feature in SQL:2023 is t