I started my programming career as an Oracle DBA. It took a few years but eventually I got fed up with the corporate world and I went about doing my own thing.
When I no longer had the comfy cushion of Oracle enterprise edition I discovered PostgreSQL. After I gotten over not having proper partitions and MERGE statement (aka UPSERT), I found some nice unique features in PostgreSQL. Oddly enough, a lot of them contained the word DISTINCT.
DISTINCT
I created a simple Employee table with name, department and salary using mock data from this site:
haki=# d employee Column | Type | Modifiers ------------+-----------------------+----------- id | integer | not null name | character varying(30) | department | character varying(30) | salary | integer | haki=# select * from employee limit 5; id | name | department | salary ----+----------------+----------------------+-------- 1 | Carl Frazier | Engineering | 3052 2 | Richard Fox | Product Management | 13449 3 | Carolyn Carter | Engineering | 8366 4 | Benjamin Brown | Business Development | 7386 5 | Diana Fisher | Services | 10419
What is DISTINCT?
SELECT DISTINCT eliminates duplicate rows from the result.
The simplest use of distinct is, for example, to get a unique list of
departments:
haki=# SELECT DISTINCT department FROM employee; department -------------------------- Services Support Training Accounting Business Development Marketing Product Management Human Resources Engineering Sales Research and Development Legal
(easy CS students, I know it’s not normalized…)
We can do the same thing with group by
SELECT department FROM employee GROUP BY department;
But we are talking about DISTINCT.
DISTINCT ON
A classic job interview question is finding the employee with the highest salary in each department.
This is what they teach in the university:
haki=# SELECT * FROM employee WHERE (department, salary) IN ( SELECT department, MAX(salary) FROM employee GROUP BY department ) ORDER BY department; id | name | department | salary ----+------------------+--------------------------+-------- 30 | Sara Roberts | Accounting | 13845 4 | Benjamin Brown | Business Development | 7386 3 | Carolyn Carter | Engineering | 8366 20 | Janet Hall | Human Resources | 2826 14 | Chris Phillips | Legal | 3706 10 | James Cunningham | Legal | 3706 11 | Richard Bradley | Marketing | 11272 2 | Richard Fox | Product Management | 13449 25 | Evelyn Rodriguez | Research and Development | 10628 17 | Benjamin Carter | Sales | 6197 24 | Jessica Elliott | Services | 14542 7 | Bonnie Robertson | Support | 12674 8 | Jean Bailey | Training | 13230
Legal has two employees with the same high salary. Depending on the use case, this query can get pretty nasty.
If you graduated a while back, you already know a few things about databases and you heard about analytic and window functions, you might do this:
WITH ranked_employees AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY department ORDER BY salary DESC ) AS rn, * FROM employee ) SELECT * FROM ranked_employees WHERE rn = 1 ORDER BY department;
The result is the same without the duplicates:
rn | id | name | department | salary ----+----+------------------+--------------------------+-------- 1 | 30 | Sara Roberts | Accounting | 13845 1 | 4 | Benjamin Brown | Business Development | 7386 1 | 3 | Carolyn Carter | Engineering | 8366 1 | 20 | Janet Hall | Human Resources | 2826 1 | 14 | Chris Phillips | Legal | 3706 1 | 11 | Richard Bradley | Marketing | 11272 ...
Up until now, this is what I would have done.
Now for the real treat, PostgreSQL has a special nonstandard clause to find the first row in a group:
SELECT DISTINCT ON (department) * FROM employee ORDER BY department, salary DESC;

This is wild! Why nobody told me this is possible?