DISTINCT ON
in PostgresEvery once in a while, I'll have a need to do a one-to-many join, but keep only a certain row in the "many" table. For instance, say we have a system to track inventory in retail stores:
CREATE TABLE store (
id SERIAL PRIMARY KEY,
name TEXT
);
CREATE TABLE inventory (
store_id INTEGER REFERENCES store,
quantity INTEGER,
item_name TEXT,
PRIMARY KEY (store_id, item_name)
);
INSERT INTO store(name) VALUES
('School Supplies R Us'),
('Grocery Mart');
INSERT INTO inventory VALUES
(1, 1, 'Backpack'),
(1, 12, 'Pencil'),
(1, 4, 'Pen'),
(2, 12, 'Egg'),
(2, 1, 'Flour (lb.)');
We can get the inventory for all stores easily enough.
SELECT name, quantity, item_name
FROM inventory
JOIN store ON inventory.store_id = store.id;
name | quantity | item_name
----------------------+----------+-------------
School Supplies R Us | 1 | Backpack
School Supplies R Us | 12 | Pencil
School Supplies R Us | 4 | Pen
Grocery Mart | 12 | Egg
Grocery Mart | 1 | Flour (lb.)
But what if we only want to get the item with highest quantity from each store? Fortunately, Postgres has a syntax that makes this easy.
SELECT DISTINCT ON(store_id) name, quantity, item_name
FROM inventory
JOIN store ON inventory.store_id = store.id
ORDER BY store_id, quantity DESC;
name | quantity | item_name
----------------------+----------+-----------
School Supplies R Us | 12 | Pencil
Grocery Mart | 12 | Egg
What does DISTINCT ON
do? Well, it selects the first row out of the set of rows whose values match for the given
columns. The first row is arbitrary unless we pass along an ORDER BY
statement. Note that we have to include the
columns from the ON()
clause in our ORDER BY
. If we don't, we get a helpful error message:
SELECT DISTINCT ON (store_id) name, quantity, item_name
FROM inventory
JOIN store ON inventory.store_id = store.id
ORDER BY quantity DESC;
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON (store_id) name, quantity, item_name
^
If you run into a situation wherein you need to choose a specific row in a group based on some rules, try using
DISTINCT ON
. For more detail, check out the Postgres documentation.