SQL window functions
Attention: Queries in this article may work only in PostgreSQL, check docs for your RDBMS!
Some time ago I have discovered window functions for myself. I will talk about them on example, when they helped me: database record renumbering.
That time I have been adding a multitenancy in our application — users in it should work in scope of different stations. All the entities in one station should not affect any entities on another. Adding foreign keys to tables easy. But what about existing data?
So, in our app there is a “Waybill” model. This is information about document that reflects a vehicle route and job. Waybills are numbered and at the time ID was used as a number. But afterwards waybill numbering should be unique in scope of station and document series (that was added at some point of time).
First, linking waybills directly to stations. Looking at the data diagram I see: waybills belongs to vehicles, vehicles belongs to stations. So, in migration we will add foreign key column station_id
and write SQL-query that will copy station identifiers from vehicles
to waybills
table:
UPDATE waybills
SET station_id = vehicles.station_id
FROM vehicles
WHERE vehicles.id = waybills.vehicle_id
Then we will add column called number
and now we need to fill it with ordered numbers for each station and series. Window functions to the rescue!
UPDATE waybills
SET number = w.number
FROM (
SELECT
id,
row_number() OVER (
PARTITION BY station_id, series ORDER BY created_at
) AS number
FROM waybills
) w
WHERE waybills.id = w.id;
Well, what is going on there?
In outer query we are using PostgreSQL-specific UPDATE … FROM
syntax. We’re filling number
column with value returned from subquery for row with the same id
.
In subquery we’re taking id
of every row in our waybills
table and calculating sequence number in each partition with row_number()
window function.
Take a look to the window functions call syntax: function OVER (partition)
. Partition is defined with PARTITION BY
keyword with expressions list (we just use column names) or there can be no partition at all. Rows will be partitioned in groups by the all unique values of PARTITION BY
expressions. Unlike GROUP BY
rows in partition would not be merged in one row, but will be processed in sequence. ORDER BY
keyword allows to reorder rows inside every partition.
So, we have partitioned all rows in groups for every station and every series in station and sorted rows from oldest to newest inside every group (function row_number()
will return 1
for oldest row in group).
Let’s try subquery on test data:
=# SELECT id, station_id, series, created_at, row_number() OVER (PARTITION BY station_id, series ORDER BY created_at) AS number FROM waybills;
id | station_id | series | created_at | number
----+--------------------------------------+--------+----------------------------+--------
5 | 258d39b6-7b09-49ad-94f6-33b0dcb2fa32 | | 2015-04-07 10:53:44.736078 | 1
1 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 123 | 2015-03-16 09:55:16.689384 | 1
6 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 123 | 2015-04-28 13:47:16.397076 | 2
7 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 123 | 2015-04-28 13:47:40.23337 | 3
2 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 23 | 2015-03-18 12:06:25.688768 | 1
4 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 4606 | 2015-03-24 08:10:38.143429 | 1
3 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 656565 | 2015-03-18 15:30:10.709491 | 1
(7 rows)
Awesome!
Complete migration will look like this:
class AddMultistationSupportForWaybills < ActiveRecord::Migration
def change
change_table :waybills do |t|
t.integer :number
t.references :station, type: :uuid
end
add_foreign_key :waybills, :stations
reversible do |to|
to.up do
execute <<-PostgreSQL.strip_heredoc.tr("\n", ' ')
UPDATE waybills
SET station_id = vehicles.station_id
FROM vehicles
WHERE vehicles.id = waybills.vehicle_id
PostgreSQL
execute <<-PostgreSQL.strip_heredoc.tr("\n", ' ')
UPDATE waybills
SET number = w.number
FROM (
SELECT row_number() OVER (PARTITION BY station_id, series ORDER BY created_at) AS number, id FROM waybills
) w
WHERE waybills.id = w.id;
PostgreSQL
end
end
end
end
Conclusion
SQL is very powerful and provides tools for almost every case. Remember all abilities is practically impossible. Hope that this article will help people to learn a bit more about excellent tools.