• English
  • Русский
  • 日本語
  • Photo

    Novikov "Envek" Andrey

    Software engineer. Japan, Osaka.

  • GitHub
  • LinkedIn
  • SQL window functions

    Attention: Queries in this article may work only in PostgreSQL, check docs for your RDBMS!

    Some time ago I have discovered window functions for myself. I will talk about them on example, when they helped me: database record renumbering.

    That time I have been adding a multitenancy in our application — users in it should work in scope of different stations. All the entities in one station should not affect any entities on another. Adding foreign keys to tables easy. But what about existing data?

    So, in our app there is a “Waybill” model. This is information about document that reflects a vehicle route and job. Waybills are numbered and at the time ID was used as a number. But afterwards waybill numbering should be unique in scope of station and document series (that was added at some point of time).

    First, linking waybills directly to stations. Looking at the data diagram I see: waybills belongs to vehicles, vehicles belongs to stations. So, in migration we will add foreign key column station_id and write SQL-query that will copy station identifiers from vehicles to waybills table:

    UPDATE waybills
    SET station_id = vehicles.station_id
    FROM vehicles
    WHERE vehicles.id = waybills.vehicle_id
    

    Then we will add column called number and now we need to fill it with ordered numbers for each station and series. Window functions to the rescue!

    UPDATE waybills
    SET number = w.number
    FROM (
      SELECT
        id,
        row_number() OVER (
          PARTITION BY station_id, series ORDER BY created_at
        ) AS number
      FROM waybills
    ) w
    WHERE waybills.id = w.id;
    

    Well, what is going on there?

    In outer query we are using PostgreSQL-specific UPDATE … FROM syntax. We’re filling number column with value returned from subquery for row with the same id.

    In subquery we’re taking id of every row in our waybills table and calculating sequence number in each partition with row_number() window function.

    Take a look to the window functions call syntax: function OVER (partition). Partition is defined with PARTITION BY keyword with expressions list (we just use column names) or there can be no partition at all. Rows will be partitioned in groups by the all unique values of PARTITION BY expressions. Unlike GROUP BY rows in partition would not be merged in one row, but will be processed in sequence. ORDER BY keyword allows to reorder rows inside every partition.

    So, we have partitioned all rows in groups for every station and every series in station and sorted rows from oldest to newest inside every group (function row_number() will return 1 for oldest row in group).

    Let’s try subquery on test data:

    =# SELECT id, station_id, series, created_at, row_number() OVER (PARTITION BY station_id, series ORDER BY created_at) AS number FROM waybills;
     id |              station_id              | series |         created_at         | number
    ----+--------------------------------------+--------+----------------------------+--------
      5 | 258d39b6-7b09-49ad-94f6-33b0dcb2fa32 |        | 2015-04-07 10:53:44.736078 |      1
      1 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 123    | 2015-03-16 09:55:16.689384 |      1
      6 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 123    | 2015-04-28 13:47:16.397076 |      2
      7 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 123    | 2015-04-28 13:47:40.23337  |      3
      2 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 23     | 2015-03-18 12:06:25.688768 |      1
      4 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 4606   | 2015-03-24 08:10:38.143429 |      1
      3 | 69c5b783-034c-47ea-84fd-c4abd0a323d6 | 656565 | 2015-03-18 15:30:10.709491 |      1
    (7 rows)
    

    Awesome!

    Complete migration will look like this:

    class AddMultistationSupportForWaybills < ActiveRecord::Migration
      def change
        change_table :waybills do |t|
          t.integer :number
          t.references :station, type: :uuid
        end
        add_foreign_key :waybills, :stations
        
        reversible do |to|
          to.up do
            execute <<-PostgreSQL.strip_heredoc.tr("\n", ' ')
              UPDATE waybills
              SET station_id = vehicles.station_id
              FROM vehicles
              WHERE vehicles.id = waybills.vehicle_id
            PostgreSQL
            execute <<-PostgreSQL.strip_heredoc.tr("\n", ' ')
              UPDATE waybills
              SET number = w.number
              FROM (
                SELECT row_number() OVER (PARTITION BY station_id, series ORDER BY created_at) AS number, id FROM waybills
              ) w
              WHERE waybills.id = w.id;
            PostgreSQL
          end
        end
      end
    end
    

    Conclusion

    SQL is very powerful and provides tools for almost every case. Remember all abilities is practically impossible. Hope that this article will help people to learn a bit more about excellent tools.

    Learn more