SQL Grouping sets, Rollups & Cube

  ·   5 min read

Let’s motivate grouping sets with an example. Both the dataset and prompt are sourced from pgexercises - an excellent resource for learning and practicing Postgres-flavoured SQL. I’ll also be using DuckDB along the way (it too is Postgres-flavoured).

We’ve got a country club which has a couple of facilities such as tennis courts and pool tables. We’re using a database to keep track of members and bookings. Members (and guests, at a slightly higher rate) can book facilities. The duration for a given booking is kept track of using a starttime and the number of half-hour slots.

We want to know:

  • For each facility:
    • the number of slots booked for the entire month
    • the number of slots booked for the entire year (2012)
  • The total number of slots booked across the entire country club for that year

In other words, the result should have the following shape:

FacilityMonthSlots
Tennis Court 2August483
September588
1071
Badminton CourtAugust459
September579
1029
Total2100

As you can observe, we are aggregating across multiple hierarchies or ‘zoom levels’.

The most straightforward approach then is to carry out the aggregations separately then union them all together:

select
    facility,
    month,
    sum(slots) as slots
from bookings_2012
group by facility, month

union all

select
    facility,
    null,
    sum(slots) as slots
from bookings_2012
group by facility

union all

select
    null,
    null,
    sum(slots) as slots
from bookings_2012
group by facility


order by facility, month

Note that we use union all instead of union since we don’t expect (or mind) any duplicate rows.

This query could be made less verbose by using a CTE - again, credits to pgexercises:

with t as (
    select
        facility,
        month,
        slots
    from bookings_2012
)
select facilty,  month, sum(slots) from t group by facility, month
  union all
select facility, null,  sum(slots) from t group by facility
  union all
select null,     null,  sum(slots) from t
order by facility, month;

Rollup #

Better yet, the above query could be simplified by using rollup:

select
    facility,
    month,
    sum(slots)
from bookings_2012
group by rollup(facilty, month)
order by facility, month

With all three versions, we get the same result:

┌─────────────────┬───────────┬────────┐
│    facility     │   month   │ slots  │
│     varchar     │  varchar  │ int128 │
├─────────────────┼───────────┼────────┤
│ Tennis Court 2  │ August    │    483 │
│ Tennis Court 2  │ September │    588 │
│ Tennis Court 2  │           │   1071 │
│ Badminton Court │ August    │    459 │
│ Badminton Court │ September │    570 │
│ Badminton Court │           │   1029 │
│                 │           │   2100 │
└─────────────────┴───────────┴────────┘

Rollup is used to aggregate across levels in the order they are passed. Therefore, with facility, month we get the levels in the following order:

  • (facility, month): sum the slots for each facility, month pair
  • (facility): sum the slots for each facility
  • (): sum all slots

If we switch the order i.e. to (month, facility), we get a result with the following shape:

FacilityMonthSlots
AugustTennis Court 2483
Badminton Court459
942
SeptemberTennis Court 2588
Badminton Court570
1158
Total2100

Grouping set #

Rollups are in fact syntactic sugar for the more generic grouping sets clause. Therefore, group by rollup(facility, month) expands to:

group by grouping sets(
  (facility, month),
  (facility),
  (),
)

Grouping sets were added to SQL since with the plain group by, you can only specify a single grouping to aggregate across.

However, as the previous example demonstrates you may need to aggregate across multiple kinds of groups in parallel - hence grouping sets in SQL.

Grouping sets give as more flexibility. For example, if we don’t need the year’s sum we can discard it:

select
    facility,
    month,
    sum(slots) as slots
from bookings_2012
group by grouping sets ((facility, month), (facility))

The result is:

┌─────────────────┬───────────┬────────┐
│    facility     │   month   │ slots  │
│     varchar     │  varchar  │ int128 │
├─────────────────┼───────────┼────────┤
│ Tennis Court 2  │ August    │    483 │
│ Tennis Court 2  │ September │    588 │
│ Tennis Court 2  │           │   1071 │
│ Badminton Court │ August    │    459 │
│ Badminton Court │ September │    570 │
│ Badminton Court │           │   1029 │
└─────────────────┴───────────┴────────┘

Cube #

There’s also cube which we can use to aggregate across all possible permutations of the given columns and expressions (i.e. the powerset). For example, from the Postgres docs, group by cube(a,b,c) is equivalent to:

grouping sets (
    ( a, b, c ),
    ( a, b    ),
    ( a,    c ),
    ( a       ),
    (    b, c ),
    (    b    ),
    (       c ),
    (         )
)

Grouping aggregate function #

As an aside, consider the following case: if a column within that grouping set has a null, it cannot be distinguished from the columns outside the group (since those are also filled with nulls). Therefore, both Postgres and DuckDB provide the grouping function. In duckdb, it also has the alias grouping_id. This function takes in a group and returns 0 if that row is in the given group or 1 otherwise.

For example, given the following query:

select
    facility,
    month,
    grouping_id(month) as in_group_month,
    sum(slots)
from bookings_2012
group by grouping sets (facility, month)
order by facility, month

We get:

┌─────────────────┬───────────┬────────────────┬────────────┐
│    facility     │   month   │ in_group_month │ sum(slots) │
│     varchar     │  varchar  │     int64      │   int128   │
├─────────────────┼───────────┼────────────────┼────────────┤
│ Badminton Court │           │              1 │       1209 │
│ Tennis Court 2  │           │              1 │       1278 │
│                 │ August    │              0 │        942 │
│                 │ July      │              0 │        387 │
│                 │ September │              0 │       1158 │
└─────────────────┴───────────┴────────────────┴────────────┘

In summary #

Do check out the Postgres documentation for more details on grouping sets. It’s quite a neat addition to SQL that saves us from both verbose queries and having to merge results client-side for certain kinds of reporting.

References #

  1. PostgreSQL Exercises - Alisdair Owens: pgexercises
  2. PostgreSQL Documentation - GROUPING SETS, CUBE and ROLLUP: docs
  3. DuckDB Documentation - GROUPING SETS: link
  4. PostgreSQL GROUPING SETS: Postgres Tutorial
  5. The Art of Postgres - Grouping Sets - Dimitri Fontaine