I am modelling an application where events for factories can be registered. There are usually a small amount of factories (around 500), but each factory can register a large set of events (millions). It's impossible that two events overlap for the same factory.
My idea was to introduce an exclusion constraint based on a GIST index:
CREATE TABLE event
(
identifier SERIAL NOT NULL PRIMARY KEY
factory_identifier INTEGER NOT NULL REFERENCES factory,
period TSTZRANGE NOT NULL,
EXCLUDE USING gist
(
factory_identifier WITH =,
period WITH &&
)
);
In the documentation I read:
A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.
I am not sure how to interpret this. Given that there are only a small amount of factories but a large number of events, should I define the exclusions constraint as following instead?
EXCLUDE USING gist
(
period WITH &&
factory_identifier WITH =,
)
Now, the period is the first column, for which there are a lot of distinct values (almost all values are unique, actually), in contrast to factory_identifier, for which there only a few distinct values.
Is this better now? In short, I have difficulties to grasp the quoted documentation above.
additional
In the slides on a presentation of exclusion constraints on page 16 and a blog post, both coming from the original author of exclusion constraints, I found the following example:
EXCLUDE USING GIST(room WITH =, during WITH &&)
Given the (reasonable) assumption that there much more distinct reservation periods than distinct rooms, I wonder if this should have been (during WITH &&, room WITH =) instead, given the quote regarding GIST column ordering above.
This makes me believe that I am not really understanding the quoted documented above.