Many-to-many: select an entity by matching relationship

Question

I have a many to many relationship implemented:

CREATE TABLE public.message (
    id BIGSERIAL PRIMARY KEY,
    name varchar(40) UNIQUE NOT NULL
);
CREATE TABLE public.package(
    id BIGSERIAL PRIMARY KEY,
    name varchar(40) UNIQUE NOT NULL
);
CREATE TABLE public.package_to_message (
    message_id BIGINT NOT NULL,
    package_id BIGINT NOT NULL,
    CONSTRAINT package_to_message_pk PRIMARY KEY (message_id, package_id)
);

I need to select a package that has an association with a precisely defined set of messages [message_name_1, message_name_2, message_name_3]. With all of the above and none more. Is it possible to do this using a more or less optimized query? Nothing comes to mind.

Erwin Brandstetter · Accepted Answer · 2024-07-22T22:47:57.423

The key word here is "relational division".

For the given specifications, it won't get faster than this:

SELECT package_id
FROM   package_to_message p1
JOIN   package_to_message p2 USING (package_id)
JOIN   package_to_message p3 USING (package_id)
WHERE  p1.message_id = (SELECT m.id FROM message m WHERE m.name = 'message_name_1')
AND    p2.message_id = (SELECT m.id FROM message m WHERE m.name = 'message_name_2')
AND    p3.message_id = (SELECT m.id FROM message m WHERE m.name = 'message_name_3')
AND    NOT EXISTS (  -- no other!
   SELECT FROM package_to_message p0
   WHERE  p0.package_id = p1.package_id
   AND    p0.message_id NOT IN (p1.message_id, p2.message_id, p3.message_id)
   );

But maybe you need a more generic/dynamic query ...

PREPARE q1(text[]) AS
WITH m AS (                              -- translate names to IDs
   SELECT id AS message_id
   FROM   message
   WHERE  name = ANY($1)
   )
SELECT p.package_id
FROM   m
JOIN   package_to_message p USING (message_id)
GROUP  BY 1
HAVING count(*) = cardinality($1)        -- length of array
AND    NOT EXISTS (                      -- no other message
   SELECT FROM package_to_message p0
   WHERE  p0.package_id = p.package_id
   AND    p0.message_id <> ALL(TABLE m)  -- m holds IDs
   );

Call:

EXECUTE q1('{message_name_1, message_name_2, message_name_3}');

My queries assume that you pass distinct message names (no duplicates). Plus, message.name is defined UNIQUE.

Note how ANY and ALL accept an array or a set. See:

How to use ANY instead of IN in a WHERE clause?

About the short syntax TABLE m:

Is there a shortcut for SELECT * FROM?

Indexes & performance

Index 1

The UNIQUE constraint on table message covers the lookup nicely.
If performance is crucial (and you meet preconditions for index-only scans) a covering index would be slightly better:

ALTER TABLE message
  DROP CONSTRAINT message_name_key  -- actual constraint name here!
, ADD  CONSTRAINT message_name_id_key UNIQUE (name) INCLUDE (id)
;

See:

Index 2

The PRIMARY KEY on package_to_message(message_id, package_id) provides the perfect index for the next step.

Index 3

Ideally, you add another index on package_to_message(package_id, message_id) for the final step. See:

Is a composite index also good for queries on the first field?

With these 3 indexes in place, and if your tables are vacuumed enough, both queries can make do with index-only scans exclusively. Then it takes a few ms at most, even for huge tables, or even below 1 ms execution time.

--

Charlieface · Answer 2 · 2024-07-22T23:07:54.977

Another option uses counts to verify we have all matches. Note that your question is specifically an example of multi-way Relational Division Without Remainder, not With.

PREPARE q1(text[]) AS
SELECT
    pm.package_id
FROM   message m
JOIN   package_to_message pm ON pm.message_id = m.id
GROUP BY
    pm.package_id
HAVING count(*) = cardinality($1)
   AND count(*) = count(*) FILTER (WHERE m.name = ANY($1));

This is a variation on a more generalized solution to any multi-way Relational Division, which is more efficient by using an array

SELECT
    pm.package_id
FROM   message m
JOIN   package_to_message pm ON pm.message_id = m.id
LEFT JOIN inputData i ON i.value = m.name
GROUP BY
    pm.package_id
HAVING count(*) = (SELECT count(*) FROM inputData)
   AND count(*) = count(i.value);

For the same thing for With Remainder, change the HAVING clause to HAVING count(*) >= (SELECT count(*) FROM inputData); and change the LEFT JOIN to a JOIN.

See also this article among others for further options for Relational Division.

Many-to-many: select an entity by matching relationship

2 Answers2

Indexes & performance

Index 1

Index 2

Index 3