0
SELECT message."id"  AS "id"
     , message."transactionId"  AS "transactionId"
     , "ids"
     , message"createdAt" AS "createdAt"
FROM message as message
     JOIN (
         SELECT
              "transactionId",
              "createdAt",
              array_agg(thisMessage.id) as "ids",
               min(thisMessage.id::text)::uuid as "id"
          FROM message as thisMessage
          WHERE thisMessage."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
          GROUP BY "transactionId", "createdAt"
          ORDER BY createdAt DESC
          OFFSET 0 LIMIT 9
      ) as groupM
            ON message."id" = groupM."id"
WHERE
     message."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
ORDER BY message."createdAt" DESC
OFFSET 0 LIMIT 3;

Here is the table

"id" "groupId" "transactionId" "createdAt"
1 a5086f93-7e62-47ea-9419-6e3800b2a15e null 2024-05-03 14:45:46.881317
2 a5086f93-7e62-47ea-9419-6e3800b2a15e 20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772 2024-05-03 15:45:46.881317
3 a5086f93-7e62-47ea-9419-6e3800b2a15e 20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772 2024-05-03 15:45:46.881317
4 a5086f93-7e62-47ea-9419-6e3800b2a15e null 2024-05-03 17:45:46.881317
5 a5086f93-7e62-47ea-9419-6e3800b2a15e null 2024-05-03 18:45:46.881317

Desired result:

"id" "transactionId" "ids" "createdAt"
5 null null 2024-05-03 18:45:46.881317
4 null null 2024-05-03 17:45:46.881317
2 a5086f93-7e62-47ea-9419-6e3800b2a15e {2, 3} 2024-05-03 15:45:46.881317

The main idea is to group rows with transactionId and count groups as one row, but don't aggregate rows with transactionId IS NULL. So from 2 rows with transactionId and 2 rows with null, LIMIT 3 should return one grouped by transactionId, and 2 with null.

But on large tables, it works very slow. How can I improve it?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633

1 Answers1

0

Null values are treated as equal by GROUP BY, so those rows would be aggregated, too. See:

Break out null cases to keep them separate, and only aggregate not-null cases. UNION ALL the full 3 latest rows from each to be sure - meaning the sum of LIMIT + OFFSET. (Doesn't add much cost for a small number.) Apply ORDER BY, OFFSET & LIMIT in the outer SELECT again:

SELECT * FROM
(
SELECT "transactionId", "createdAt"
     , null::uuid[] AS ids
     , id
FROM   message
WHERE  "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND    "transactionId" IS NULL
ORDER  BY "createdAt" DESC, id  -- added id tiebreaker
LIMIT  3  -- no OFFSET
)
UNION ALL 
(
SELECT "transactionId", "createdAt"
     , array_agg(id) AS ids
     , any_value(id) AS id  -- pg 16+, see below!
FROM   message
WHERE  "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND    "transactionId" IS NOT NULL
GROUP  BY "transactionId", "createdAt"
ORDER  BY "createdAt" DESC, id  -- added id tiebreaker
LIMIT  3  -- no OFFSET
)
ORDER  BY "createdAt" DESC, id  -- added id tiebreaker
OFFSET 0
LIMIT  3;

To make it fast, have at least a multicolumn index on ("groupId", "createdAt") - with fields in this order.
Possibly another partial index for null cases if those are rare.

The odd construct min(id::text)::uuid AS id indicates id is actually type uuid (which has no > / < operators, hence doesn't allow min()), and this is your way to pick one arbitrary value from the set.
Postgres 16 added the convenient aggregate function any_value() for that task precisely. Fall back to your construct in older versions, or use first() from the additional module first_last_agg. See:

"createdAt" may not be unique, making the sort order non-deterministic and results arbitrary (possible alternating between calls). Add tiebreaker(s) to get deterministic results.

Aside:
Use legal, lower-case, unquoted identifiers if at all possible. See:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633