2

Following on from Combining array_agg and unnest, given this data:

key |  a | b | c
------------------
 1  |  0 | 1 | {1,2}
 1  |  1 | 2 | {3}
 1  | -1 | 3 | {2}
 1  |  2 | 4 | {}

Executing this query:

SELECT 
  d.key,
  min(d.a) AS a,
  sum(d.b) AS b,
  array_agg(DISTINCT x.c) AS c 
FROM data AS d
  CROSS JOIN LATERAL unnest(d.c) AS x(c)
GROUP BY d.key

Gives the unexpected result:

key |  a | b | c
------------------
 1  | -1 | 7 | {1,2,3}

What's going on here, and how to get the correct sum?


Answer performance

On my actual data (16642 rows, 1942 keys, 6 aggregates), I get these cost estimates for each suggested solution.

OrangeDog
  • 338
  • 1
  • 3
  • 13

2 Answers2

2

The unnesting generates 2 rows for (key = 1, a = 0, b = 1), and the cross join removes the row with the empty array.

So your group by operates on the following set:

key | a  | b | c
----+----+---+--
  1 |  0 | 1 | 1
  1 |  0 | 1 | 2
  1 |  1 | 2 | 3
  1 | -1 | 3 | 2

One solution is to combine two group by queries, each one grouping n a different level:

select *
from (
  select d1."key", min(d1.a), sum(d1.b)
  from data d1
  group by d1."key"
) m
  join (
    select "key", array_agg(DISTINCT x.c) AS c 
    from data d2
      left join lateral unnest(d2.c) as x(c) on true
    where x.c is not null
    group by "key"
  ) a on a."key" = m."key";

Another approach is to only include the "first row" for each "unnested" group in the aggregates:

select d."key", 
       min(d.a) filter (where idx = 1 or idx is null), 
       sum(d.b) filter (where idx = 1 or idx is null),
       array_agg(distinct x.c)
from data AS d
  left join lateral unnest(d.c) with ordinality  AS x(c,idx) on true
group by d."key";

with ordinality returns the position of the unnested element in the original array. For the row with the empty array, that will be null.

0

One problem with your query is that the CROSS JOIN eliminates rows where unnest() produces no rows (happens for the empty array {}).

You could fix that with LEFT JOIN .. ON true, but the other problem is that rows are multiplied where unnest() returns multiple rows (happens for {1,2}).

Thats how you get 7 for the sum: 1 + 1 + 2 + 3.

Joining two separate subqueries might be simplest / fastest:

SELECT *
FROM (
   SELECT key, min(a) AS min_a, sum(b) AS sum_b
   FROM   data
   GROUP  BY 1
   ) x
JOIN  (
   SELECT key, array_agg(DISTINCT e) AS dist_c 
   FROM   data, unnest(c) e
   GROUP  BY 1
   ) y USING (key);

The plain join works because both subqueries have grouped by the same column - and as long as key is NOT NULL.

db<>fiddle here

Related:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633