8

I know I have to write SUM twice, if I wish to use it in a HAVING clause (or use a derived table otherwise):

SELECT  id,
  sum(hours) AS totalhours
  FROM mytable
  GROUP BY id
  HAVING sum(hours) > 50;

My question now is, whether or not this is suboptimal. As a programmer, this query looks like the DB will calculate the sum twice. Is that so, or should I rely on optimizations the DB engine will do for me?

Update: an explain of a comparable query:

postgres=> explain select sum(counttodo) from orderline group by orderlineid having sum(counttodo) > 100;
                             QUERY PLAN                             
--------------------------------------------------------------------
 HashAggregate  (cost=1.31..1.54 rows=18 width=8)
   Filter: (sum(counttodo) > 100)
   ->  Seq Scan on orderline  (cost=0.00..1.18 rows=18 width=8)
(3 rows)
David Aldridge
  • 2,198
  • 11
  • 14

3 Answers3

3

The sum is only computed once.

I verified this using

create table mytable (id int, hours int);
insert into mytable values (1, 60);
select sum(hours) from mytable group by id having sum(hours) > 50;

and then used a debugger to check how many times int4_sum (the transition function behind the sum aggregate) was called: once.

Peter Eisentraut
  • 10,723
  • 1
  • 35
  • 35
0

Compare your query

explain
select sum(counttodo)
from orderline
group by orderlineid
having sum(counttodo) > 100

To this equivalent one an check in what they differ

explain
select *
from (
    select sum(counttodo) counttodo
    from orderline
    group by orderlineid
) s
where counttodo > 100
Clodoaldo
  • 1,145
  • 2
  • 8
  • 22
0

You don't have to write SUM twice if you don't need to retrieve it; if you're only interested in the ids having a SUM(hours) > 50 then the following is perfectly valid:

SELECT id,
FROM mytable
GROUP BY id
HAVING sum(hours) > 50;
Colin 't Hart
  • 9,455
  • 15
  • 36
  • 44