While investigating an entry in our Postgres log, I noticed a large discrepancy between the duration reported in the log and how quickly the query actually appears to run:
2015-07-28 17:27:26 UTC [dms_segment_3] LOG: duration: 2924.763 ms bind <unnamed>: SELECT prospect.id AS prospect_id, 0, am.id, am.customer_id, 0 FROM xxxxx.audience_member am LEFT OUTER JOIN xxxxx.campaign campaign ON campaign.id = $1 LEFT OUTER JOIN xxxxx.end_user prospect ON campaign.id=prospect.campaign_id AND prospect.email_id=am.customer_id AND prospect.end_user_type != 1 WHERE am.audience_id = $2 ORDER BY am.id limit $3
2015-07-28 17:27:26 UTC [dms_segment_3] DETAIL: parameters: $1 = '4000013578869', $2 = '4000013744916', $3 = '500'
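For reference, duration lines like this come from Postgres's duration logging; which setting is producing them can be checked with the usual SHOW commands:

SHOW log_min_duration_statement;
SHOW log_duration;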
Running an EXPLAIN ANALYZE on that query showed it completing far faster:
explain analyze
SELECT prospect.id AS prospect_id
, 0, am.id, am.customer_id, 0
FROM xxxxx.audience_member am
LEFT OUTER JOIN xxxxx.campaign campaign
ON campaign.id = 4000013578869
LEFT OUTER JOIN xxxxx.end_user prospect
ON campaign.id = prospect.campaign_id
AND prospect.email_id = am.customer_id
AND prospect.end_user_type != 1
WHERE am.audience_id = 4000013744916
ORDER BY am.id
limit 500;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=326.61..326.67 rows=26 width=24) (actual time=0.500..0.517 rows=4 loops=1)
   ->  Sort  (cost=326.61..326.67 rows=26 width=24) (actual time=0.491..0.498 rows=4 loops=1)
         Sort Key: am.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=0.00..326.00 rows=26 width=24) (actual time=0.260..0.402 rows=4 loops=1)
               Join Filter: (campaign.id = prospect.campaign_id)
               ->  Nested Loop Left Join  (cost=0.00..107.25 rows=26 width=24) (actual time=0.237..0.346 rows=4 loops=1)
                     ->  Index Scan using idx_audience_member_audience_id on audience_member am  (cost=0.00..99.62 rows=26 width=16) (actual time=0.062..0.071 rows=4 loops=1)
                           Index Cond: (audience_id = 4000013744916::bigint)
                     ->  Materialize  (cost=0.00..7.30 rows=1 width=8) (actual time=0.042..0.060 rows=1 loops=4)
                           ->  Seq Scan on campaign  (cost=0.00..7.30 rows=1 width=8) (actual time=0.154..0.219 rows=1 loops=1)
                                 Filter: (id = 4000013578869::bigint)
                                 Rows Removed by Filter: 23
               ->  Index Scan using idx_enduser_emailaddress on end_user prospect  (cost=0.00..8.40 rows=1 width=24) (actual time=0.006..0.006 rows=0 loops=4)
                     Index Cond: (email_id = am.customer_id)
                     Filter: ((end_user_type <> 1) AND (campaign_id = 4000013578869::bigint))
 Total runtime: 0.701 ms
(17 rows)
Is there something I'm missing? What could account for the 2+ second gap between the log and what appears to be the "real" query duration?
We're using Postgres 9.2. In production we use a mixture of Hibernate queries and native SQL within a Tomcat 6 app.
UPDATE:
jpmc26's eagle eye spotted that the parameters appear as quoted strings in the log, whereas I had passed them as unquoted literals in the EXPLAIN, so I ran it again:
explain analyze
SELECT prospect.id AS prospect_id
, 0, am.id, am.customer_id, 0
FROM xxxxx.audience_member am
LEFT OUTER JOIN xxxxx.campaign campaign
ON campaign.id = '4000013578869'
LEFT OUTER JOIN xxxxx.end_user prospect
ON campaign.id = prospect.campaign_id
AND prospect.email_id = am.customer_id
AND prospect.end_user_type != 1
WHERE am.audience_id = '4000013744916'
ORDER BY am.id
limit '500';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=326.62..326.68 rows=26 width=24) (actual time=0.168..0.186 rows=4 loops=1)
   ->  Sort  (cost=326.62..326.68 rows=26 width=24) (actual time=0.164..0.171 rows=4 loops=1)
         Sort Key: am.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=0.00..326.01 rows=26 width=24) (actual time=0.065..0.145 rows=4 loops=1)
               Join Filter: (campaign.id = prospect.campaign_id)
               ->  Nested Loop Left Join  (cost=0.00..107.25 rows=26 width=24) (actual time=0.051..0.102 rows=4 loops=1)
                     ->  Index Scan using idx_audience_member_audience_id on audience_member am  (cost=0.00..99.62 rows=26 width=16) (actual time=0.017..0.025 rows=4 loops=1)
                           Index Cond: (audience_id = 4000013744916::bigint)
                     ->  Materialize  (cost=0.00..7.30 rows=1 width=8) (actual time=0.009..0.013 rows=1 loops=4)
                           ->  Seq Scan on campaign  (cost=0.00..7.30 rows=1 width=8) (actual time=0.023..0.031 rows=1 loops=1)
                                 Filter: (id = 4000013578869::bigint)
                                 Rows Removed by Filter: 23
               ->  Index Scan using idx_enduser_emailaddress on end_user prospect  (cost=0.00..8.40 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=4)
                     Index Cond: (email_id = am.customer_id)
                     Filter: ((end_user_type <> 1) AND (campaign_id = 4000013578869::bigint))
 Total runtime: 0.259 ms
(17 rows)
This time the benefit of everything already being cached from the earlier run is evident, but otherwise nothing has changed.
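One difference that still remains between the log entry and my hand-run EXPLAINs is that the application sends the values as genuine bind parameters ($1, $2, $3) rather than inlining them into the SQL. A rough sketch of how that form could be replayed from psql — the bigint parameter types and the statement name are my own assumptions, since the log doesn't show what the driver declared:

-- Sketch only: replay the statement with server-side bind parameters,
-- the way the log shows it arriving. Parameter types are assumed to be bigint.
PREPARE replay (bigint, bigint, bigint) AS
SELECT prospect.id AS prospect_id, 0, am.id, am.customer_id, 0
FROM xxxxx.audience_member am
LEFT OUTER JOIN xxxxx.campaign campaign
ON campaign.id = $1
LEFT OUTER JOIN xxxxx.end_user prospect
ON campaign.id = prospect.campaign_id
AND prospect.email_id = am.customer_id
AND prospect.end_user_type != 1
WHERE am.audience_id = $2
ORDER BY am.id
LIMIT $3;

EXPLAIN ANALYZE EXECUTE replay (4000013578869, 4000013744916, 500);

I haven't captured that output here; I mention it only as a remaining difference between what the log records and what I've been running by hand.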