I'm moving up to the next level of my mystery query. It looks like there's a subselect inside of an exists, but on the same table. I think this could probably be simplified with an INNER JOIN higher up.
Using PostgreSQL 9.4.2.
Table definitions (/d+): https://gist.github.com/neezer/879f5d3649ca1903c6f3
Cardinalities:
billing_pricequote: 1,462,625 rows
billing_pricequotestatus: 3,331,657 rows
billing_lineitem: 43,687,855 rows
Here's the original query, without modifications suggested for the subquery inside EXISTS:
SELECT i.quote_id, i.acct_id AS account_id, SUM(i.delta_amount) AS amt
FROM billing_lineitem i
INNER JOIN billing_pricequote pq ON i.quote_id = pq.id
WHERE pq.date_applied AT TIME ZONE 'PST' BETWEEN '2016-02-02T00:00:00'::timestamp
AND '2016-03-03T22:27:41.734102-08:00'::timestamptz
AND EXISTS(
SELECT s1.quote_id
FROM billing_pricequotestatus s1
INNER JOIN (
SELECT DISTINCT ON (quote_id) quote_id, MAX(created_at) AS max_created_at
FROM billing_pricequotestatus
WHERE quote_id=i.quote_id
GROUP BY quote_id, created_at
ORDER BY quote_id, created_at DESC
) AS s2
ON s1.quote_id = s2.quote_id
AND s1.created_at = s2.max_created_at
WHERE s1.name IN ('adjustment','payment','billable')
)
GROUP BY i.quote_id, i.acct_id
;
The part I noticed looking weird is the SELECT on billing_pricequotestatus and then another subselect on the same table inside the INNER JOIN.
I tried changing this with the modification from my other SO post:
SELECT i.quote_id, i.acct_id AS account_id, SUM(i.delta_amount) AS amt
FROM billing_lineitem i
INNER JOIN billing_pricequote pq ON i.quote_id = pq.id
WHERE pq.date_applied AT TIME ZONE 'PST' BETWEEN '2016-02-02T00:00:00'::timestamp
AND '2016-03-03T22:27:41.734102-08:00'::timestamptz
AND EXISTS(
SELECT quote_id, MAX(created_at) AS max_created_at
FROM billing_pricequotestatus
WHERE quote_id=i.quote_id
AND name IN ('adjustment','payment','billable')
GROUP BY quote_id
)
GROUP BY i.quote_id, i.acct_id
;
That cut my execution time in half (~40 seconds to ~20 seconds), but yielded slightly different results (the original query returned 28,895 rows, but my new query returns 28,917 rows). It's not immediately clear to me why my modification didn't produce equivalent output (which it needs to).
EXPLAIN ANALYZE for both queries
Explain analysis for original query depesz.com.
Would really appreciate any help/guidance on this!
I tried updating @ypercubeᵀᴹ's answer with a LATERAL JOIN, and the performance seems about the same (each has an equal number of wins by less than a second):
SELECT i.quote_id, i.acct_id AS account_id, SUM(i.delta_amount) AS amt
FROM billing_lineitem i
INNER JOIN billing_pricequote pq ON i.quote_id = pq.id
LEFT JOIN LATERAL
( SELECT name
FROM billing_pricequotestatus
WHERE quote_id = i.quote_id
ORDER BY created_at DESC
LIMIT 1
) pqs ON true
WHERE pq.date_applied AT TIME ZONE 'PST' BETWEEN '2016-02-02T00:00:00'::timestamp
AND '2016-03-03T22:27:41.734102-08:00'::timestamptz
AND pqs.name IN ('adjustment', 'payment', 'billable')
GROUP BY i.quote_id, i.acct_id
;
Any other suggestions to get this below ≈10 seconds?