1

I found one of our APIs using this query in our materialised view which generates aggregated report on asset inventory (PostgreSQL 13.3) and is very slow on a table with 1.2 million rows

SELECT COALESCE (SUM(CASE
    WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
        AND t1.is_incoming = true
        THEN 1
        ELSE 0 
    END), 0) as point1,
 COALESCE(SUM(CASE 
    WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
        AND t1.is_incoming = false
        AND t1.train_id NOT IN (
                SELECT distinct t1.train_id
                FROM mvw_engine_details x1 
                WHERE x1.inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
                AND (x1.mech_id = 222 OR x1.workshop_representation_id = 222) AND x1.is_incoming = true
            )
        THEN 1
        ELSE 0 
    END), 0) as point2,
 COALESCE(SUM(CASE
    WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
        AND t1.is_incoming = true
        AND t1.express_train = 'true'
        THEN 1
        ELSE 0 
    END), 0) as point1express,
  COALESCE(SUM(CASE 
    WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
        AND t1.is_incoming = false
        AND t1.train_id NOT IN (
                SELECT distinct t1.train_id
                FROM mvw_engine_details x1 
                WHERE x1.inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
                AND (x1.mech_id = 222 OR x1.workshop_representation_id = 222) AND x1.is_incoming = true
            )       
        AND t1.express_train = 'true'
        THEN 1
        ELSE 0 
    END), 0) as point2express
    FROM mvw_engine_details t1
    WHERE t1.inspection_date BETWEEN '2022-03-01' AND '2022-06-05'

enter image description here

After creating some index here are the current explain analyse stats:

Planning Time: 0.323 ms
Execution Time: 6.380 ms

CREATE INDEX IF NOT EXISTS mvw_transport_train_id_inspection_date 
    ON datamart.mvw_engine_details(inspection_date, train_id);

CREATE INDEX IF NOT EXISTS mvw_transport_inspection_date_idx ON datamart.mvw_engine_details (inspection_date) WITH (deduplicate_items = off);

CREATE INDEX IF NOT EXISTS mvw_transport_train_id_idx ON datamart.mvw_engine_details (train_id) WITH (deduplicate_items = off);

Is there any way to improve the query? or is it badly constructed?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
gigz
  • 111
  • 4

1 Answers1

2

Your query isn't that slow. But after cleaning up, it should be a bit faster, yet:

WITH cte AS ( -- apply common basic filters *once*
   SELECT is_incoming, express_train, train_id
   FROM   mvw_engine_details
   WHERE  inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
   AND    222 IN (mech_id, workshop_representation_id)
   )
SELECT count(*) FILTER (WHERE     is_incoming)                   AS point1
     , count(*) FILTER (WHERE NOT is_incoming
                        AND   no_other_incoming_train)           AS point2
     , count(*) FILTER (WHERE     is_incoming AND express_train) AS point1express
     , count(*) FILTER (WHERE NOT is_incoming AND express_train
                        AND   no_other_incoming_train)           AS point2express
FROM  (
   SELECT is_incoming, express_train
        , NOT EXISTS (
            SELECT FROM cte c2
            WHERE  c2.train_id = c1.train_id
            AND    c2.is_incoming
            ) AS no_other_incoming_train
   FROM   cte c1
   ) sub;

Using a CTE, we don't have to spell out (and apply) your common filter criteria repeatedly.

About the aggregate FILTER clause:

Unlike sum(), count() never returns NULL. We don't need to add COALESCE. See:

You have an "ugly OR" in the base query. See:

Depending on data distribution and cardinalities, rewriting with UNION might help:

SELECT is_incoming, express_train, train_id
FROM   mvw_engine_details
WHERE  mech_id = 222
AND    inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
UNION
SELECT is_incoming, express_train, train_id
FROM   mvw_engine_details
WHERE  workshop_representation_id = 222
AND    inspection_date BETWEEN '2022-03-01' AND '2022-06-05'

Unless your time range is very selective, none of your disclosed indexes is particularly useful for this. The rule of thumb is "equality first, range later". See:

Ideally, you have indexes on (mech_id, inspection_date) and (workshop_representation_id, inspection_date).
If the sets from the two legs of the query are mutually exclusive, UNION ALL is faster, yet.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633