1

Why might this query be taking nearly 2 hours to run when similar queries complete within a minute?

CREATE TABLE T3 AS
SELECT 
        A.*, 
        COALESCE(C.FIELD1,0) AS FIELD1A,
        COALESCE(B.FIELD2,0) AS FIELD2A,
        COALESCE(B.FIELD3,0) AS FIELD3A,
        COALESCE(C.FIELD4,0) AS FIELD4A,
        COALESCE(C.FIELD5,0) AS FIELD5A,
        B.KEY1,
        COALESCE(C.FIELD6,0) AS FIELD6A
FROM T1 A
LEFT JOIN T2 B
ON   A.KEY2 = B.KEY1 
AND  B.TIMEFIELD1 <= DATE '28FEB2013'
AND  B.TIMEFIELD2 > DATE '28FEB2013'
AND  B.FIELD7= 'X' 
LEFT JOIN T2 C
ON   A.KEY3 = C.KEY1 
AND  C.TIMEFIELD1  <= DATE '28FEB2013'
AND  C.TIMEFIELD2 > DATE '28FEB2013'
AND  C.FIELD7= 'X' 

DISTRIBUTE ON RANDOM

T1 has about a 400,000 rows; T2 has about a million rows. Both are distributed on random. T3 has the same row count as T1 (as expected).

EXPLAIN suggests the query plan uses nested loops.

(Apologies for the generic column names).

jl6
  • 815
  • 7
  • 15

2 Answers2

1

I have noticed that the optimizer doesn't enforce the constraints on left joins efficiently. I have success in this situation by forcing the constraints in sub queries. Consider revising your query like below:

CREATE TABLE T3 AS
SELECT 
        A.*, 
        COALESCE(C.FIELD1,0) AS FIELD1A,
        COALESCE(B.FIELD2,0) AS FIELD2A,
        COALESCE(B.FIELD3,0) AS FIELD3A,
        COALESCE(C.FIELD4,0) AS FIELD4A,
        COALESCE(C.FIELD5,0) AS FIELD5A,
        B.KEY1,
        COALESCE(C.FIELD6,0) AS FIELD6A
FROM T1 A
LEFT JOIN 
(select * from T2 where TIMEFIELD1 <= DATE '28FEB2013'
AND  TIMEFIELD2 > DATE '28FEB2013'
AND  FIELD7= 'X' ) B
ON   A.KEY2 = B.KEY1 
LEFT JOIN 
(select * from T2  where AND  TIMEFIELD1  <= DATE '28FEB2013'
AND  TIMEFIELD2 > DATE '28FEB2013'
AND  FIELD7= 'X') C
ON   A.KEY3 = C.KEY1 


DISTRIBUTE ON RANDOM
Niederee
  • 111
  • 3
0

First thing you should do is add a distribute for the tables on for the join on and organize on the where clause.

Joe
  • 34
  • 2