40

It was brought to my attention that the USING construct (instead of ON) in the FROM clause of SELECT queries might introduce optimization barriers in certain cases.

I mean this key word:

SELECT *
FROM   a
JOIN   b USING (a_id)

Just in more complex cases.

Context: this comment to this question.

I use this a lot and have never noticed anything so far. I would be very interested in a test case demonstrating the effect or any links to further information. My search efforts came up empty.

The perfect answer would be a test case to show USING (a_id) with inferior performance when compared to the alternative join clause ON a.a_id = b.a_id - if that can actually happen.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633

1 Answers1

12

Erwin: I would concur with the idea that USING causing rigid ordering could well create many edge cases where optimal plans would be ruled out. I recently helped someone out who had something like this in his query:

LEFT JOIN ( 
     a 
     JOIN b ON a.id = b.a_id
     JOIN c ON b.c_id = c.id
) ON a.id = something.a_id
LEFT JOIN (
     table1 t1
     JOIN table2 t2 ON t1.some_field = t2.other_field
     JOIN talbe3 t3 ON t2.yafield = t3.something_else
) ON ....
repeat a few more times

In his case the worst of these join blocks was causing a nested loop join through some 200k rows, about 20k times (do the math), and since keys couldn't be pushed to indexes, it was a sequential scan. This meant that the overall query took about 3 hours to run due to cascading plan changes. By distributing the left join, the keys could be pushed down and the query ran in a matter of seconds. Of course this isn't exactly equivalent which is why the planner can't treat them as equivalent and so it was left figuring out that plan as a hash join and then doing a nested loop in, which was painfully slow.

Any time you rigidly force the joins to go through in a certain order you introduce cases where key filter information may not be available yet in the execution of the plan, and so what might be possible to do later in a quick index scan/hash join might be have to be done much slower in a nested loop/sequential scan and so while the above fragment is not immediately equivalent, it shows the same problem.

Chris Travers
  • 13,112
  • 51
  • 95