8

I have read all over the place that joins are associative and commutative.

So A join (B join C) should be the same as (A join C) join B.

But I have a really hard time understanding how this can be so. Assume that A has a property in common with B and B has a property in common with C, but A and C share no common properties to join on.

It seems that in that case, in the second instance (A join C) join B, when the operation took place to join A and C it would result in an empty set, making the join with B impossible.

Whereas in the first case it would work because B join C would result in the expected joined table that would have a property to join with A.

Justin
  • 183
  • 1
  • 1
  • 4

2 Answers2

6

You've made an incorrect assumption.

If three's a relationship between A and B, and between B and C but no relationship between A and C, joining A and C doesn't give you no rows. It actually gives you m * n rows, where m is the number of rows in A, and n is the number of rows in B.

A JOIN B on (a.id = b.A_id)
  JOIN C on (b.id = c.B_id)

becomes, if we join A and C first:

A JOIN C on (1 = 1)
  JOIN B on (a.id = b.A_id AND b.id = c.B_id)

NOTE: College was over 30 years ago - hopefully you understand conceptually, and can rewrite into the correct syntax for actual relational algebra.

If you have 20 rows in A, 30 in B, and 40 in C, and every row in B matches one and only one row in A, and every row in C matches one and only one row in B. When you Join A to B, you'll get 30 rows back; when you join those 30 rows to C, you'll get 40 rows back.

If you first join A to C, with no relationship to define which pairs of rows are valid, you'll get back every possible pair of rows - 800 rows. When we join B to the 800 rows, the connection between B and A means that a row in B can only match the AC rows that include one specific A row - there'll be 40 of those (one for each C row). Out of those 40 the only actual matches are the ones where the connection between B and C is also valid. And, we know that the C rows that match the current B row cannot match any other B row. So, for one B row we might have 2 AC rows that match, for another 4, for yet another just 1.

That said, we know every C row does match one (and only one) B row, so the total number of matches will come out to 40 again - the same 40 matches as if we had first matches A and B, or B and C.

So - the condition on a join doesn't tell you which rows do match - it really tells you which rows don't match. Take it away, and you've got a CROSS JOIN - the cross product of the rows in the two tables.

RDFozz
  • 11,731
  • 4
  • 25
  • 38
3

Considering

  • A, B and C are three tables
  • join operation is denoted by ⋈
  • Sets (circles in below venn diagram) are formed according to the join columns, that is A ⋈ B represents set of rows having same values for join columns for both A and B

Associativity of join can be visually proved with venn diagram as follows:

enter image description here

Commutativity of join can be visually proved with venn diagram as follows:

enter image description here

Update
(Earlier, diagram had ∩ instead of ⋈. Below explanation was made in the context of ∩.)

Thanks @Lennart to point out in comments that

"A join B" is not the same as "A intersect B"

I guess more explanation need to be provided as INTERSECT operation is not exactly same as INNER JOIN. I want to highlight following points from the link:

  1. INTERSECT operator returns almost same results as INNER JOIN clause many times.
  2. When using INTERSECT operator the number and the order of the columns must be the same in all queries as well data type must be compatible. But this does not have effect on the venn diagram above if consider what I have explicitly said in third bullet point above:

    A ∩ B represents set of rows having same values for join columns for both A and B

    So by ∩ symbol I really meant join, not intersect and with this point in consideration, join will follow the behavior of set intersection. I guess the mistake I made is by using symbol ∩ instead of ⋈. I did not had INTERSECT operation in mind while writing answer. I used it out of intuition as "logical intersection of two sets", but not as "database INTERSECT operation".

  3. INNER JOIN can simulate with INTERSECT when used with DISTINCT. But even if we dont include DISTINCT and keep it as INNER JOIN, the venn diagram still holds as the intersection of two circles / sets will contain duplicate records also.

Also note that this answer tries to explain associativity of left outer join using similar approach with venn diagram.

To wrap up I honestly feel that I should change symbol ∩ with ⋈.

Lennart - Slava Ukraini
  • 23,842
  • 3
  • 34
  • 72
RajS
  • 249
  • 2
  • 7