6

I have a basic question on how JOIN works on multiple tables. I want to count occurrences of Foreign Key in link1 & link2

CREATE TABLE main (
   id SERIAL PRIMARY KEY,
   name text NOT NULL
);

CREATE TABLE link1 (
   id SERIAL PRIMARY KEY,
   main_id integer NOT NULL,
   CONSTRAINT main_id_fk FOREIGN KEY (main_id) REFERENCES main (id)
);

-- link2 is similar to link1

SQL Fiddle

Why does the query below give a product of counts (rather than sum) when the count is non-zero in both columns.

SELECT main.id, COUNT(link1.main_id) + COUNT(link2.main_id)
FROM main
LEFT JOIN link1 ON main.id=link1.main_id
LEFT JOIN link2 ON main.id=link2.main_id
GROUP BY main.id
user4150760
  • 1,129
  • 3
  • 14
  • 20

2 Answers2

8

What you see is a "proxy cross join". Aggregate first, then join:

SELECT m.id, COALESCE(l1.ct, 0) + COALESCE(l2.ct, 0) AS total_ct
FROM   main m
LEFT   JOIN (
   SELECT main_id, count(*) AS ct
   FROM   link1
   GROUP  BY main_id
   ) l1 ON l1.main_id = m.id
LEFT   JOIN (
   SELECT main_id, count(*) AS ct
   FROM   link2
   GROUP  BY main_id
   ) l2 ON l2.main_id = m.id
ORDER BY m.id;

Old sqlfiddle

Do not multiply rows with multiple unqualified joins and count(DISTINCT ...) later to fix that mistake. It happens to work in this case since counting distinct link1.id / link2.id coincides with the desired result, but it's needlessly expensive and error prone.

Detailed explanation and a couple of syntax variants in these related answers on SO:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
3

I'll attempt to answer it myself. Consider a LEFT JOIN between main & link1. The output would be

main.id   link1.main_id
   1          1
   1          1
   2          2
   3         NULL  
   4         NULL  

Now do a LEFT JOIN of the above table with link2, output would be:

main.id    link1.main_id    link2.main_id
   1             1               NULL  
   1             1               NULL  
   2             2                2
   2             2                2     -- Error : double counting for link1
   3            NULL              3
   4            NULL                   

Now count the occurrences of main_id & sum them (grouped by main.id)

main.id        Count
   1             2
   2             2 + 2  
   3             1
   4             0

So two successive LEFT JOIN are happening sequentially rather than in parallel. The correct approach to get the count would be do conduct 2 queries separately and then add the results

Update Another way according to @a1ex07 is

SELECT main.id, COUNT(DISTINCT link1.id) + COUNT(DISTINCT link2.id)
FROM main
LEFT JOIN link1 ON main.id=link1.main_id
LEFT JOIN link2 ON main.id=link2.main_id
GROUP BY main.id
user4150760
  • 1,129
  • 3
  • 14
  • 20