0

im using following query with PostgreSQL 9.1.13:

SELECT
    day::DATE AS date,
    COUNT(s.id) AS sale_count,
    COUNT(c.id) AS claim_count

FROM
    GENERATE_SERIES('2017-08-10', '2017-08-13', INTERVAL '1 day') day
        LEFT JOIN sale_order s ON s.date_confirm = day
        LEFT JOIN crm_claim c ON c.create_date = day

GROUP BY
    day.day

ORDER BY
    day.day

Expected Output:

date|sale_count|claim_count
---------------------------
2017-08-10|31|17
2017-08-11|32|23
2017-08-12|33|3
2017-08-13|21|0

Actual Output:

date|sale_count|claim_count
---------------------------
2017-08-10|527|527
2017-08-11|384|384
2017-08-12|99|99
2017-08-13|21|0

What am I doing wrong?

1 Answers1

1

You got multiple rows with the same date in both tables (e.g. 2017-08-10 31 & 17) and you join every row in the 1st table with that date to every row in the 2nd table with the same date, which results in 31 * 17 = 527 rows.

A quick solution is adding DISTINCT:

SELECT
    day::DATE AS date,
    COUNT(s.id) AS sale_count,
    COUNT(c.id) AS claim_count

but this is very inefficient, because it creates those 527 rows first and then applies DISTINCT.

Better count both tables seperatly before the join using Derived Tables:

SELECT
    day::DATE AS date,
    COALESCE(sale_count, 0),
    COALESCE(claim_count, 0)

FROM
    GENERATE_SERIES('2017-08-10', '2017-08-13', INTERVAL '1 day') day
        LEFT JOIN
         ( SELECT date_confirm, COUNT(*) AS sale_count 
           FROM sale_order
           WHERE date_confirm BETWEEN '2017-08-10' AND '2017-08-13'
           GROUP BY date_confirm
         ) AS s 
        ON s.date_confirm = day
        LEFT JOIN
         ( SELECT create_date, COUNT(*) AS claim_count
           FROM crm_claim
           WHERE create_date BETWEEN '2017-08-10' AND '2017-08-13'
           GROUP BY create_date
         ) AS c
        ON c.create_date = day

ORDER BY
    day.day

I added the filter on the date range for each table to remove unneeded data before joining.

dnoeth
  • 4,206
  • 13
  • 14