0

I have the following setup:

  1. First, I create a temp table q10c_debug_sql to avoid clutter
create table q10c_debug_sql as
SELECT 
      movie_id, 
      company_id, 
      company_type_id 
    FROM 
      "postgres"."imdb_int"."movie_companies" 
    WHERE 
      (company_id) IN (
        SELECT 
          company_id 
        FROM 
          "postgres"."imdb"."q10c_company_name"
      ) 
      AND (company_type_id) NOT IN (
        SELECT 
          company_type_id 
        FROM 
          "postgres"."imdb_int"."company_type"
      )

The resulting table is an empty table

postgres=# select * from q10c_debug_sql;
 movie_id | company_id | company_type_id 
----------+------------+-----------------
(0 rows)
  1. Now, I issue the following two queries
postgres=# select count(*) from (select * from imdb_int.movie_companies except select * from q10c_debug_sql) as foo;
  count  
---------
 2549109
(1 row)

postgres=# select count() from (select from imdb_int.movie_companies as a left join q10c_debug_sql as b on a.movie_id = b.movie_id and a.company_id = b.company_id and a.company_type_id = b.company_type_id) as foo; count


2609129 (1 row)

As one can see they return different count. On paper, these two queries are equivalent and should return 2609129, the size of movie_companies table:

postgres=# select count(*) from imdb_int.movie_companies;
  count  
---------
 2609129
(1 row)

I don't know why this happens? I want to use EXCEPT for clarity but the query gives unexpected result. Any pointers are appreciated.

My psql versions

psql (15.3 (Ubuntu 15.3-1.pgdg20.04+1), server 13.11 (Ubuntu 13.11-1.pgdg20.04+1))
zack
  • 113
  • 5

1 Answers1

1

It turns out EXCEPT returns only distinct values, i.e., EXCEPT returns any distinct values from the left query that are not also found on the right query. Thus, semantically, EXCEPT is not the same as left join: the former is set semantics but the latter is bag semantics.

Thanks to this page for pointer.

postgres=# select count(distinct(movie_id, company_id, company_type_id)) from imdb_int.movie_companies;
  count  
---------
 2549109
(1 row)
zack
  • 113
  • 5