Find all values not present for other values of other column

Question

Let's suppose I have a table with integer columns a and b (each pair (a,b) is unique) and following data:

 a | b
-------
 1 | 2
 1 | 3
 1 | 4
 1 | 5
 2 | 1
 2 | 3
 2 | 4
 3 | 1
 3 | 3

Now I need to get all the rows whose b value is not contained in the set with some other a value (i.e. they appear just once in the union of rows with the 'a' values in question). From these I should get (1,2), (1,5) and (2,1).

I want to include only two sets (groups with the same a value; I want to use this as part of a function, where both a values are given as parameters) at a time, so I don't want to get (3,1) from the new data.

I have some (working) solution; I post it as an answer. However, there should be something more elegant - the query should be one and exploit the symmetricity, not two unioned queries. I thought of various solutions listed in this answer, but after some hour of playing with code I didn't find how to make anything better and working.

score 4 · Answer 1 · answered Aug 20 '14 at 11:25

If your pairs are unique, as in the example you gave, you want the values of b that appear only once, and this would be a solution:

SELECT a,b
FROM tbl1
WHERE b in (
  SELECT b
  FROM tbl1
  GROUP BY b
  HAVING (count(b)) = 1
  )

You can see it in this fiddle http://sqlfiddle.com/#!15/c82d3/3

score 2 · Accepted Answer · edited Apr 13 '17 at 12:42

If I interpret your question correctly:

"Pick all rows where a has one of two given values, and b only exists in combination with one of them."

You can use any of the standard techniques laid out in the referenced answer.

Select rows where value of second column is not present in first column

Just restrict your base table to the two given a. For instance

SELECT *
FROM   tbl t1
WHERE  a IN (1,2)
AND    NOT EXISTS (
   SELECT 1
   FROM   tbl t2
   WHERE  a IN (1,2)
   AND    t2.b =  t1.b
   AND    t2.a <> t1.a
   );

SQL Fiddle

This also works with duplicates on (a,b). Multiple identical rows would be returned.

score 1 · Answer 3 · answered Aug 20 '14 at 13:37

I think this solution is pleasingly simple:

SELECT coalesce(a1.a, a2.a) AS a, coalesce(a1.b, a2.b) AS b
FROM (
  SELECT a, b FROM tbl1 WHERE a = 1
) a1
FULL OUTER JOIN (
  SELECT a, b FROM tbl1 WHERE a = 2
) a2 
ON (a1.b = a2.b)
WHERE a1.a IS NULL or a2.a IS NULL;

What it does: self-join tbl1, filtering it into two distinct sets, a1 with values a=2, and a2 with values a=2. Do a full outer join as the self join - this is a join type that's like a simultaneous left and right join, i.e. nulls can appear on either side if there's no match. Then find rows where one side or the other is null, and return the values of the non-null side.

In other words: Find sets a=1 and a=2 then do a full anti-join and coalesce the results.

The plan looks promising too:

                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Hash Full Join  (cost=8.29..16.33 rows=1 width=16)
   Hash Cond: (tbl1.b = tbl1_1.b)
   Filter: ((tbl1.a IS NULL) OR (tbl1_1.a IS NULL))
   ->  Index Scan using tbl1_a_idx on tbl1  (cost=0.13..8.15 rows=1 width=8)
         Index Cond: (a = 1)
   ->  Hash  (cost=8.15..8.15 rows=1 width=8)
         ->  Index Scan using tbl1_a_idx on tbl1 tbl1_1  (cost=0.13..8.15 rows=1 width=8)
               Index Cond: (a = 2)
(8 rows)

SQLFiddle.

Unfortunately I don't think the approach can be generalized for 'n' sets using a recursive CTE, as you can't outer-join on the recursive term, at least in PostgreSQL. (Constantly annoys me).

score 0 · Answer 4 · answered Aug 20 '14 at 09:38

My original solution:

  SELECT a, b 
  FROM tbl t1
  WHERE t1.a = 1 AND NOT EXISTS (
    SELECT 1 FROM tbl t2
    WHERE t2.a = 2 AND t2.b = t1.b
  UNION
  SELECT a, b 
  FROM tbl t1
  WHERE t1.a = 2 AND NOT EXISTS (
    SELECT 1 FROM tbl t2
    WHERE t2.a = 1 AND t2.b = t1.b
  );

Michael Green · Answer 5 · 2014-08-20T23:43:16.540

I haven't run this so please forgive the odd typo:

with Table1 as
(
    select b from tbl where a = 1
)
, Table2 as
(
    select b from tbl where a = 2
)
, Added as
(
    select * from Table1
    UNION ALL
    select * from Table2
)
, Subtracted as
(
    select * from Table1
    INTERSECT ALL
    select * from Table2
)
, Unmatched as
(
    select * from Added
    EXCEPT 
    select * from Subtracted
)
select
    t.*
from Unmatched as u
inner join tbl as t
    on t.b = u.b
    and t.a in (1,2);

I'm guessing it will run dog slow, however, with all those table scans.

Fixed after Erwin's comment; removed a from the CTE and joined to the original tbl to get the output.

Find all values not present for other values of other column

5 Answers5