5

I have this table structure:

 id1 id2
 1   2 
 1   3
 1   4
 2   1
 2   5

I need to build a query to select id2 where id1 is not in id2. For example if id1 = 1 then only id2=3 and id2=4.
I tried this one:

SET @a=1;
SELECT DISTINCT x.id2 
 FROM tt x, tt y 
 WHERE x.id1=@a AND x.id1 != y.id2;

but it's not working.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
ReynierPM
  • 1,888
  • 10
  • 31
  • 49

2 Answers2

9

For the simple case of a single given id1:

SELECT DISTINCT x.id2   -- DISTINCT only needed if there are dupes
FROM   tbl x
WHERE  x.id1 =  @a
AND    x.id2 <> @a;

Generally, there are basically four techniques to do what you are after. Both columns could just as well be in different tables, almost the same problem.

NOT EXISTS

SELECT DISTINCT x.id2 
FROM   tbl x
WHERE  <some condition>
AND    NOT EXISTS (
    SELECT FROM tbl y
    WHERE  <some condition>
    AND    y.id1 = x.id2
    );

LEFT JOIN / IS NULL

SELECT DISTINCT x.id2 
FROM   tbl x
LEFT   JOIN tbl y ON y.id1 = x.id2
                 AND <some condition for y>
WHERE  <some condition for x>
AND    y.id1 IS NULL

EXCEPT

SELECT DISTINCT id2
FROM   tbl
WHERE  <some condition>

EXCEPT ALL -- ALL to make it faster - no dupes left after DISTINCT SELECT tbl.id1 FROM tbl WHERE <some condition>

NOT IN

SELECT DISTINCT x.id2 
FROM   tbl x
WHERE  <some condition>
AND    x.id2 NOT IN (
    SELECT DISTINCT id1
    FROM   tbl y
    WHERE  <some condition>
    );

You'll have to test which one is fastest for you. Benchmarks disagree. It depends on data distribution and other details. NOT IN rarely wins. It's mostly one of the first two.

As @ypercube commented: if (id1, id2) is unique, the DISTINCT clause is not needed - except the one in the NOT IN sub-query, which is meant to help performance.

Related answer with more details for PostgreSQL on SO:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
2
SET @a=1;
SELECT x.id2
FROM (SELECT * FROM tt WHERE id1=@a) x LEFT JOIN tt y
ON x.id2 = y.id1
WHERE y.id1 IS NULL;

I tried it against my query and it works

mysql> select * from tt;
+------+------+
| id1  | id2  |
+------+------+
|    1 |    2 |
|    1 |    3 |
|    1 |    4 |
|    2 |    1 |
|    2 |    5 |
+------+------+
5 rows in set (0.00 sec)

mysql> SET @a=1;
Query OK, 0 rows affected (0.02 sec)

mysql> SELECT x.id2
    -> FROM (SELECT * FROM tt WHERE id1=@a) x LEFT JOIN tt y
    -> ON x.id2 = y.id1
    -> WHERE y.id1 IS NULL;
+------+
| id2  |
+------+
|    3 |
|    4 |
+------+
2 rows in set (0.00 sec)

mysql>

UPDATE 2012-04-18 11:58 EDT

You can take the entire query and

  • make it a subquery
  • LEFT JOIN the subquery to the users, groups, and profiles tables

maybe something like this

SET @a=1;
SELECT u.*,g.*,p.* FROM
(
    SELECT x.id2 user_id
    FROM (SELECT * FROM tt WHERE id1=@a) x LEFT JOIN tt y
    ON x.id2 = y.id1
    WHERE y.id1 IS NULL
) keys
LEFT JOIN users u USING (user_id)
LEFT JOIN groups g USING (user_id)
LEFT JOIN profiles p USING (user_id)
;
RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536