3

My Task is to optimize a query, which should check for data inconsistency: Return all leads ids with more than one entries and multiple is_first flag.

| id | lead_id | is_first| (other data) |
|----|---------|---------|-------- ...
|   1|       20|        1|...
|   2|       20|        0|
|   3|       21|        1|
|   4|       21|        0|
|   5|       21|        1|
|   6|       22|        1|

So this dataset should return 21. (20 ok, because is_first is only true/1 once)

Old Query:

SELECT DISTINCT s1.lead_id as lead_id
FROM history as s1
INNER JOIN history as s2 ON
  s1.is_first = 1 AND
  s2.is_first = 1 AND
    s1.id != s2.id AND
    s1.lead_id = s2.lead_id;

New Query: I already increased performance somewhat with this query

SELECT history.lead_id AS lead_id
FROM history
WHERE is_first = 1 GROUP BY lead_id
HAVING count(lead_id) != 1;

But I feel it's not end of the line. Any recommendations?

Evan Carroll
  • 65,432
  • 50
  • 254
  • 507
Mruf
  • 133
  • 4

2 Answers2

2

If you dont have it already, you could create a index on both fields: is_first and lead_id;


If it's a table with few record you won't see much difference, so I always recommend to check the statistics from the query.

2

I think what you want is something like this..

CREATE TABLE history(id int, lead_id int, is_first bool);
INSERT INTO history(id,lead_id,is_first)
VALUES
  ( 1, 20, 1 ),
  ( 2, 20, 0 ),
  ( 3, 21, 1 ),
  ( 4, 21, 0 ),
  ( 5, 21, 1 ),
  ( 6, 22, 1 );

SELECT *
FROM history AS h1
WHERE EXISTS (
  SELECT 1
  FROM history AS h2
  WHERE h1.lead_id = h2.lead_id
  GROUP BY lead_id
  HAVING count(is_first OR NULL) > 1
);
+------+---------+----------+
| id   | lead_id | is_first |
+------+---------+----------+
|    3 |      21 |        1 |
|    4 |      21 |        0 |
|    5 |      21 |        1 |
+------+---------+----------+
3 rows in set (0.00 sec)

If you just want the lead_id of 21

SELECT lead_id FROM history
GROUP BY lead_id
HAVING count(is_first OR NULL) > 1;
Evan Carroll
  • 65,432
  • 50
  • 254
  • 507