9

Lets take an array of ids of size n in the application level. for instance [132,3425,13,13,... 392] where n is bigger than 100k entries. Some of this entries are repeated.

I need to check if all of these registers are contained in a MySQL table containing more than 5M entries.

Now I'm checking one after another, but this process takes a very long time.

How can I make this verification in one single check, or maybe make this more efficient.

I would like to know if all of them exists on the table, and, if possible, know which don't already exists on that table.

Daniel Santos
  • 293
  • 2
  • 4
  • 9

3 Answers3

4
CREATE TEMPORARY TABLE B_Temp (
     id BIGINT NOT NULL
     PRIMARY KEY (id)
     INDEX unid (id)
);

I would load a temp table as in @AMG 's answer. But my query would look like this:

select id
from a
where not exists (
select *
from B_Temp
where a.id = b.id
)

You would want to disable the index on the id column before you load the data, and re-enable it after to get this to perform well. If you are doing this on a regular basis, you might consider making this a permanent table.

David Buttrick
  • 206
  • 1
  • 4
4

First count the number of distinct numbers. You have called it n. Then

SELECT COUNT(*) FROM tbl
    WHERE id IN (very-long-list)

See if that returns n.

If, instead, you have put those numbers in another table, then this might be optimal:

Table `nums` contains the `n` values.
ALTER TABLE tbl ADD INDEX(id);  -- if it does not already exist.
SELECT ! EXISTS ( SELECT * FROM nums LEFT JOIN tbl USING(id)
                      WHERE tbl.id IS NULL )

That will check each nums.id to see if it is in your tbl. More importantly, it stops when the first missing item is found. The output is 1 or 0.

("very-long-list" is a "commalist" or "comma-sparated-list")

Rick James
  • 80,479
  • 5
  • 52
  • 119
3

I recommend to use the application code to insert the array into a Temporary table, including an identity field so you can store every record, even if it is repeated. (the syntaxis is the same as CREATE TABLE but using CREATE TEMPORARY TABLE):

CREATE TEMPORARY TABLE A (
     id BIGINT NOT NULL AUTO_INCREMENT,
     number BIGINT NOT NULL,
     PRIMARY KEY (id)
);

Then make a query to check/determine if the number exists in the large table. Something like this:

--supose the temporary Table is A and the big table is B and the fields you are comparing ara called NUMBER.

SELECT A.id, A.number as ARRAY_NUMBER,
CASE
    WHEN B.number IS NULL THEN 'not found'
    ELSE 'found'
END as FOUND_IN_TABLE
FROM A LETF OUTER JOIN B ON A.number = B.number

good luck!

AMG
  • 1,372
  • 7
  • 12