Should I stop an awfully inefficient MySQL DELETE statement?

Question

I'm running the following statement on MySQL 5.5 in a 1GB 1CPU SSD virtual machine from Linode:

DELETE table
FROM (
    SELECT MAX(id) id, field
    FROM table
    GROUP BY field
) temp_table
INNER JOIN table
    ON table.field = temp_table.field
WHERE table.id != temp_table.id

It's been stuck in the Sending data state for more than 24 hours, and now I know why: it has an awful execution plan:

mysql> EXPLAIN SELECT 1 FROM (SELECT MAX(id) id, field FROM table GROUP BY field) temp_table INNER JOIN table ON table.field = temp_table.field WHERE table.id != temp_table.id
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 381888 |                                 |
|  1 | PRIMARY     | users      | ALL  | NULL          | NULL | NULL    | NULL | 984873 | Using where; Using join buffer  |
|  2 | DERIVED     | users      | ALL  | NULL          | NULL | NULL    | NULL | 984873 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
3 rows in set (46.12 sec)

(MySQL 5.5 doesn't allow EXPLAIN DELETE, so I'm doing it with EXPLAIN SELECT 1 as per this answer)

Please note that only the EXPLAIN itself is taking 46 seconds to complete.

Should I stop this statement and try to do things better, or should I wait?

score 0 · Accepted Answer · edited Apr 13 '17 at 12:42

I see what the query is doing. You are trying to DELETE a ton of rows and keep the last inserted id for every field.

I have a much better method.

DROP TABLE IF EXISTS keys_to_keep;
CREATE TABLE keys_to_keep
(
    id INT NOT NULL,
    PRIMARY KEY (id)
);
INSERT INTO keys_to_keep SELECT MAX(id) FROM mytable GROUP BY field;
CREATE TABLE mytable_new LIKE mytable;
INSERT INTO mytable_new
SELECT B.* FROM keys_to_keep A INNER JOIN mytable B USING (id);
ALTER TABLE mytable RENAME mytable_old;
ALTER TABLE mytable_new RENAME mytable;

Check mytable. If it has only the last occurrence of field, then you can

DROP TABLE keys_to_keep;
DROP TABLE mytable_old;

GIVE IT A TRY !!!

I have recommended this technique before

Oct 20, 2014 : Deleting data from a large table with replication
Dec 20, 2013 : Massive MySQL delete from CSV file
Jul 09, 2012 : What can cause TRUNCATE TABLE to take a really long time?

I have also discussed doing soft deletes as an alternative (require maintaining an additional mapping table of ID marked deleted or an extra column to flag a row deleted):

Feb 15, 2012 : How to implement soft deletes?
Mar 05, 2012 : Tombstone Table vs Deleted Flag in database syncronization & soft-delete scenarios
Nov 07, 2012 : Can I do anything about a very long running KILL command?
Aug 07, 2013 : How to delete duplicate records

score 0 · Answer 2 · answered Jul 08 '15 at 21:07

The EXPLAIN took so long because it evaluated the subquery. Perhaps the subquery took so long because of lack of INDEX(field, id).

When deleting a large chunk of a table, it is often faster to copy everything that you want to keep into a new table, then use RENAME to swap tables.

Or, you could do the deletes in chunks of 100-1000, walking through the PRIMARY KEY, if practical. More details in my delete blog.

Should I stop an awfully inefficient MySQL DELETE statement?

2 Answers2

GIVE IT A TRY !!!