1

I'm running the following statement on MySQL 5.5 in a 1GB 1CPU SSD virtual machine from Linode:

DELETE table
FROM (
    SELECT MAX(id) id, field
    FROM table
    GROUP BY field
) temp_table
INNER JOIN table
    ON table.field = temp_table.field
WHERE table.id != temp_table.id

It's been stuck in the Sending data state for more than 24 hours, and now I know why: it has an awful execution plan:

mysql> EXPLAIN SELECT 1 FROM (SELECT MAX(id) id, field FROM table GROUP BY field) temp_table INNER JOIN table ON table.field = temp_table.field WHERE table.id != temp_table.id
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 381888 |                                 |
|  1 | PRIMARY     | users      | ALL  | NULL          | NULL | NULL    | NULL | 984873 | Using where; Using join buffer  |
|  2 | DERIVED     | users      | ALL  | NULL          | NULL | NULL    | NULL | 984873 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
3 rows in set (46.12 sec)

(MySQL 5.5 doesn't allow EXPLAIN DELETE, so I'm doing it with EXPLAIN SELECT 1 as per this answer)

Please note that only the EXPLAIN itself is taking 46 seconds to complete.

Should I stop this statement and try to do things better, or should I wait?

e18r
  • 123
  • 6

2 Answers2

0

I see what the query is doing. You are trying to DELETE a ton of rows and keep the last inserted id for every field.

I have a much better method.

DROP TABLE IF EXISTS keys_to_keep;
CREATE TABLE keys_to_keep
(
    id INT NOT NULL,
    PRIMARY KEY (id)
);
INSERT INTO keys_to_keep SELECT MAX(id) FROM mytable GROUP BY field;
CREATE TABLE mytable_new LIKE mytable;
INSERT INTO mytable_new
SELECT B.* FROM keys_to_keep A INNER JOIN mytable B USING (id);
ALTER TABLE mytable RENAME mytable_old;
ALTER TABLE mytable_new RENAME mytable;

Check mytable. If it has only the last occurrence of field, then you can

DROP TABLE keys_to_keep;
DROP TABLE mytable_old;

GIVE IT A TRY !!!

I have recommended this technique before

I have also discussed doing soft deletes as an alternative (require maintaining an additional mapping table of ID marked deleted or an extra column to flag a row deleted):

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
0

The EXPLAIN took so long because it evaluated the subquery. Perhaps the subquery took so long because of lack of INDEX(field, id).

When deleting a large chunk of a table, it is often faster to copy everything that you want to keep into a new table, then use RENAME to swap tables.

Or, you could do the deletes in chunks of 100-1000, walking through the PRIMARY KEY, if practical. More details in my delete blog.

Rick James
  • 80,479
  • 5
  • 52
  • 119