1

I have a huge table that has 1,990,546 users with numerous data. Sometimes I get email lists and dump them into this table also.

I want to delete users with duplicate email addresses. However, I want to keep users WHERE chips > 0.

This query ran for over an hour before I terminated it:

SELECT id, email  
FROM users 
WHERE id IN 
    ( 
        SELECT a.id 
        FROM users a, users b     
        WHERE (a.email = b.email) 
        AND (a.chips > 0)  
    )
LIMIT 100;

Is there a faster or better way to do this?

Hannah Vernon
  • 70,928
  • 22
  • 177
  • 323
E.C.
  • 11
  • 4

2 Answers2

3

I would use a temp table

#
# Collect All IDs Your Intend to Keep
#
CREATE TABLE KeepIDList ENGINE=MyISAM SELECT id FROM users;
CREATE TABLE KeepIDList ENGINE=MyISAM SELECT id FROM users;
INSERT INTO KeepIDList (id) SELECT min_id FROM
(SELECT email,MIN(id) min_id FROM users GROUP BY email) A;
ALTER TABLE KeepIDList ADD PRIMARY KEY (id);
#
# Collect All IDs Your Intend to Delete
#
CREATE TABLE ZapIDList ENGINE=MyISAM SELECT id FROM users;
INSERT INTO ZapIDList (id)
SELECT A.id FROM users A LEFT JOIN KeepIDList B WHERE B.id IS NULL;
ALTER TABLE KeepIDList ADD PRIMARY KEY (id);
#
# From Collected Keys to Delete, Remove Keys Whose chips > 0
#
DELETE B.* FROM users A INNER JOIN ZapIDList B WHERE A.chips > 0;
#
# From Collected Keys Left to Delete, Perform DELETE JOIN
#
DELETE A.* FROM users A INNER JOIN ZapIDList B;
#
# Drop Temp Tables
#
DROP TABLE KeepIDList;
DROP TABLE ZapIDList;

I have a very good reason for using temp tables as opposed to doing a Direct DELETE JOIN of a table against itself: Performing a DELETE using a subquery that references the same table experiencing the DELETE tends to make keys disappear intermittently as the query is being optimized. I wrote about this back on Feb 22, 2011 : Problem with MySQL subquery

CAVEAT

Please test this out with a smaller dataset to make sure my answer indeed performs what you asked. Make sure you make s backup of the users table. To coin a phrase from 1960's Mission Impossible: Should any of your data be caught or killed, I will disavow any knowledge of your actions (Start From 1:50 of the YouTube Video).

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
1

How about this?

DELETE t1
FROM users AS t1 
JOIN 
(
   SELECT id
   FROM users
   WHERE chips <= 0
   GROUP BY id
   HAVING COUNT(email) > 1
) AS t2 
ON t1.id = t2.id;

I hope this helps you.

Mr.Brownstone
  • 13,242
  • 4
  • 38
  • 55