2

My table looks like this:

+-----------+
|     id    | <- primary key
|    time   | <- DateTime
|    row2   | <- varchar
|    row3   |
|    ...    |
+-----------+

I want to remove all entries, except one, which are within a timeframe of one full hour and have the same value of row2 (distinct row2). Is it possible to do this without a stored procedure and/or non-sql code?

icefex
  • 447
  • 1
  • 5
  • 9

2 Answers2

4

The usual solution for this problem is first looking for the rows you want to keep, and then delete all others. The first part can be accomplished with an aggregate and grouping:

SELECT min(id)
FROM my_table
GROUP BY row2, DATE_FORMAT(`time`, '%Y-%m-%d %H')

Then you can write your DELETE:

DELETE FROM my_table
WHERE id NOT IN (
    SELECT min(id)
    FROM my_table
    GROUP BY row2, DATE_FORMAT(`time`, '%Y-%m-%d %H')
);

There are variations to this theme: you can use a NOT EXISTS construct instead of NOT IN (with appropriate changes), or a join using the USING keyword.

András Váczi
  • 31,778
  • 13
  • 102
  • 151
2

Back on Feb 22, 2011, I answered a question about a DELETE query using a WHERE clause against itself : ( Problem with MySQL subquery ).

Here is a safer approach:

  • Create a temp table called mytable_key
  • Collect keys with minimum ids per row2 per hour into mytable_key
  • Create a temp table called mytable_new
  • Join all keys in mytable_key with mytable and load it into mytable_new
  • Rename mytable to mytable_old
  • Rename mytable_new to mytable
  • Drop mytable_old
  • Drop mytable_key

Here is the actual SQL:

CREATE TABLE mytable_key
    SELECT id FROM mytable
    WHERE 1=2
;
INSERT INTO mytable_key
    SELECT MIN(id) FROM mytable
    GROUP BY row2,DATE_FORMAT(`time`,'%Y%m%d%H')
;
ALTER TABLE mytable_key ADD PRIMARY KEY (id);
CREATE TABLE mytable_new LIKE mytable;
INSERT INTO mytable_new
    SELECT B.* FROM mytable_key A
    INNER JOIN mytable B USING (id)
;
ALTER TABLE mytable RENAME mytable_old;
ALTER TABLE mytable_new RENAME mytable;
DROP TABLE mytable_old;
DROP TABLE mytable_key;

You can run these commands as is or place them in a stored procedure.

While I give dezso's answer a +1 because it works mechanically in Oracle, PostgreSQL and SQL Server, take it from a MySQL DBA, I don't trust MySQL's approach to internal SQL optimization via transformation when doing a DELETE on a table against itself. This is why I submitted this answer.

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536