Due to an error in our application, one of the tables in our MS SQL has been flooded with duplicate rows differing only in PK identity column and timestamp
+--------+---------------------+--------------------+
| id | time | value |
+--------+---------------------+--------------------+
| #10003 | 2014-02-26 12:00:00 | uniq_pattern_1 |
| #20005 | 2014-02-26 13:00:00 | uniq_pattern_1 |
| #30018 | 2014-02-26 14:00:00 | uniq_pattern_1 |
| #40022 | 2014-02-26 15:00:00 | uniq_pattern_1 |
....................................................
| #10005 | 2014-02-26 12:00:00 | uniq_pattern_2 |
| #20008 | 2014-02-26 13:00:00 | uniq_pattern_2 |
| #30020 | 2014-02-26 14:00:00 | uniq_pattern_2 |
| #40040 | 2014-02-26 15:00:00 | uniq_pattern_2 |
+--------+---------------------+--------------------+
There is 100+ million rows with only about 50k unique patterns spread across entire table (every existing pattern was re-inserted every hour, with a few new patterns added to the mix). Table only has clustered PK index. I need to delete all duplicates leaving only earliest occurrence of each pattern.
I plan to do the following:
Grab earliest row for each uniq pattern into temp table.
INSERT INTO #MyTempTable
SELECT [value], min([time])
FROM [MyBigTable]
GROUP BY [value]
Truncate [MyBigTable] then populate it from #MyTempTable.
Is this a good way to do this?