9

I need to delete the duplicate records in this table. However, there is no id for each row.

Example Data

product amount quantity
table 2000 5
chair 300 25
TV 30000 4
bike 300 25
table 2000 5
chair 300 25
chair 300 25

Expected Results

I need to get this result.

product amount quantity
table 2000 5
chair 300 25
TV 30000 4
bike 300 25

Script with ID

If there were an id, I could have used:

DELETE p1 FROM products p1
INNER JOIN products p2 
WHERE p1.id < p2.id AND p1.product = p2.product;
Edwin Babu
  • 193
  • 1
  • 1
  • 5

3 Answers3

23

There is no any field combination which identifies the record uniqually.

I see at least 2 different solutions.

First solution: move unique records to a copy of table and replace original table.

CREATE TABLE temp LIKE products;
INSERT INTO temp 
    SELECT DISTINCT * FROM products;
DROP TABLE products;
RENAME TABLE temp TO products;

Second solution: add temporary autoincrement, delete records using it, and drop temp field.

ALTER TABLE products ADD COLUMN temp SERIAL PRIMARY KEY;
DELETE t1.* 
    FROM products t1 
    LEFT JOIN ( SELECT MIN(temp) mintemp 
                FROM products
                GROUP BY field1,field2 /* , ... */ , fieldN) t2 
        ON t1.temp=t2.mintemp 
    WHERE t2.mintemp IS NULL;
ALTER TABLE products DROP COLUMN temp;


UPDATE

In second variant: the additional column definition as a primary key is redundant. It is enough to use

ALTER TABLE products ADD COLUMN temp SERIAL;
Akina
  • 20,750
  • 2
  • 20
  • 22
3

Apart from Akinas answer, You could delete both rows and then insert one.

You should also really, really add a primary key to your table even if you don't need to for performance, specifically to avoid situaitons like this.

Guran
  • 153
  • 6
3

You could do a

Delete <condition> limit 1

That will only delete 1 row, even if multiple rows matches the condition. This is explained in the official manual:

Order of Deletion

If the DELETE statement includes an ORDER BY clause, rows are deleted in the order specified by the clause. This is useful primarily in conjunction with LIMIT. For example, the following statement finds rows matching the WHERE clause, sorts them by timestamp_column, and deletes the first (oldest) one:

DELETE FROM somelog WHERE user = 'jcole'
ORDER BY timestamp_column LIMIT 1;
John K. N.
  • 18,854
  • 14
  • 56
  • 117
MTilsted
  • 155
  • 2