Questions tagged [duplication]

The Duplication tag covers the topic of duplicated data and the strategies involved in removing the duplicates, often referred to as deduplication or simply dedupe.

Duplication refers to the inadvertent creation of duplicate entries in tables and the process for resolving the repeated entries.

In the simple case, this involves identifying a survivor row and eliminating the non-survivors.

A slightly more complex situation, survivorship logic must be employed. In the following example, the rule was to simply collapse non-empty values.

Before

CustomerName | Address | Phone
Bob          | 100 Elm | 
Bob          |         | 555.1234

After

CustomerName | Address | Phone
Bob          | 100 Elm | 555.1234

The truly complex scenarios stem from conflicting values. Consider the following example, there is a conflicting value for the phone value in our deduplication candidate. Which value is the "right" is likely to be dependent on the supplied business rules and/or data collection logic.

Before

CustomerName | Address | Phone
Bob          | 100 Elm | 555.4234
Bob          |         | 555.1234

After ¹

CustomerName | Address | Phone
Bob          | 100 Elm | 555.1234

After ²

CustomerName | Address | Phone
Bob          | 100 Elm | 555.4234

After ³

CustomerName | Address | Phone
Bob          |         | 555.1234

Which After is correct is an "it depends" answer.

The first After merged the changes together. Perhaps the data collection system only records changed data (and we further assume said system required address for an entry to be valid)

The second After declared the first row completely correct. Perhaps there was an additional field showing it was the most recently updated. Or because it was most completely filled out, the assumption is that it was the survivor.

The third After declared the second row completely correct. It could be that there was another field showing it to be the most recently updated (Bob is now homeless, pity him).

265 questions

votes

6 answers

Checking whether two tables have identical content in PostgreSQL

This has already been asked on Stack Overflow, but only for MySQL. I'm using PostgreSQL. Unfortunately (and surprisingly) PostgreSQL does not seem to have something like CHECKSUM table. A PostgreSQL solution would be fine, but a generic one would be…

postgresql duplication

asked Jul 29 '14 at 09:56

Faheem Mitha

1,049
2
12
18

votes

3 answers

select rows where column contains same data in more than one record

I have a table that has a column called article_title. Let's say the table name is articles. I need to find out the records where the article_title data is the same on more than one record. Here's what I've got: select a.* from articles a where…

mysql duplication

asked Nov 30 '13 at 03:52

somejkuser

votes

1 answer

Optimal way to ignore duplicate inserts?

Background This problem relates to ignoring duplicate inserts using PostgreSQL 9.2 or greater. The reason I ask is because of this code: -- Ignores duplicates. INSERT INTO db_table (tbl_column_1, tbl_column_2) VALUES ( SELECT …

postgresql constraint insert unique-constraint duplication

asked Dec 15 '12 at 22:25

Dave Jarvis

votes

1 answer

Multiple on conflict targets

I have two unique indexes on columns a and b. I need something like this: insert into my_table (a, b) values (1, 2), (1, 2) on conflict (a) do update set c = 'a_violation' on conflict (b) do update set c = 'b_violation' So generally I want to make…

postgresql postgresql-10 duplication upsert

asked May 08 '18 at 18:55

user606521

1,415
6
23
28

votes

2 answers

On duplicate key do nothing

I am inserting into the following table using LuaSQL with PtokaX API. CREATE TABLE `requests` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `ctg` VARCHAR(15) NOT NULL, `msg` VARCHAR(250) NOT NULL, `nick` VARCHAR(32) NOT NULL, …

mysql insert update duplication

asked Mar 29 '13 at 04:52

hjpotter92

votes

5 answers

How do I copy my SQL Azure database to a local SQL Server instance?

I have an OLTP database hosted on a SQL Azure instance. I want to pull a copy of the database down from the cloud so I can run some heavy extracts and OLAP-style queries against it without impacting the source database. How do I pull a copy of the…

backup duplication azure-sql-database

asked Sep 16 '11 at 22:12

Nick Chammas

14,810
17
76
124

votes

4 answers

How to check if a subquery has exactly one distinct result and a specified value concisely?

I found myself writing the following: select 'yes' where exists(select * from foo where val=1) and not exists(select * from foo where val<>1); and wondering if there is a more concise way without sacrificing too much readability. I found one way…

postgresql duplication

asked Sep 04 '12 at 15:42

Jack Douglas

40,517
16
106
178

votes

2 answers

Partial db dump/restore

A part of my database (PostgreSQL 9.3) relies on extra tables (ex. County, City, Town, ... ). I don't manage these tables, they are updated regularly by a third party. Each time I get a new full dump, but I have a hard time to push the changes back…

postgresql restore duplication

asked Oct 25 '14 at 14:28

FastRipeFastRotten

votes

2 answers

How do I remove duplicate records in a join table in PostgreSQL?

I have a table that has a schema like this: create_table "questions_tags", :id => false, :force => true do |t| t.integer "question_id" t.integer "tag_id" end add_index "questions_tags", ["question_id"], :name =>…

postgresql duplication

asked Mar 13 '13 at 00:49

marcamillion

votes

3 answers

How to delete duplicate records in MySQL, in a table without IDs?

I need to delete the duplicate records in this table. However, there is no id for each row. Example Data product amount quantity table 2000 5 chair 300 25 TV 30000 4 bike 300 25 table 2000 5 chair 300 25 chair 300 25 Expected…

mysql mysql-5.7 delete duplication

asked Aug 15 '18 at 04:50

Edwin Babu

votes

3 answers

Deleting all duplicates

I am trying to delete all the duplicates but keeping single record only (shorter id). Following query deletes duplicates but take lot of iterations to delete all copies and keeping original ones. DELETE FROM emailTable WHERE id IN ( SELECT * FROM…

mysql delete duplication

asked Sep 16 '11 at 13:22

Gary Lindahl

votes

1 answer

How do I insert record only if the record doesn't exist?

I have a PaymentInformation table ID NAME Start End 1 Tina 1/7/2014 1/17/2014 2 Alice 3/7/2014 3/17/2014 3 John 5/7/2014 5/17/2014 4 Michelle 7/7/2014 7/17/2014 I'm building my SQL query…

insert sqlite duplication

asked Oct 23 '17 at 01:40

Andy

votes

1 answer

Duplicate records returned from table with no duplicates

I have a stored procedure that queries a busy queue table that is used to distribute work in our system. The table in question has a primary key on WorkID and no duplicates. A simplified version of the query is: INSERT INTO #TempWorkIDs…

sql-server sql-server-2008-r2 duplication isolation-level

asked Sep 13 '17 at 20:49

JNK

18,064
6
63
98

votes

6 answers

How can I display twice each stored row without using UNION ALL or a temporary table?

The relevant table, named emp, holds the following data: CREATE TEMPORARY TABLE emp AS SELECT * FROM ( VALUES (1,'A'), (2,'B'), (3,'C') ); ID Name -- ---- 1 A 2 B 3 C And the output or result-set of the data manipulation operation…

duplication interview-question

asked Dec 04 '16 at 18:05

Anup

votes

3 answers

INSERT... ON DUPLICATE KEY UPDATE not working as I expect

I have a table called "Example" CREATE TABLE IF NOT EXISTS `example` ( `id` int(11) NOT NULL AUTO_INCREMENT, `a` int(11) NOT NULL, `b` int(11) NOT NULL, `c` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; I…

mysql update insert duplication

asked Aug 12 '15 at 11:02

Relax

2 3

…

17 18 Next