I have two tables: table1 with 61 million rows and table2 with 59 million rows. The columns in both tables are identical in name and type. Both were imported from backup files.
I want to merge these two tables into a single table, keeping only the unique records.
For example, in table1 I have the following records:
NAME CODE ...
Name1 001 ...
Name2 002 ...
Name3 003 ...
Name4 004 ...
And in table2:
NAME CODE ...
Name1 001 ...
Name2 002 ...
Name5 005 ...
Name2 002 ...
The result table should be something like this:
NAME CODE ...
Name1 001 ...
Name2 002 ...
Name3 003 ...
Name4 004 ...
Name5 005 ...
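Conceptually, a UNION (without ALL) seems to be what I want, since it removes duplicate rows across both inputs. Here is a sketch with the example columns (merged is just a placeholder name):

CREATE TABLE merged AS  -- "merged" is a placeholder name
SELECT name, code       -- every column that defines uniqueness
FROM   table1
UNION                   -- UNION, unlike UNION ALL, keeps only distinct rows
SELECT name, code
FROM   table2;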
Edit to provide more info as requested by @Erwin Brandstetter:
table1 takes up 15 GB. I am still inserting records into table2, but when I finish it will have almost the same size. I have more than 300 GB of free space (and an external HD if needed).
Total RAM is 8 GB.
No indexes (I can create them if needed).
No concurrent access (it is accessed only by me, on a local machine).
I can create a new, third table if needed.
Here is the full table definition:
CREATE TABLE inss
(
    nb          double precision,
    nome        character varying(200),
    nasc        double precision,
    cpf         double precision,
    especie     double precision,
    dib         double precision,
    valor       double precision,
    banco_pagt  double precision,
    ag_banco    double precision,
    orgao_pag   double precision,
    aps         double precision,
    meio_pagto  double precision,
    banco_empr  double precision,
    contrato    character varying(200),
    vl_empres   double precision,
    comp_ini_d  double precision,
    parcelas    double precision,
    vl_parcela  double precision,
    tipo_empre  double precision,
    endereco    character varying(200),
    bairro      character varying(200),
    municipio   character varying(200),
    uf          character varying(2),
    cep         double precision,
    sit_empres  double precision,
    dt_averb    double precision,
    dt_exc      double precision,
    id          double precision
);
The id column is filled with a row number, so it could be used to run the operation in batches if the full operation turns out to be too slow.
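For example, each batch could cover a window of id values (a sketch only; the 5,000,000-row window and the merged_table name are assumptions on my part):

-- Hypothetical batching pattern: append an id window to whatever the final
-- merge statement turns out to be, then loop over the windows.
INSERT INTO merged_table           -- "merged_table" is a placeholder name
SELECT *
FROM   table2
WHERE  id >  0                     -- lower bound of this batch (exclusive)
AND    id <= 5000000;              -- upper bound of this batch (inclusive)
-- Repeat with (5000000, 10000000], and so on, up to the maximum id in table2.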
All columns except the id column should be considered for the unique check.
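Putting it together, this is roughly what I picture for the real tables (a sketch; inss_merged is a placeholder name, and it relies on SELECT DISTINCT treating two NULLs in the same column as equal, so rows containing NULLs still collapse into one copy):

-- Combine both tables, then keep one copy of each distinct row
-- over every column except id:
CREATE TABLE inss_merged AS
SELECT DISTINCT
       nb, nome, nasc, cpf, especie, dib, valor, banco_pagt,
       ag_banco, orgao_pag, aps, meio_pagto, banco_empr, contrato,
       vl_empres, comp_ini_d, parcelas, vl_parcela, tipo_empre,
       endereco, bairro, municipio, uf, cep, sit_empres, dt_averb, dt_exc
FROM  (
    SELECT * FROM table1
    UNION ALL
    SELECT * FROM table2
) AS both_tables;

-- Re-create id as a fresh row number for the merged table:
ALTER TABLE inss_merged ADD COLUMN id bigserial;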