I am looking to remove older data from a table which contains 889 million rows of data.
I have a script, but I am trying to make it more robust, since it will be removing approximately 418 million rows of data.
I am running Postgres 9.6. The table, with its indexes and FK constraints, looks like this:
          Column           |           Type           | Collation | Nullable |                Default
---------------------------+--------------------------+-----------+----------+----------------------------------------
 game_id                   | integer                  |           | not null | nextval('game_game_id_seq'::regclass)
 game_id                   | integer                  |           | not null |
 session_id                | integer                  |           |          |
 game_created_on           | timestamp with time zone |           | not null |
 currency_code             | character(3)             |           | not null |
 game_cash_staked          | numeric(12,2)            |           |          |
 game_cash_won              | numeric(12,2)           |           |          |
 game_bonus_staked         | numeric(12,2)            |           |          |
 game_bonus_won            | numeric(12,2)            |           |          |
 game_created_by_id        | integer                  |           | not null |
 game_remote_ref           | character varying(50)    |           |          |
 game_description          | text                     |           |          |
 game_batch_id             | integer                  |           |          |
 game_rejection_code_id    | integer                  |           |          |
 game_rejection_message    | character varying(255)   |           |          |
 game_transfer_remote_ref  | character varying(128)   |           |          |
Indexes:
"game_pkey" PRIMARY KEY, btree (game_id)
"idx_game_created_on_rejection_code" btree (game_created_on) WHERE game_rejection_code_id IS NULL
"idx_game_game_created_on" btree (game_created_on)
"idx_game_session_id" btree (session_id)
"game_idx_01" btree (game_remote_ref)
"game_idx_game_id" btree (game_id)
Foreign-key constraints:
"ref_game_to_currency" FOREIGN KEY (currency_code) REFERENCES currency(currency_code)
"ref_game_to_game" FOREIGN KEY (game_id) REFERENCES game(game_id)
"ref_game_to_game_rejection_code" FOREIGN KEY (game_rejection_code_id) REFERENCES game_rejection_code(game_re
jection_code_id)
Script I have looked to use:
begin;

CREATE TABLE gamearchived AS
SELECT t.*
FROM game t
WHERE t.game_created_on < NOW() - interval '1 year'; -- this grabs rows older than 1 year

DELETE FROM game t
WHERE t.game_id IN (SELECT gamearchived.game_id FROM gamearchived);

SELECT count(*) FROM gamearchived;

COMMIT;
I am wondering if this is the safest way to remove the old data from the main table, or whether it would be better to do it in batches (a rough sketch of what I mean by batches is below). The table I will be removing data from also has indexes and foreign key constraints; would it be better to drop the indexes before deleting and then add them back once done? The amount of data removed is approximately 450 million rows.
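For illustration, this is roughly what I had in mind for a single batch, assuming the gamearchived copy has already been created; the 50000 batch size is just a placeholder value. I would run it repeatedly from a client-side loop, committing between batches, until it reports 0 rows deleted:

-- one batch: delete up to 50000 of the old rows, matched by primary key
DELETE FROM game g
USING (
    SELECT game_id
    FROM game
    WHERE game_created_on < NOW() - interval '1 year'
    LIMIT 50000
) batch
WHERE g.game_id = batch.game_id;

I am not sure whether this sort of loop, or the single big DELETE in the script above, is the better option here.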
I need to keep the old data, so it has to remain accessible. Any advice is much appreciated.