
I need to add a new BIGSERIAL column to a huge table (~3 billion records). This question is similar to what I need to do and the accepted answer has helped me somewhat, but I'm still wondering about something. In my case, the table already has a BIGSERIAL column which is the primary key, but many rows have been deleted, so now there are gaps. (The table has subsequently been fully vacuumed.) I need to regenerate the values so that they are sequential again. Here are 5 example rows of what I want to achieve, where new_value > 1000:

+---------+---------+
|old_value|new_value|
+---------+---------+
|1026     |1001     |
|1027     |1002     |
|1030     |1003     |
|1032     |1004     |
|1039     |1005     |
+---------+---------+

I have successfully implemented the alternative approach as mentioned in the referenced answer above (CREATE TABLE ... and then INSERT INTO new_table SELECT * FROM ... ), but I would also like to attempt, and benchmark against, the initial suggestion. The problem, however, is that I don't know whether the new_value will be generated in the same order as the old_value as this is a requirement.
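
For reference, the alternative approach amounts to something like this (table and column names are placeholders, payload stands in for the remaining columns, and row_number() makes the new numbering explicit):

CREATE TABLE new_table (
    new_value bigint PRIMARY KEY  -- regenerated, gap-free key
  , payload   text                -- placeholder for the remaining columns
);

INSERT INTO new_table (new_value, payload)
SELECT row_number() OVER (ORDER BY old_value)  -- preserves the order of old_value
     , payload
FROM   existing_table;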

How can I ensure the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:

ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;

A different approach

I also attempted the following (that works quite well on a small table), but it's much slower than the alternative suggestion of the referenced answer on very large tables:

ALTER TABLE existing_table ADD COLUMN new_value BIGINT NOT NULL DEFAULT 0;

UPDATE existing_table AS existing
SET    new_value = generated.new_id
FROM  (SELECT original.old_value
            , row_number() OVER (ORDER BY original.old_value) AS new_id
       FROM   existing_table AS original) generated
WHERE  existing.old_value = generated.old_value;

HeatZync

2 Answers


First, consider the advice in the comments: do you really need to remove gaps? Typically, you don't. And gaps will creep back in. See:

Next, consider the updates to my old answer you have been working off:

Among other things, how to possibly avoid writing WAL for the table data in the new table - that will be the most significant bit to further improve performance.
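
The idea, in a rough sketch (it assumes wal_level = minimal, and the exact conditions depend on the Postgres version): create and fill the new table in the same transaction, so the bulk load does not have to be WAL-logged:

BEGIN;
CREATE TABLE new_table (LIKE existing_table);   -- no constraints or indexes yet
INSERT INTO new_table                           -- bulk load; put the renumbering logic here
SELECT * FROM existing_table ORDER BY old_value;
COMMIT;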

How can I ensure the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:

ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;

You cannot. Use one of the other routes.

I also attempted the following (that works quite well on a small table), but it's much slower than the alternative suggestion of the referenced answer on very large tables:

The UPDATE route is inherently more expensive than writing a pristine new table from scratch (adding indexes and constraints afterwards).
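
Schematically, once the new table has been loaded (names are placeholders), constraints and indexes are added in one pass at the end:

ALTER TABLE new_table ADD PRIMARY KEY (new_value);  -- builds the index in a single pass
CREATE INDEX ON new_table (payload);                -- any secondary indexes, if needed
ANALYZE new_table;                                  -- refresh planner statistics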

Erwin Brandstetter

Assuming you have an existing column in the table to sort by (a date, an existing key, etc.) whose values are unique, this will add a sequence to the table in an order of your choosing:

-- Add a primary key. SERIAL works (IDENTITY does not)
ALTER TABLE payment_data ADD COLUMN payments_id SERIAL PRIMARY KEY;
-- Update IDs to values much higher than the current max id
UPDATE payment_data SET payments_id = payments_id + 100000000;
-- use sorted row_number() to change the primary id
UPDATE payment_data t1
SET    payments_id = t2.newseqid
FROM  (SELECT row_number() OVER (ORDER BY trans_date) AS newseqid, trans_date
       FROM   payment_data) AS t2
WHERE  t1.trans_date = t2.trans_date;
-- verify id is in expected order
SELECT payments_id, trans_date FROM payment_data ORDER BY trans_date;
Stickley