
I have a table with about 80 million records. I have added a new attribute (heigth_category) which I want to fill based on another attribute (heigth).
If heigth is at least 0 and less than 6, I want to set heigth_category to 1:

UPDATE schema.table t1 SET heigth_category = '1'
WHERE heigth >= 0
  AND heigth < 6;

This took about 13 hours. Is that normal for such a query?

EXPLAIN gives this:

Update on bomen t1  (cost=13353.34..1932816.30 rows=579346 width=316)
  ->  Bitmap Heap Scan on bomen t1  (cost=13353.34..1932816.30 rows=579346 width=316)
        Recheck Cond: ((boom_hoogt >= 0::numeric) AND (boom_hoogt < 6::numeric))
        ->  Bitmap Index Scan on boom_hoogt_idx  (cost=0.00..13208.50 rows=579346 width=0)
              Index Cond: ((boom_hoogt >= 0::numeric) AND (boom_hoogt < 6::numeric))

I'm working with PostgreSQL 9.1.12.

Stefan

1 Answer


In general, mass updates like this can take a long time, regardless of the RDBMS used.

Often a lot of time can be saved by running the update in batches of a few thousand rows at a time. Something like:

UPDATE schema.table SET ...
 WHERE id >= 0 AND id < 10000;

UPDATE schema.table SET ...
 WHERE id >= 10000 AND id < 20000;

...

Instead of typing out the individual UPDATE statements by hand, you can of course use a loop. Just make sure there is no overarching transaction wrapping all the batches, as that would negate the effort; see the sketch below.
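For example, a client-side driver loop keeps each batch in its own transaction. Here is a minimal sketch in Python, assuming psycopg2, a sequential integer id column, and placeholder connection details (the DSN, the id column, and MAX_ID are illustrative, not from the question):

# Minimal sketch: batched UPDATEs, one transaction per batch.
# Assumptions (not from the question): psycopg2 is installed, the
# table has a sequential integer id column, and the DSN is a placeholder.
import psycopg2

BATCH_SIZE = 10000
MAX_ID = 80000000  # assumption: ids run roughly up to the row count

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
conn.autocommit = True  # each statement commits on its own

cur = conn.cursor()
low = 0
while low < MAX_ID:
    cur.execute(
        "UPDATE schema.table SET heigth_category = '1' "
        "WHERE id >= %s AND id < %s "
        "AND heigth >= 0 AND heigth < 6",
        (low, low + BATCH_SIZE),
    )
    low += BATCH_SIZE

cur.close()
conn.close()

With autocommit enabled, every UPDATE is its own transaction, so a failure partway through only loses the current batch, and already-committed batches can have their dead row versions cleaned up while the rest still runs, instead of everything being held for one giant transaction. Note that a PL/pgSQL DO block would not work here on 9.1, because it runs as a single transaction.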

Sebastian Meine