4

I am attempting to migrate from Postgres 9.6 to 10.3, and during the restore each index is recreated one by one - this is a problem.

So far pg_dumpall has seemed like a good option:

pg_dumpall -U postgres -h localhost -p 5432 --clean --file=dumpall_clean.sql

Once this is done, the file is around 1.2 TB in size and I can load it into the new 10.3 instance with

psql -U postgres -h localhost -p 5433 < dumpall_clean.sql

Simple.

Problem

As I learned, the indices are not backed up the way tables are; they are simply recreated, and that is my problem.

The cluster has thousands of partitions, each with several million rows and two indices (one BTREE and one GiST). This takes days, since each index is created one at a time.

As I have enough resources and know which indices have to be created, I would prefer to do this step after the dump has been restored. Initially I wrote 8 FOR loops (run in parallel) that go through the partitions and, for each one, move the partition to a faster tablespace (SSD), create the indices, then move the table and the indices back to the default tablespace. So far this has worked for me.
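
Roughly, each of those loops looks something like the sketch below (the table pattern, column names, tablespace names and db_name are placeholders here, not my real schema):

# One of the parallel workers; names below are placeholders, not the real schema.
for part in $(psql -At -U postgres -p 5433 -d db_name \
    -c "SELECT relname FROM pg_class WHERE relkind = 'r' AND relname LIKE 'part_%'")
do
    psql -U postgres -p 5433 -d db_name -c "
        ALTER TABLE ${part} SET TABLESPACE ssd_space;
        CREATE INDEX ${part}_btree_idx ON ${part} USING btree (id)   TABLESPACE ssd_space;
        CREATE INDEX ${part}_gist_idx  ON ${part} USING gist  (geom) TABLESPACE ssd_space;
        ALTER TABLE ${part} SET TABLESPACE pg_default;
        ALTER INDEX ${part}_btree_idx SET TABLESPACE pg_default;
        ALTER INDEX ${part}_gist_idx  SET TABLESPACE pg_default;"
done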

Question: How can I get the same result* as a pg_dumpall dump, but without recreating the indices when loading the dumpall_clean.sql file? A pg_dumpall --without-index option would be nice.

"This currently includes information about database users and groups, tablespaces, and properties such as access permissions that apply to databases as a whole." - pg_dumpall manual

Colin 't Hart
Michael

4 Answers

4

It's been a while since this was posted, but we need to do something like this in our restores. Finding this question has actually made me realize I can use the same approach to speed up my current restore by cutting out the index creation :)

You can use the -l and -L flags of pg_restore to list the actions in a dump and to restore from an edited list of actions.

From my notes in our script:

    # pg_restore -l gives a list of all operations that would be performed during the restore.
    # pg_restore -L accepts a list of operations from file to perform during the restore.

So you can use -l to dump the list of operations from an existing archive, filter it, and then run pg_restore again with -L to use that newly filtered list of operations.

In practice that looks something like:

    ${PGRESTORE} --dbname=db_restore -Fc -l dump_filename \
        | grep -v "public view_we_dont_want" \
        | grep -v "public postgres" >${tmpFile}
    ${PGRESTORE} --dbname=db_restore -Fc -L ${tmpFile} dump_filename
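
For the index problem from the question, the same trick might look roughly like the sketch below. Note that pg_restore only reads non-plain-text pg_dump archives (custom, directory or tar format), not the plain SQL that pg_dumpall writes, so this assumes a per-database pg_dump -Fc dump; file and database names are placeholders:

    # Split the TOC into "everything but indexes" and "indexes only"
    pg_restore -l db.dump | grep -v ' INDEX ' > no_indexes.list
    pg_restore -l db.dump | grep    ' INDEX ' > indexes_only.list

    # Restore without the indexes first ...
    pg_restore --dbname=db_restore -L no_indexes.list db.dump

    # ... and replay just the index entries later (or build them your own way)
    pg_restore --jobs=8 --dbname=db_restore -L indexes_only.list db.dump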
Tim Renner
2

I can see one workaround for this, by using pg_dumpall in two steps:

pg_dumpall --schema-only ....

Then edit the file and extract the index definitions into a second file. You also need to extract the foreign keys, because you have to run them manually after the import (probably together with the index creation script).

Then run that script (without the indexes) to create the (empty) tables. Next, dump the data:

pg_dumpall --data-only ....

Then run that script to import the data into the new database. After that, run the FK and index creation scripts.
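
Roughly, the whole sequence might look like the sketch below. File names are placeholders, and the greps assume each CREATE INDEX statement sits on a single line in the plain dump (the foreign-key ALTER TABLE ... ADD CONSTRAINT statements usually span two lines, so those are easier to move out by hand):

pg_dumpall --schema-only --file=schema.sql
grep    '^CREATE INDEX' schema.sql > indexes.sql        # keep for later
grep -v '^CREATE INDEX' schema.sql > schema_no_idx.sql  # also move the FKs out by hand

psql -f schema_no_idx.sql          # create the empty tables, no indexes
pg_dumpall --data-only --file=data.sql
psql -f data.sql                   # load the data
psql -f indexes.sql                # then the indexes (and the FK script)
# note: with several databases you would also need the \connect lines in indexes.sql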

1

Before the upgrade:

Dump globals:

pg_dumpall --globals-only --file=globals.sql

Dump pre-data:

pg_dump --format=plain --create --section=pre-data --file=pre-data.sql db_name

Dump post-data:

pg_dump --format=custom --section=post-data --file=post-data.custom db_name
pg_dump --format=plain --section=post-data --file=post-data.sql db_name

Restore globals, pre-data, and the "grant" section of post-data (extract it from post-data.sql with a text editor; see the sketch after these commands):

psql --file=globals.sql
psql --file=pre-data.sql
psql --file=post-data-permissions.sql db_name
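
If you don't want to extract the grants by hand, something like this might be enough (it assumes the GRANT/REVOKE statements are single lines, as they normally are in a plain dump):

grep -E '^(GRANT|REVOKE) ' post-data.sql > post-data-permissions.sql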

Cut for upgrade...

Dump data:

pg_dump --format=directory --jobs=8 --section=data --compress=9 --file=data.d db_name

Restore data:

pg_restore --jobs=8 --dbname=db_name data.d

--> now the DB is ready for connections (and it's slow, of course)

Restore indexes, FKs and grants (yes, the grants again, but that's OK):

pg_restore --jobs=8 --dbname=db_name post-data.custom
Sahap Asci
0

It should be possible to just filter them out using grep:

grep -v '^CREATE INDEX .*;$' dump.sql | psql

or

pg_dumpall "source db connection string"  | \
grep -v '^CREATE INDEX .*;$'               | \
psql "destination db connection string"

This should be safe unless you have matching lines inside stored code.

In your specific case:

grep -v '^CREATE INDEX .*;$' dumpall_clean.sql | psql -U postgres -h localhost -p 5433
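
To build the indexes afterwards you can keep the lines the filter throws away; with more than one database in the cluster you would also need the \connect lines, so treat this as a sketch:

grep '^CREATE INDEX .*;$' dumpall_clean.sql > create_indexes.sql
# ... after the data has been restored:
psql -U postgres -h localhost -p 5433 -f create_indexes.sql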
Jasen