9

I have a series of ETL jobs in which I create a swap table using CREATE TABLE table1_swap LIKE table1. In order to make the populating of table1_swap faster I do not include the indexes. When I'm finished loading however I need to re-apply those indexes to the newly populated table. These indexes are created outside of the scope of these ETL jobs so I would rather not have to hardcode the CREATE INDEX calls if I don't have to.

Is it possible to "transfer" or "copy" a set of indexes from one table to another?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Kyle Decot
  • 359
  • 1
  • 5
  • 14

4 Answers4

10

You can retrieve the complete DDL statement for each index from the Postgres system catalogs.

The following will retrieve all CREATE index statements for a single table:

select pg_get_indexdef(idx.oid)||';'
from pg_index ind
  join pg_class idx on idx.oid = ind.indexrelid
  join pg_class tbl on tbl.oid = ind.indrelid
  left join pg_namespace ns on ns.oid = tbl.relnamespace
where tbl.relname = 'your_table_name'
   and ns.nspname = 'your_table_schema';

You can spool the output into a script and run it after you have copied the data and swapped the table names.

7

Basics

The system catalog information function pg_get_indexdef(index_oid) returns index definitions.
Use the type regclass to simplify input. For a table called tbl in the public schema:

SELECT pg_get_indexdef(indexrelid) || ';' AS idx
FROM   pg_index
WHERE  indrelid = 'public.tbl'::regclass;  -- optionally schema-qualified

This includes all indexes: PK, partial, functional, unique, with special operator classes, etc.

Details for regclass:

Prepare DDL statements

Replace the old table name in the DDL script with the new one. But avoid false positives in the replacement, like a column name matching the table name.

With default names, index definitions look like this:

CREATE INDEX tbl_tbl_id_idx ON tbl USING btree (tbl_id);

CREATE INDEX tbl_people_gin_idx ON tbl USING gin (((data -> 'people'::text)) jsonb_path_ops);

CREATE INDEX tbl_comecol_nonull_idx ON tbl USING btree (somecol) WHERE (cutblade IS NOT NULL);

Note the 3 false positives, which we do not want to replace.

This query should work flawlessly - but verify that against your actual indexes yourself!

SELECT regexp_replace(regexp_replace(
          pg_get_indexdef(indexrelid)
        , 'tbl', 'tbl1')
        , ' ON tbl ', ' ON tbl1 ')
    || ';' AS idx
FROM   pg_index
WHERE  indrelid = 'public.tbl'::regclass;

The 1st regexp_replace() only replaces the first match in the name (unless instructed otherwise).
The 2nd regexp_preplace() is more specific and only replaces the actual table name.

For the given examples you get:

CREATE INDEX tbl1_tbl_id_idx ON tbl1 USING btree (tbl_id);

CREATE INDEX tbl1_people_gin_idx ON tbl1 USING gin (((data -> 'people'::text)) jsonb_path_ops);

CREATE INDEX tbl1_somecol_nonull_idx ON tbl1 USING btree (somecol) WHERE (cutblade IS NOT NULL);

But we have not yet considered non-standard table names, schemas or the search_path setting. All of that is built into the function below.

Full automation

If you trust your naming scheme ...

CREATE OR REPLACE FUNCTION f_copy_idx(_tbl text, _tbl1 text
                                    , _sch text = 'public', _sch1 text = 'public')
  RETURNS void
  LANGUAGE plpgsql SET search_path = '' AS
$func$
DECLARE
   _full_tbl  text := format('%I.%I', _sch, _tbl);
   _full_tbl1 text := format('%I.%I', _sch1, _tbl1);
   _sql text;
BEGIN
   SELECT string_agg(
             regexp_replace(regexp_replace(
                pg_get_indexdef(indexrelid)
              , _tbl,  _tbl1)
              , ' ON ' || _full_tbl || ' ', ' ON ' || _full_tbl1 || ' ')
           , '; ')
   FROM   pg_index
   WHERE  indrelid = _full_tbl::regclass
   INTO   _sql;

IF _sql IS NULL THEN RAISE WARNING 'Found no indexes to copy.'; ELSE -- RAISE NOTICE '%', _sql; -- debug? EXECUTE _sql; END IF; END $func$;

Works for any schemas, and names that require double-quoting.

The function sets an empty search_path within its scope (SET search_path = '') to force schema-qualified table names.

I added 'public' as default value for the schema parameters _sch and _sch1. If both tables reside in the public schema, you can simply omit schema input.

Call:

SELECT f_copy_idx('mytbl', 'newtbl');  -- in schema public

For different schemas and with table names requiring double-quoting:

SELECT f_copy_idx('old_TBL', 'table', 'public', 'New_SCHEmA');

fiddle
Old sqlfiddle

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
2

While you can't "copy" them you can use the database meta-data to dynamically generate the DDL to create them.

SELECT 'CREATE ' 
            || CASE
                WHEN i.indisunique THEN 'UNIQUE '
                ELSE ''
                END
            || 'INDEX '
            || nr.nspname
            || '.'
            || c2.relname
            || '_swap ON '
            || nr.nspname
            || '.'
            || c.relname
            || '_swap ( '
            || split_part ( split_part ( pg_catalog.pg_get_indexdef ( i.indexrelid, 0, true ), '(', 2 ), ')', 1 )
            || ' ) '
    FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_index i
        ON ( c.oid = i.indrelid )
    JOIN pg_catalog.pg_class c2
        ON ( i.indexrelid = c2.oid )
    JOIN pg_namespace nr
        ON ( nr.oid = c.relnamespace )
    WHERE nr.nspname = 'schema for table1'
        AND c.relname = 'table1' ;
gsiems
  • 3,413
  • 2
  • 23
  • 26
0

Kyle,

I'm afraid that, due to the storage structure of PostgreSQL, a simple copying of indexes is not possible. PostgreSQL uses a storage method called heap-structured tables, where the insertion of data into the table has no guaranteed ordering or position.

If PostgreSQL were to copy a complete table from one table record to another, including all versions of every row, information about deleted rows, etc., then I imagine that one would be able to copy indexes as well.

In the case of how PostgreSQL currently operates, copying over data from one table to another replica table only guarantees logical consistency: that is, every row present in the original table will be present in the new table, and the data will be a row by match at the row level only.

In terms of physical consistency, there is no guarantee that the way in which the data is physical stored on disk in the original table will match the physical storage of the data in the replica table.

The reason this is important is because an index is essentially a method for mapping some information about the data in the table to the CTID of the row which contains the data.

Long story short: other than some creative hacking, I believe the conventional wisdom is that you must rebuild these indexes on the new table. Sorry I don't have better news.

Chris
  • 2,457
  • 17
  • 20