14

I'm considering using CLUSTER to reorder a table by an index. I understand that this recreation of the table data makes all the existing indexes either bloat or be useless. I've seen some indications that a REINDEX is required after a CLUSTER. I've found other references that indicate that CLUSTER does a REINDEX. The Official Documentation says nothing at all about REINDEX being part of CLUSTER or required (Although it does suggest running ANALYZE after the CLUSTER)

Can anyone definitively (i.e. with some sort of reference to official docs) say whether or not a REINDEX is required after a CLUSTER?

TREE
  • 401
  • 4
  • 10

4 Answers4

17

You do not need to reindex, because CLUSTER effectively does it for you.

More specifically, CLUSTER locks the source table then creates a new copy of it ordered according to the target index. It creates indexes on the new copy then replaces the old table and indexes with the new ones.

Note that this is also true of VACUUM FULL in 9.0+.

If you've been seeing discussion suggesting that CLUSTER bloats indexes it could be people who're assuming that CLUSTER works like pre-9.0 VACUUM FULL. You might also be seeing and misreading discussions that mention index bloat caused by the old VACUUM FULL implementation and suggesting CLUSTER as an alternative.

This is implied in the documentation:

a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes

What it doesn't say, but should, is that those temporary copies then replace the original table. (Bold mine).

Craig Ringer
  • 57,821
  • 6
  • 162
  • 193
8

I'm with a_horse_with_no_name on this: you don't need to recreate the indexes. Besides that the CLUSTER documentation does not mention it, we can further consult the REINDEX page, too:

There are several scenarios in which to use REINDEX:

  • An index has become corrupted, and no longer contains valid data. Although in theory this should never happen, in practice indexes can become corrupted due to software bugs or hardware failures. REINDEX provides a recovery method.

  • An index has become "bloated", that it is contains many empty or nearly-empty pages. This can occur with B-tree indexes in PostgreSQL under certain uncommon access patterns. REINDEX provides a way to reduce the space consumption of the index by writing a new version of the index without the dead pages. See Section 23.2 for more information.

  • You have altered a storage parameter (such as fillfactor) for an index, and wish to ensure that the change has taken full effect.

  • An index build with the CONCURRENTLY option failed, leaving an "invalid" index. Such indexes are useless but it can be convenient to use REINDEX to rebuild them. Note that REINDEX will not perform a concurrent build. To build the index without interfering with production you should drop the index and reissue the CREATE INDEX CONCURRENTLY command.

Clearly, CLUSTER does not fall into any of these cases.

And there is a small sentence in the CLUSTER docs:

[while clustering] Temporary copies of each index on the table are created as well.

This suggests that just like the table itself, the indexes get reordered during the process as well - this way making reindexing useless.

András Váczi
  • 31,778
  • 13
  • 102
  • 151
7

Found a reference, in the Recovering Disk Space section.

If you have such a table and you need to reclaim the excess disk space it occupies, you will need to use VACUUM FULL, or alternatively CLUSTER or one of the table-rewriting variants of ALTER TABLE. These commands rewrite an entire new copy of the table and build new indexes for it.

TREE
  • 401
  • 4
  • 10
-3

Analysing all the answers, in my opinion the right way to do that is to reindex BEFORE cluster. As the documentation doesn't tell if cluster do or not a reindex, and only a copy of the index, ordered or not, I think that an indexed index will result in a better clustered table. After that an analyse will finish the job. A vacuum full before all seems to be useless, unless cluster and/or reindex do not free dead tuples