6

I have a unique constraint on a character varying column that will mostly have lengths of approximately 600 but could get to 10 million.

I have read about B-Trees, but I can't determine how much disk space the index will consume or if such large-sized values will prevent the index from functioning performantly.

What are the impacts upon disk space and time of a B-Tree index on character varying columns with such potentially large lengths?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633

1 Answers1

11

In Postgres, a unique constraint is implemented with a unique B-tree index. The manual:

Adding a unique constraint will automatically create a unique B-tree index on the column or group of columns used in the constraint.

Indexes use the same basic storing mechanisms as tables: arrays of data pages. Indexes have some minor additional features. The manual:

In a table, an item is a row; in an index, an item is an index entry.

Every table and index is stored as an array of pages of a fixed size (usually 8 kB, although a different page size can be selected when compiling the server).

The maximum size of an index entry is roughly a third of a data page. See:

The maximum size was reduced to 2704 bytes in Postgres 12. Still the same in Postgres 15. Meaning your UNIQUE constraint is not possible.

Consider a unique index on hash value, either after adding a redundant column with the hash, or just an index on the hash expression. You could use the built-in function md5() or something more efficient for huge values. Related:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633