First off, as commented, work with the data type uuid for valid UUID values not varchar(32). Cheaper, safer, cleaner. See:
The answer to the main question depends on the complete picture. You provided some good information, much more is relevant:
- Do these UUID naturally belong to the main table? Are they attributes of the entity described in the table?
- How wide is the row?
- How much concurrent write activity? Typical read and write patterns?
- How fast / limited / costly is your storage and your RAM?
- Ultimately, the sum total of all activity and the outlook how it might develop over time is relevant.
That's why a definitive answer is hard. There are arguments to be made for either design:
- Add 3 columns to the main table.
- Add a separate 1:1 table with 3 UUIDs per row.
- Add a separate 1:n table with 1 UUIDs per row.
Assuming:
- There are always exactly 3 UUID values (as indicated by your
NOT NULL constraints).
- The main row is relatively wide.
- There are lots of updates (you mentioned 20k/day) to the main row and/or the added UUIDs, but hardly ever updates involving both at the same time.
- You always query one of the three UUID values and know exactly which column to look in. Never in all three of them at once.
- Postgres 11 or later (needed for
INCLUDE clause).
Then I would go with option 2:
CREATE TABLE table1_uuids (
id bigint PRIMARY KEY REFERENCES table1
, token1 uuid NOT NULL
, token2 uuid NOT NULL
, token3 uuid NOT NULL
, CONSTRAINT token1_uni UNIQUE (token1) INCLUDE (id)
, CONSTRAINT token2_uni UNIQUE (token2) INCLUDE (id)
, CONSTRAINT token3_uni UNIQUE (token3) INCLUDE (id)
);
Your example lookup would look like this:
SELECT *
FROM table1
WHERE id = (SELECT id FROM table1_uuids token1 = 'xyz');
Basically, the subquery resolves a given token UUID to the PK id, using one of the three indexes that are created internally to implement the UNIQUE constraints. THE INCLUDE clauses allow index-only scans.
Or like this:
SELECT t.*
FROM table1_uuids tu
JOIN table1 t USING (id)
WHERE tu.token1 = 'xyz';
Rationale
The split adds 28 bytes of overhead per row, and the size of the PK index, and some complications and costs for queries that have to join both tables or insert / update in two tables instead of just one. But it avoids table and index bloat. Makes updates faster. Less locking contention with updates on just UUIDs or just the main table. And it speeds up queries on the main table a little, that don't need the additional UUIDs (I expect, most queries don't.)
Since Postgres 11 we can conveniently add id to each UNIQUE constraint with the INCLUDE clause. This allows for very fast index-only scans, without having to add another, redundant index. Related:
One of the reasons I did not go for option 3: there is a UNIQUE constraint on each of the three UUIDs. If you put all of them in a single column, you can't make it UNIQUE. It would have to be a multicolumn constraint including a "token_type" or something. More expensive, less elegant.
The width of the main row and the frequency of updates for the main rows and each UUID column have key importance. Every update has to write a new row version in Postgres' MVCC model (TOASTed values may be excluded). Plus, except for HOT updates, a new index entry in every attached index. See:
This cost scales up with the width of the main row. If that happens a lot and/or rows are wide, your tables and indexes bloat and/or you get a lot of additional work for VACUUM. All bad for performance.
With no updates on UUID columns, option 1 would look a lot better, with a lot more updates, option 3 would.