
A table has the following structure:

  • a: id
  • b: large jsonb
  • c: large jsonb
  • d: timestamp
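
In DDL terms, roughly (a sketch; exact names and types assumed):

    CREATE TABLE tbl (
        a  bigint PRIMARY KEY  -- id
      , b  jsonb               -- large
      , c  jsonb               -- large
      , d  timestamp
    );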

Looking at the statistics over two weeks, I see many updates:

  • 90k requests update the d field alone (total time: 25 min, measured from the app)
  • 90k requests update the "large jsonb" fields (total time: 6 h 30 min, measured from the app)

Since Postgres follows the MVCC pattern, which rewrites a row on every update, would it be worthwhile to change the table structure and move d into a separate table?

Slim

2 Answers


No, that is not necessary. The large JSON values seem to be stored out of line in a TOAST table. In that case an update of d only has to write a new row version to the main table, while the TOAST values stay the same.

If the JSON values were small enough to be stored in the main table, modifying d would be much more expensive, because Postgres would have to write the whole row, JSON values included, again. In that case it might make sense to consider such a separation.
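
To verify that the large values are indeed stored out of line, you can check whether the table has a TOAST relation and how big it is. A minimal sketch, assuming your table is named tbl:

    -- Does tbl have a TOAST table, and how much data lives in it?
    SELECT c.reltoastrelid::regclass AS toast_table
         , pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
    FROM   pg_class c
    WHERE  c.relname = 'tbl'
    AND    c.reltoastrelid <> 0;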

Laurenz Albe

As Laurenz advised, column values that are stored out of line in a TOAST table and are not touched by the UPDATE can stay as they are, saving work and storage. (Old and new row versions in the main relation point to the same TOAST entry.)

But there is fine print to this.

Are you toasting, yet?

The row size has to exceed TOAST_TUPLE_THRESHOLD (typically 2 kB) to be considered. Some columns may already be compressed. The TOAST mechanism will then try to compress more columns until the size falls below TOAST_TUPLE_TARGET (typically the same 2 kB). Only if that fails will it start moving data to the TOAST table, one column at a time.
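
TOAST_TUPLE_THRESHOLD is a compile-time constant, but the target can be lowered per table to trigger compression and out-of-line storage earlier. A sketch, again assuming table tbl and jsonb column b:

    -- Start compressing / toasting at a smaller tuple size (default ~ 2 kB)
    ALTER TABLE tbl SET (toast_tuple_target = 256);

    -- Or force out-of-line storage for one column, skipping compression
    ALTER TABLE tbl ALTER COLUMN b SET STORAGE EXTERNAL;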

The default compression algorithm pglz compresses jsonb data by a factor of around 3 to 4 in my quick test. (Varies depending on content.) So the raw size of your row has to exceed around 7 kB before anything is TOASTed - and then possibly only one of your two "large" columns is moved out of line. That may leave you with medium-sized tuples of up to 2 kB in the main relation, which brings us back to your initial considerations.

Have a closer look at the compressed(!) size of your jsonb values, and at the actual tuple sizes in the main relation (min, max, average, median) to arrive at an educated decision.
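
For a first impression, something along these lines can help (a sketch, assuming table tbl with jsonb column b; repeat for c):

    -- Stored (possibly compressed and toasted) vs. raw size of the jsonb values
    SELECT min(pg_column_size(b))            AS min_stored
         , max(pg_column_size(b))            AS max_stored
         , round(avg(pg_column_size(b)))     AS avg_stored
         , round(avg(octet_length(b::text))) AS avg_raw  -- uncompressed text size
    FROM   tbl;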

Here is a blog post with useful queries:

"Unchanged"

About:

90k requests update the "large jsonb" fields ...

Commonly, an update only actually changes one or the other column, or targets one or the other without actually changing its value. TOASTed column values stay the same if the column is "unchanged" - i.e. not targeted in the UPDATE at all. If you overwrite a column with an unchanged value, that still counts as "changed": Postgres won't verify whether the value actually changed, and writes a new row version, including TOAST. Avoid such "empty" updates with smart query design. See:
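
One way to avoid empty updates is to filter them out in the UPDATE itself. A sketch of the pattern (parameter numbers assumed):

    -- Skip the row entirely when nothing would actually change
    UPDATE tbl
    SET    b = $2
    WHERE  a = $1
    AND    b IS DISTINCT FROM $2;  -- null-safe "did it change?"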

Compression algorithm lz4

If what you disclosed is all that happens in your table, consider the non-default LZ4 compression algorithm. (Available since Postgres 14.) It's much faster, while (typically) compressing a little less. (The resulting size was ~ 15 % bigger in my quick test with jsonb data.) Here is a blog post with an exhaustive comparison from when it was first released:

See:
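
To switch, a sketch (Postgres 14+; note that existing values keep their old compression until they are rewritten):

    -- Use LZ4 for values newly stored in this column
    ALTER TABLE tbl ALTER COLUMN b SET COMPRESSION lz4;

    -- Or make it the default for compressible columns without an explicit setting
    SET default_toast_compression = 'lz4';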

In summary: use the appropriate compression algorithm, then use smart queries, then gather valid statistics on your data, and only then decide whether breaking out one or the other column makes sense. There has to be a clear benefit to justify the overhead.

Erwin Brandstetter