
I am planning to change a column from non-frozen to frozen. Are there any guidelines for updating a frozen column with respect to tombstone generation? Some blogs say that updating a frozen column generates a row tombstone. Is that correct?

I would like to understand how tombstones are generated when a frozen column is updated.



I'm assuming that you mean you intend to "freeze" a collection because you can't freeze a column. In any case, updating a frozen collection does not generate a tombstone.

Setting the value of a non-frozen collection generates a tombstone because Cassandra needs to completely clear the previous values of the collection. Cassandra does not do a read-before-write (except in the case of lightweight transactions), so it does not know whether there are existing cells holding elements of the collection. It therefore writes a tombstone so that any older cells are not returned by a read request.
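The need for that tombstone can be sketched with a toy Python model. This is my own simplification, not Cassandra's storage engine, and the class and method names are hypothetical: a non-frozen set is stored as one cell per element, every write is blind, and a read merges the cells against the newest collection tombstone. The tombstone is placed one tick before the write timestamp, which mirrors the sstabledump output further down, where marked_deleted is 1 microsecond earlier than the row timestamp, so the newly written elements are not hidden by their own tombstone.

```python
from dataclasses import dataclass, field

# Toy model of a non-frozen set column (NOT Cassandra's real storage engine):
# each element is its own cell, and a whole-collection overwrite is a blind
# write, so old cells can only be hidden by a collection-level tombstone.

@dataclass
class NonFrozenSet:
    cells: dict = field(default_factory=dict)       # element -> newest write timestamp
    tombstones: list = field(default_factory=list)  # collection deletion timestamps

    def append(self, elements, ts):
        """Like SET col += {...}: write one cell per element, nothing else."""
        for e in elements:
            self.cells[e] = max(ts, self.cells.get(e, ts))

    def overwrite(self, elements, ts, with_tombstone=True):
        """Like SET col = {...}: no read-before-write, so existing cells are unknown."""
        if with_tombstone:
            # Deletion at ts - 1 shadows every older cell but not the new ones.
            self.tombstones.append(ts - 1)
        self.append(elements, ts)

    def read(self):
        """Merge cells with the newest tombstone; a cell survives only if it is newer."""
        deleted_at = max(self.tombstones, default=None)
        return {e for e, t in self.cells.items() if deleted_at is None or t > deleted_at}


col = NonFrozenSet()
col.overwrite({"avocado", "blueberries"}, ts=100)
col.overwrite({"grapes", "strawberries"}, ts=200)
print(sorted(col.read()))  # ['grapes', 'strawberries']

# Without the tombstone, the old elements would come back on read.
broken = NonFrozenSet()
broken.overwrite({"avocado", "blueberries"}, ts=100, with_tombstone=False)
broken.overwrite({"grapes", "strawberries"}, ts=200, with_tombstone=False)
print(sorted(broken.read()))  # ['avocado', 'blueberries', 'grapes', 'strawberries']
```

The `broken` case is the point of the sketch: a blind write path that skipped the tombstone would silently merge the old and new sets.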

In contrast, a frozen collection is serialised into a single value rather than stored as individual elements, so the entire value is replaced on every update. Since the value of a frozen collection is stored in one cell, a tombstone is not required to invalidate older pre-existing cells.
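For comparison, here is the same kind of toy model for a frozen set, again a hypothetical sketch rather than Cassandra code: the whole collection is one cell with one timestamp, and the newest timestamp wins. Exact timestamp ties are deliberately ignored here because Cassandra has its own tie-break rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Toy model of a frozen<set<text>> column (NOT Cassandra's real code):
# the whole set is serialised into ONE cell, so an update just supersedes it.

@dataclass
class FrozenSetCell:
    value: Optional[FrozenSet[str]] = None
    ts: int = -1  # timestamp of the current value; -1 means "never written"

    def write(self, elements, ts):
        """Last write wins by timestamp. No tombstone is needed: the old value
        is a single cell and the newer cell simply takes precedence on read.
        Ties (ts == self.ts) are ignored in this toy."""
        if ts > self.ts:
            self.value, self.ts = frozenset(elements), ts


cell = FrozenSetCell()
cell.write({"apple", "banana"}, ts=100)
cell.write({"oranges"}, ts=200)
cell.write({"kiwi"}, ts=150)  # a late-arriving older write loses
print(sorted(cell.value))  # ['oranges']
```

In a real cluster the older value can still sit in another SSTable until compaction, but reads pick the newer cell, so nothing has to be marked deleted.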

You would have been able to work this out yourself by running a quick test of your own. For completeness, I will illustrate with this example table, which contains both a frozen set and a regular (non-frozen) set collection:

CREATE TABLE community.freeze_test (
    pkey int PRIMARY KEY,
    frozenset frozen<set<text>>,
    setcol set<text>
);

First, we create a new partition with:

INSERT INTO freeze_test (pkey, frozenset, setcol)
  VALUES (1, {'apple', 'banana'}, {'avocado', 'blueberries'});

Then flush the memtable to disk with nodetool flush and dump the contents of the SSTable:

$ sstabledump nb-1-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 18,
        "liveness_info" : { "tstamp" : "2024-07-05T06:58:41.192323Z" },
        "cells" : [
          { "name" : "frozenset", "value" : ["apple", "banana"] },
          { "name" : "setcol", "deletion_info" : { "marked_deleted" : "2024-07-05T06:58:41.192322Z", "local_delete_time" : "2024-07-05T06:58:41Z" } },
          { "name" : "setcol", "path" : [ "avocado" ], "value" : "" },
          { "name" : "setcol", "path" : [ "blueberries" ], "value" : "" }
        ]
      }
    ]
  }
]

Notice that frozenset is just a single cell value:

          { "name" : "frozenset", "value" : ["apple", "banana"] }

but the non-frozen collection has a tombstone marker, and each set element is stored in its own cell:

          { "name" : "setcol", "deletion_info" : { "marked_deleted" : "2024-07-05T06:58:41.192322Z", "local_delete_time" : "2024-07-05T06:58:41Z" } },
          { "name" : "setcol", "path" : [ "avocado" ], "value" : "" },
          { "name" : "setcol", "path" : [ "blueberries" ], "value" : "" }

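To spot the same pattern in a real table, a small script can scan sstabledump JSON for collection-level tombstones. This is a hypothetical helper, not part of any Cassandra tooling, and it assumes the JSON layout shown above: a cell with "deletion_info" but no "path" marks a whole collection as deleted, while cells with a "path" are individual elements. The sample is a trimmed copy of the first dump.

```python
import json

# Hypothetical helper: list collection-level tombstones in sstabledump output.
# Assumes the layout shown above; field names are taken from that dump.

def collection_tombstones(dump):
    """Yield (partition key, column name) for each collection-level tombstone."""
    for partition in dump:
        key = partition["partition"]["key"]
        for row in partition.get("rows", []):
            for cell in row.get("cells", []):  # entries without a "cells" list are skipped
                if "deletion_info" in cell and "path" not in cell:
                    yield key, cell["name"]


# Trimmed copy of the first dump above.
SAMPLE = """
[{"partition": {"key": ["1"], "position": 0},
  "rows": [{"type": "row", "position": 18,
            "cells": [
              {"name": "frozenset", "value": ["apple", "banana"]},
              {"name": "setcol",
               "deletion_info": {"marked_deleted": "2024-07-05T06:58:41.192322Z",
                                 "local_delete_time": "2024-07-05T06:58:41Z"}},
              {"name": "setcol", "path": ["avocado"], "value": ""},
              {"name": "setcol", "path": ["blueberries"], "value": ""}]}]}]
"""

hits = list(collection_tombstones(json.loads(SAMPLE)))
print(hits)  # [(['1'], 'setcol')]
```

Against a real table you would redirect the dump to a file (for example sstabledump nb-1-big-Data.db > dump.json) and pass the result of json.load on that file instead of the sample.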
If we set frozenset using UPDATE, the cell is overwritten with the new value and no tombstone is written:

UPDATE freeze_test SET frozenset = {'oranges'} WHERE pkey = 1;

After flushing again, the new SSTable contains:

        "cells" : [
          { "name" : "frozenset", "value" : ["oranges"], "tstamp" : "2024-07-05T07:02:11.064353Z" }
        ]

If we update the non-frozen collection with:

UPDATE freeze_test SET setcol = {'grapes','strawberries'} WHERE pkey = 1;

notice that a tombstone is also generated along with the new cells:

        "cells" : [
          { "name" : "setcol", "deletion_info" : { "marked_deleted" : "2024-07-05T07:03:42.962302Z", "local_delete_time" : "2024-07-05T07:03:42Z" } },
          { "name" : "setcol", "path" : [ "grapes" ], "value" : "", "tstamp" : "2024-07-05T07:03:42.962303Z" },
          { "name" : "setcol", "path" : [ "strawberries" ], "value" : "", "tstamp" : "2024-07-05T07:03:42.962303Z" }
        ]

Finally, if I add an element to the non-frozen collection with the += operator:

UPDATE freeze_test SET setcol += {'mango'} WHERE pkey = 1;

the new SSTable has just one cell:

        "cells" : [
          { "name" : "setcol", "path" : [ "mango" ], "value" : "", "tstamp" : "2024-07-05T07:07:41.407898Z" }
        ]
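The four experiments above can be summarised in a toy Python table of what each write emits for a set column. This is a hypothetical sketch of the behaviour observed in the dumps, not Cassandra code; the only rule it encodes beyond the dumps is that CQL rejects appends to a frozen collection, which must be replaced as a whole.

```python
from typing import NamedTuple

# Toy summary (NOT Cassandra code) of what one write emits for a set column,
# matching the experiments in this answer.

class Mutation(NamedTuple):
    tombstones: int  # collection-level tombstones written
    cells: int       # data cells written


def mutation_for(frozen: bool, op: str, elements: set) -> Mutation:
    """frozen: column is frozen<set<...>>; op: "=" (INSERT / SET col = ...) or "+=" (append)."""
    if frozen:
        if op != "=":
            raise ValueError("a frozen collection can only be replaced as a whole")
        return Mutation(tombstones=0, cells=1)  # one serialised value
    if op == "=":
        return Mutation(tombstones=1, cells=len(elements))  # blind overwrite
    if op == "+=":
        return Mutation(tombstones=0, cells=len(elements))  # blind append
    raise ValueError(f"unsupported op: {op!r}")


experiments = [
    ("INSERT frozenset", mutation_for(True, "=", {"apple", "banana"})),
    ("INSERT setcol", mutation_for(False, "=", {"avocado", "blueberries"})),
    ("SET frozenset =", mutation_for(True, "=", {"oranges"})),
    ("SET setcol =", mutation_for(False, "=", {"grapes", "strawberries"})),
    ("SET setcol +=", mutation_for(False, "+=", {"mango"})),
]
for label, m in experiments:
    print(f"{label:18} tombstones={m.tombstones} cells={m.cells}")
```

In this model the only operation that writes a collection tombstone is a full overwrite of a non-frozen collection, which matches the dumps: appending with += or using a frozen collection avoids it.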

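Adding an element to a non-frozen set collection does not generate a collection tombstone because Cassandra does not need to clear out the existing contents of the collection. Removing an element with the -= operator does write a tombstone, but only for that one element's cell rather than for the whole collection. Cheers!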

Erick Ramirez