
I’m using ScyllaDB 6.1 Open Source and have a table configured to store 30 days of data with the following compaction strategy:

compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_size': '3',
  'compaction_window_unit': 'DAYS',
  'max_threshold': '32',
  'min_threshold': '4' 
}
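To see which window a given SSTable lands in, the sketch below computes the start of the window that contains a timestamp. It rests on two assumptions, as in Apache Cassandra's TWCS implementation. First, windows are aligned to the Unix epoch in UTC. Second, an SSTable is assigned to a window by its maximum write timestamp. ScyllaDB's exact internals may differ, and `window_lower_bound` is a hypothetical helper, not a ScyllaDB API.

```python
from datetime import datetime, timedelta, timezone

# Seconds per TWCS compaction_window_unit.
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_lower_bound(ts: datetime, window_size: int, window_unit: str) -> datetime:
    """Return the UTC start of the time window that contains ts.

    Assumption (hypothetical helper): windows are aligned to the Unix epoch,
    i.e. lower bound = ts - (ts % window_length), as in Cassandra's TWCS.
    """
    length = window_size * UNIT_SECONDS[window_unit]
    seconds = int(ts.timestamp())
    lower = seconds - (seconds % length)
    return datetime.fromtimestamp(lower, tz=timezone.utc)


if __name__ == "__main__":
    # compaction_window_size = 3, compaction_window_unit = 'DAYS'
    for day in ("2024-12-23", "2024-12-31", "2025-01-03", "2025-01-07"):
        ts = datetime.fromisoformat(day).replace(tzinfo=timezone.utc)
        start = window_lower_bound(ts, 3, "DAYS")
        end = start + timedelta(days=3)
        print(f"{day}: window {start.date()} to {end.date()}")
```

Under these assumptions, a timestamp on January 7 maps to the 3-day window that starts on January 6, not to a window that starts on January 7.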

Observations

Pre-7 Jan Behavior: Until January 7, the table's SSTables were grouped into 3-day windows (Dec 23, Dec 25, Dec 28, Dec 31, Jan 3). This matched the expected behavior of the configured compaction strategy.

On 7 Jan: After triggering an autocompaction, a new SSTable was created for January 7 only, deviating from the 3-day window grouping.

Additional Issue

Upon further investigation, I noticed that within the same 3-day window, there are multiple small SSTables instead of one large SSTable. These smaller SSTables are not being compacted into a single SSTable, even though the compaction strategy specifies min_threshold = 4 and max_threshold = 32.

Questions

  • Why did the compaction on January 7 result in a new SSTable for just that day instead of following the 3-day grouping?

  • Why are the smaller SSTables within the same 3-day window not being compacted into a single large SSTable as expected?

  • Are there specific conditions under which TimeWindowCompactionStrategy skips compaction or behaves differently for insert-only workloads?

  • Could this behavior be linked to autocompaction triggering mechanisms or internal thresholds not accounted for in the current configuration?

I’d appreciate any insights or suggestions for troubleshooting and resolving this issue.

Thank you in advance!

I configured TimeWindowCompactionStrategy with a 3-day window, expecting the SSTables in each window to be compacted into larger ones. However, after autocompaction on January 7, a new SSTable was created for just that day. Multiple small SSTables also remained instead of being compacted into a single larger one.

Erick Ramirez

1 Answer


I'm a little confused by your statement "After triggering an autocompaction". It suggests that you (1) triggered a compaction, which (2) by definition means it wasn't automatic.

If you ran nodetool compact to trigger a compaction, then it's expected that you'd end up with just one SSTable.

If you recall, TWCS compacts SSTables flushed from memtables during a time window into larger SSTables using STCS. At the end of the time window, TWCS compacts all those SSTables into one SSTable (see Compaction sub-properties on the Cassandra website).

The point I'm trying to make is that TWCS uses STCS under the hood. Running nodetool compact therefore behaves the same as it does on a table configured with STCS: you end up with a single SSTable.

Moving on to your other question, there's not enough detail in your post to determine why the other SSTables within the time window are not getting compacted into a single SSTable. Going back to TWCS using STCS, you need at least min_threshold similar-sized SSTables as candidates for compaction. Suppose you ran nodetool compact and it generated one large SSTable. That SSTable won't be similar in size to the other, smaller SSTables, so it won't be picked as a candidate to be compacted together with them.

The key thing for STCS is that the SSTables have to be similar in size. Each SSTable's size must fall between (average size × bucket_low) and (average size × bucket_high). bucket_low and bucket_high are configurable STCS sub-properties that default to 0.5 and 1.5.
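The size-similarity rule can be sketched as a simplified model of STCS bucketing. This is an illustration under stated assumptions, not ScyllaDB's implementation. The model assumes the Cassandra defaults bucket_low = 0.5, bucket_high = 1.5 and min_sstable_size = 50 MB; SSTables smaller than 50 MB are grouped together regardless of their size ratio. It also assumes sizes are processed in ascending order against a running bucket average. `stcs_buckets` and `compaction_candidates` are hypothetical helpers, and details such as read-hotness ordering are omitted.

```python
from typing import List

MB = 1024 * 1024


def stcs_buckets(sizes: List[int], bucket_low: float = 0.5, bucket_high: float = 1.5,
                 min_sstable_size: int = 50 * MB) -> List[List[int]]:
    """Group SSTable sizes (bytes) into similar-size buckets (simplified STCS)."""
    buckets: List[List[int]] = []
    averages: List[float] = []
    for size in sorted(sizes):
        for i, avg in enumerate(averages):
            # Similar: strictly between avg * bucket_low and avg * bucket_high.
            similar = avg * bucket_low < size < avg * bucket_high
            # Tiny SSTables are lumped together regardless of ratio.
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                buckets[i].append(size)
                averages[i] = sum(buckets[i]) / len(buckets[i])
                break
        else:
            # No existing bucket fits: start a new one.
            buckets.append([size])
            averages.append(float(size))
    return buckets


def compaction_candidates(buckets: List[List[int]], min_threshold: int = 4,
                          max_threshold: int = 32) -> List[List[int]]:
    """Keep buckets with at least min_threshold SSTables, trimmed to max_threshold."""
    return [b[:max_threshold] for b in buckets if len(b) >= min_threshold]
```

As an example with hypothetical sizes, take a 3000 MB SSTable left over from a major compaction plus 190, 200, 205 and 210 MB SSTables flushed afterwards. The four small ones form one bucket that meets min_threshold = 4. The large one sits alone in its own bucket and is never a candidate. With only three small SSTables, no bucket reaches min_threshold and nothing is compacted.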

The next key point is that once time window X has passed, TWCS won't do any further compactions on the SSTables that belong to window X. This is not the same behaviour as STCS or LCS, which keep looking for candidates to compact. When a window is done, it's done, and TWCS moves on to the next time window.

For the record, I don't know if you actually ran nodetool compact. But if you did then it would explain your situation. Cheers!

Erick Ramirez