
ENV Details:

  1. GKE (Google Kubernetes Engine) cluster
  2. Snitch: GoogleCloudSnitch. https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archSnitchGoogle.html
  3. 140 nodes across 4 racks (4 StatefulSets in 4 AZs in one region)
  4. Each node holds about 5 TB; total cluster size is 700 TB.

Procedure followed to restore the DB: I restored a 450 TB database into a Kubernetes Cassandra cluster, from source to target, using nodetool refresh by copying SSTables. Each node has around 5 to 6 TB of data. The approach I took:

  1. Built a new cluster with the same number of nodes as the source (140 nodes).
  2. Created new disks from source disk snapshots (cloud) and attached those disks to the new nodes.
  3. Cleared the system tables.
  4. Started Cassandra as the target cluster.
  5. Created the tables manually, matching the source cluster's schema.
  6. Copied SSTables from the old table directories into the newly created (UUID) table directories.
  7. Ran nodetool refresh.
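Step 6 above can be sketched as follows. This is a hypothetical illustration, not the actual migration tooling: the function name, directory layout, and file names are assumptions, but it reflects the fact that Cassandra names each table directory `<table>-<uuid>`, so the target directory must be looked up rather than assumed to match the source.

```python
import shutil
from pathlib import Path

def copy_sstables(source_table_dir: Path, target_data_dir: Path, table_name: str) -> Path:
    """Copy SSTable component files from a source table directory into the
    target node's directory for the same table. Cassandra names table
    directories as <table_name>-<uuid-without-dashes>, so the target
    directory is found by prefix match on the table name."""
    matches = [d for d in target_data_dir.iterdir()
               if d.is_dir() and d.name.startswith(table_name + "-")]
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one directory for {table_name}, found {matches}")
    target_dir = matches[0]
    # Copy every SSTable component (Data.db, Index.db, Statistics.db, ...)
    for component in source_table_dir.iterdir():
        if component.is_file():
            shutil.copy2(component, target_dir / component.name)
    return target_dir
```

After the copy, `nodetool refresh <keyspace> <table>` makes the node load the new SSTables without a restart.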

Observation: nodetool status shows the same size as the source for each node, and cfstats table sizes also match between target and source.

Post-migration issues:

  1. To reduce the cluster size, I started decommissioning one of the nodes and it completed in 10 minutes. I expected it to take at least 8 hours, based on previous maintenance on the source cluster.

  2. Since the decommission was not expected to finish in only 10 minutes, I tried adding a new node to the cluster, which should have streamed approximately 5-6 TB of data, but it only copied 2 TB.

  3. As another check, I ran "nodetool cleanup" on another node in the target cluster, and it reduced the data from 6 TB to 2 TB. The same cleanup didn't reduce any data on the source node.

  4. After the restore, the whole target cluster was repaired using the Reaper tool with no issues, but decommission, cleanup, and bootstrap are the problem.

Sai

1 Answer


I suspect that the underlying issue is that your procedure is incorrect. For the "refresh method" to work, the source node(s) and corresponding target node(s) must have identical configuration, right down to the token(s) assigned to them, as I've explained in How do I restore Cassandra snapshots to another cluster with identical configuration?.

When you copy SSTables to a node whose token ownership differs from the source node's, the partitions in those SSTables which don't belong to the target node get thrown away when you run nodetool cleanup.
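The effect can be illustrated with a toy model. This is only a sketch: the hash below is an MD5-based stand-in for Cassandra's Murmur3 partitioner, and the function names are invented, but the range logic mirrors how cleanup discards partitions outside a node's owned token ranges.

```python
import hashlib

def token(partition_key: str) -> int:
    # Stand-in for the Murmur3 partitioner: map a key onto a signed 64-bit ring
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8],
                          "big", signed=True)

def owned_after_cleanup(partitions, owned_ranges):
    """Keep only the partitions whose token falls inside one of this node's
    (start, end] ranges on the ring; everything else is dropped, which is
    what nodetool cleanup does with data the node does not own."""
    def in_range(t, start, end):
        if start < end:
            return start < t <= end
        return t > start or t <= end  # wrap-around range
    return [p for p in partitions
            if any(in_range(token(p), s, e) for s, e in owned_ranges)]
```

If the target node's ranges differ from the source node's, a large fraction of the copied partitions falls outside its ownership and disappears on cleanup, which matches the 6 TB to 2 TB drop you observed.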

Since the nodes are deployed in a Kubernetes cluster, I assume that you don't have control over the token assignments which means that the pods would have been assigned random tokens.
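If you do control the node configuration, tokens can be pinned so each target node reuses the exact tokens of its source counterpart. A minimal cassandra.yaml fragment, with placeholder values (the real tokens come from `nodetool ring` or system.local on the source node):

```yaml
# cassandra.yaml on each target node (values are placeholders):
# pin the exact tokens of the matching source node before first start.
num_tokens: 16
initial_token: <token-1>,<token-2>,...,<token-16>
```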

In a situation where the configuration is not identical between the source and target clusters, you cannot use the refresh method. The correct procedure is to load the data using sstableloader.

If you're interested, I have documented the detailed steps in How do I migrate data in tables to a new Cassandra cluster?. Cheers!

Erick Ramirez