
This is a common question from Cassandra operators who want to migrate data from one cluster to another.

What would be the procedure for restoring snapshots of a Cassandra cluster to another cluster which has identical configuration?

Erick Ramirez

1 Answer


Restoring snapshots to another cluster or "migrating" data is otherwise known as data cloning. The goal is to copy (clone) application data to another cluster, usually for the purposes of testing or development.

Definition

Two clusters (i.e. source and destination) are considered to have identical configuration when:

  • the cluster topologies are identical - same number of DCs, same number of nodes in each DC
  • the token assignments are identical - the assigned tokens for nodes in the source cluster are exactly the same as the assigned tokens for nodes in the destination cluster

If the conditions above are true, use the "refresh method", where SSTables from the nodes in the source cluster are copied to the corresponding nodes in the destination cluster and then "loaded" by running nodetool refresh.
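One way to verify the second condition is to compare the token assignments reported by the two clusters, for example with nodetool:

$ nodetool ring            # lists every node in the cluster with its assigned tokens
$ nodetool info --tokens   # lists only the tokens owned by the local node

Run the same command on the matching nodes of the source and destination clusters and confirm that the token lists are identical.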

Example Topology

Here is an example of a source and a destination cluster with 3 nodes in each of the 2 DCs:

Source                                Destination
AlphaDC                               CharlieDC
  nodeA: tokenA1, tokenA2, tokenA3      nodeU: tokenA1, tokenA2, tokenA3
  nodeB: tokenB1, tokenB2, tokenB3      nodeV: tokenB1, tokenB2, tokenB3
  nodeC: tokenC1, tokenC2, tokenC3      nodeW: tokenC1, tokenC2, tokenC3
BetaDC                                DeltaDC
  nodeD: tokenD1, tokenD2, tokenD3      nodeX: tokenD1, tokenD2, tokenD3
  nodeE: tokenE1, tokenE2, tokenE3      nodeY: tokenE1, tokenE2, tokenE3
  nodeF: tokenF1, tokenF2, tokenF3      nodeZ: tokenF1, tokenF2, tokenF3

In this example, nodeA (from the source cluster) and nodeU (on the destination cluster) have the same tokens assigned, nodeB and nodeV have the same tokens, and so on.

The "refresh" method

Follow this procedure to clone application data from a source cluster to the destination cluster in the example.

IMPORTANT: Create the application keyspace on the destination cluster with the same replication settings as the source cluster. For example, if the source keyspace is configured with a replication factor of 3, then it needs to have RF=3 in the destination cluster as well.
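As a sketch, assuming the application keyspace is called ks_name and the destination cluster uses the DC names from the example topology above, the keyspace could be created with:

$ cqlsh -e "CREATE KEYSPACE ks_name
    WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'CharlieDC': 3,
      'DeltaDC': 3
    };"

The replication factor per DC (3 in this sketch) must match whatever is configured on the source keyspace.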

WARNING: Do NOT clone system keyspaces/tables because the data contained in them are specific to the node/cluster they belong to. Only clone application data.

STEP 1 - For the first table, create the schema on the destination cluster.
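For example, the table definition can be exported from the source cluster with cqlsh and replayed on the destination (the keyspace and table names here are placeholders):

$ cqlsh -e "DESCRIBE TABLE ks_name.table_name;" > table_name.cql   # on a source node
$ cqlsh -f table_name.cql                                           # on a destination node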

STEP 2 - Take a snapshot of the first table on nodeA and copy it to the corresponding table directory on nodeU. Note that the suffix of each table directory is a UUID generated when the table was created, so it will differ between the source and destination clusters.
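A minimal sketch of this step, assuming the default data directory /var/lib/cassandra/data, SSH access from nodeA to nodeU, and placeholder names for the snapshot tag and the directory UUIDs:

# On nodeA - snapshot just the table being cloned
$ nodetool snapshot --table table_name -t clone_tag -- ks_name

# On nodeA - copy the snapshot SSTables into the matching table directory on nodeU
$ rsync -av \
    /var/lib/cassandra/data/ks_name/table_name-<source_uuid>/snapshots/clone_tag/ \
    nodeU:/var/lib/cassandra/data/ks_name/table_name-<destination_uuid>/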

STEP 3 - On nodeU, force Cassandra to read and load the new SSTables from the disk with:

$ nodetool refresh -- ks_name table_name

STEP 4 - Check the logs on the node to verify that the new SSTables were opened.
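For example, grep the system log on nodeU (the log location and the exact message wording vary by Cassandra version and installation):

$ grep -i table_name /var/log/cassandra/system.log | grep -i sstable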

STEP 5 - Repeat the steps above on the next table and keep repeating until all the tables have been migrated.

STEP 6 - Repeat the steps above by migrating the snapshots from nodeB to nodeV. Keep repeating until all nodes in the destination cluster have had their snapshots restored.
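Putting the steps together, here is a rough automation sketch. The node pairs, keyspace, table list, data directory and snapshot tag are all assumptions taken from the example; it assumes the schemas already exist on the destination (STEP 1) and that each source node can reach its matching destination node over SSH, and error handling is omitted:

#!/bin/bash
KS=ks_name
TABLES="table_one table_two"
PAIRS="nodeA:nodeU nodeB:nodeV nodeC:nodeW nodeD:nodeX nodeE:nodeY nodeF:nodeZ"
DATA=/var/lib/cassandra/data
TAG=clone_tag

for pair in $PAIRS; do
  src=${pair%%:*}
  dst=${pair##*:}
  for tbl in $TABLES; do
    # snapshot the table on the source node
    ssh "$src" "nodetool snapshot --table $tbl -t $TAG -- $KS"
    # resolve the table directories (the UUID suffixes differ between clusters)
    src_dir=$(ssh "$src" "ls -d $DATA/$KS/${tbl}-*/snapshots/$TAG")
    dst_dir=$(ssh "$dst" "ls -d $DATA/$KS/${tbl}-*")
    # copy the snapshot SSTables from the source node to the destination node
    ssh "$src" "rsync -a $src_dir/ $dst:$dst_dir/"
    # load the copied SSTables on the destination node
    ssh "$dst" "nodetool refresh -- $KS $tbl"
  done
done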

Things to know

  • This is an online procedure. It does not require the destination nodes to be shut down.
  • This procedure only applies to clusters with identical topologies and token assignments.
  • It will not work for non-identical clusters because the partitions in the source SSTables will not necessarily fall in the token range(s) owned by the destination nodes.
  • There is no need to use OpsCenter cloning or sstableloader for this scenario.

For instructions on cloning data to a new cluster, see How do I migrate data in tables to a new Cassandra cluster?.

Erick Ramirez