2

I have created the following table

CONSISTENCY LOCAL_QUORUM;

drop keyspace if exists cycling;

CREATE KEYSPACE IF NOT EXISTS cycling WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 } and durable_writes = true;

CREATE TABLE IF NOT EXISTS cycling.rider ( rider_id int PRIMARY KEY, rider_name list<text>, rider_name2 frozen<list<text>> );

Do the following sequence of batch operations

CONSISTENCY LOCAL_QUORUM;
BEGIN BATCH
UPDATE cycling.rider SET rider_name = ['a2'], rider_name2 = ['b2'] WHERE rider_id = 100;
UPDATE cycling.rider SET rider_name = ['a3'], rider_name2 = ['b3'] WHERE rider_id = 100;
APPLY BATCH;

I get merged data in my list (rider_name)

select * from cycling.rider;
 rider_id | rider_name   | rider_name2
----------+--------------+-------------
      100 | ['a2', 'a3'] |      ['b3']

Do you know why?

I was expecting

 rider_id | rider_name   | rider_name2
----------+--------------+-------------
      100 | ['a3']       |      ['b3']
Erick Ramirez
  • 4,590
  • 1
  • 8
  • 30
gudge
  • 133
  • 4

2 Answers2

0

This is happening because rider_name is FROZEN, and somehow BATCH is allowing multiple values to be set when they are not FROZEN.

To test this, I created two additional tables. The rider2 table has both collections as FROZEn. The rider3 table does not use FROZEN on either:

> CREATE TABLE rider2 (    rider_id int PRIMARY KEY,    rider_name frozen<list<text>>,    rider_name2 frozen<list<text>> );
> CREATE TABLE rider3 (    rider_id int PRIMARY KEY,    rider_name list<text>,    rider_name2 list<text> );

I then executed your BATCH statement on both tables. Here are the results:

> SELECT * FROm rider2;

rider_id | rider_name | rider_name2 ----------+------------+------------- 100 | ['a3'] | ['b3']

(1 rows)

> SELECT * FROm rider3;

rider_id | rider_name | rider_name2 ----------+--------------+-------------- 100 | ['a2', 'a3'] | ['b2', 'b3']

(1 rows)

To Alex's point though, it would seem to be specific to BATCH, as running the UPDATE without the batch produces the single value.

Aaron
  • 4,420
  • 3
  • 23
  • 36
0

By design, all the statements in a CQL BATCH are executed atomically which means that they all have the same write timestamp.

When the two UPDATE statements on a [non-frozen] list column are executed in a batch, they have the same timestamp so one does not override the other. Since both updates are done concurrently, the side effect is that both "succeed" so both values are persisted to the CQL list (CASSANDRA-11368).

Frozen list collections do not suffer from this side effect since appends/prepends are not allowed so only the "last" update persists. It is important to note that rider_name2 = ['b3'] is not a guaranteed result since the statements in a CQL batch are not ordered. In a scenario where there is a race condition, it is possible to end up with rider_name2 = ['b2'] (although unlikely).

Although the behaviour of batch updates on non-frozen lists is unexpected, I personally cannot think of a situation where multiple updates to the same list column (in a single partition) is a valid use case. Having said that, I'm happy to be corrected so feel free to convince me otherwise.

In any case, I think the application should do some pre-work and coalesce the updates so only a single desired list update is executed in a batch. Cheers!

Erick Ramirez
  • 4,590
  • 1
  • 8
  • 30