Since ScyllaGreg is putting in the good word for Scylla, I thought I'd represent for Aerospike, where I work. Of all the C* variants I like ScyllaDB the best, but I believe that in the case you describe, and in general for row-oriented situations, Aerospike Database will perform better than ScyllaDB using less hardware. I've written a few Medium articles (@rbotzer) explaining why, based on realistic use cases.
Modeling
Here's one possible way to model this in Aerospike.
I am assuming that you always want to access the notifications of a specific user. For that reason, the key for a record (a row in Aerospike-speak) would be the user_id. The rest of the data fits in a single bin containing the following key-ordered Map structure:
{ epoch: [ item_type, item_id, other_data ] }
Where epoch is a way to simplify the created_at datetime into an integer: minutes, seconds, or milliseconds since an arbitrary epoch, such as the date your app went live (ex: minutes since 2020-10-01 00:00). Choose a resolution fine-grained enough that two notifications for the same user can't land on the same map key and overwrite each other. Either way, Aerospike uses MessagePack to serialize this map data, and MessagePack has the nice side effect of rightsizing numeric values into a minimal storage size.
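For concreteness, here's what writing one notification might look like with the Aerospike Python client. This is a minimal sketch: the namespace, set, bin name, and the epoch helper are all mine, and you should check the helper signatures against your client version.

    from datetime import datetime, timezone

    import aerospike
    from aerospike_helpers.operations import map_operations as mo

    # The app's own epoch, e.g. the day it went live (illustrative)
    APP_EPOCH = datetime(2020, 10, 1, tzinfo=timezone.utc)

    def epoch_minutes(ts):
        """Collapse a created_at datetime into minutes since APP_EPOCH."""
        return int((ts - APP_EPOCH).total_seconds() // 60)

    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

    # user_id is the record key; 'n' is the single map bin (names are mine)
    key = ('app', 'notifications', 'user-1234')
    created_at = datetime(2020, 10, 2, 9, 30, tzinfo=timezone.utc)

    # Upsert one notification into the key-ordered map
    client.operate(key, [
        mo.map_put('n', epoch_minutes(created_at),
                   ['comment', 987, 'other data'],
                   map_policy={'map_order': aerospike.MAP_KEY_ORDERED})
    ])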
Querying
Now to query this data we'll use the Map operations API:
- Adding any information is a simple Map put or put_items, either of which upserts the data into the Map.
- To get all the notifications for a specific user, you just get() the record by user_id.
- To paginate through the notifications for a specific user, you use Map get_by_index_range(KeyValue, i, 10) with i = 0, 10, 20, ...
- To get all the notifications between two points in time (say, October 2nd) you would use Map get_by_key_range(KeyValue, 1440, 2880)
- To get all the notifications of type 'comment' you would use Map get_by_value(KeyValue, ['comment', *])
- To clear out all the notifications in October 2020 you would use Map remove_by_key_range(Count, 0, 44640) (the end of a key range is exclusive)
You can see that there's currently no way to AND a date-range query with an item-type query in a single map operation. However, Aerospike is extremely fast: the data is retrieved by looking up a single key and operating on record data that is stored contiguously. You can easily fetch slightly more data (everything in the date range) and filter for the item_type on the application side.
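Here's roughly how those bullets translate to the Python client's map operation helpers, continuing the sketch above. Treat it as a sketch: the argument order follows the client docs as I remember them, so verify against your client version.

    KV = aerospike.MAP_RETURN_KEY_VALUE

    # Paginate, 10 notifications at a time
    for i in range(0, 50, 10):
        _, _, bins = client.operate(key, [mo.map_get_by_index_range('n', i, 10, KV)])
        page = bins['n']  # do something with this page

    # Everything on October 2nd: minutes 1440 up to (not including) 2880
    _, _, bins = client.operate(key, [mo.map_get_by_key_range('n', 1440, 2880, KV)])

    # All notifications of type 'comment'; the wildcard matches the rest of the list
    _, _, bins = client.operate(
        key, [mo.map_get_by_value('n', ['comment', aerospike.CDTWildcard()], KV)])

    # AND the two on the application side: fetch the date range, filter the type
    # (assuming KEY_VALUE results arrive as (map_key, value) tuples)
    oct_2nd = client.operate(key, [mo.map_get_by_key_range('n', 1440, 2880, KV)])[2]['n']
    comments = [(k, v) for k, v in oct_2nd if v[0] == 'comment']

    # Clear out all of October 2020
    client.operate(key, [
        mo.map_remove_by_key_range('n', 0, 44640, aerospike.MAP_RETURN_COUNT)])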
Notice that this modeling approach requires zero secondary indexes. It just uses the power of Aerospike's even data distribution, extremely fast primary-index lookups, and single-I/O record reads from storage.
Future for Querying
Aerospike Database 5.2 added Expressions, which can already chain map operations such as get_by_index_range => get_by_value => get_by_index_range, but only in the context of a filter. This means you can use a complex expression as a condition for whether an operation applies, in a query, scan, batch read, or single-record request.
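As a filter, that might look something like this in the Python client. This is a rough sketch: the expressions helpers shipped with client 5.x, and the exact class names and argument order may differ in your version, so verify before use.

    from aerospike_helpers import expressions as exp

    # "Only run the operation if the record has at least one notification
    # on October 2nd" (class names here follow the docs as I remember them)
    has_oct_2nd = exp.GT(
        exp.MapGetByKeyRange(None, aerospike.MAP_RETURN_COUNT,
                             1440, 2880, exp.MapBin('n')),
        0).compile()

    _, _, bins = client.operate(key,
                                [mo.map_get_by_index_range('n', 0, 10, KV)],
                                policy={'expressions': has_oct_2nd})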
In the near future you'll be able to apply an expression to the operation itself, which will remove the AND limitation. If you wanted to find all the notifications in a time range that are of a specific type, and also paginate through them, you would combine something like
exp(get_by_key_range(KeyValue, 1440, 2880), get_by_value(KeyValue, ['comment', *]), get_by_index_range(KeyValue, 0, 10))
Alternative Modeling
As I described in Aerospike Modeling: IoT Sensors, you can choose to partition the user notifications by day, with the key being userID:YYYYMMDD.
This would allow you to keep record sizes reasonable if you expect a very high volume of notifications. It also makes purging data past a certain age convenient, since each expired day is a single record delete.
However, I suspect you won't run into this problem. Assume a notification is 40 bytes; 256 notifications then take at most 10KiB. In reality they're probably smaller, and as I mentioned above, MessagePack will rightsize some of those data types. If you use compression (an Aerospike EE feature) they'll be smaller still. Even at a full 40 bytes, the default 1MiB write block can hold 26,000 notifications for a single user, and you can trim those daily with a remove_by_key_range.
Another nice thing about partitioning by day is that only one record is ever written to ('today').
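Continuing the earlier sketch, the day-bucketed variant only changes how you build the record key (the userID:YYYYMMDD composite format here is illustrative):

    def day_key(user_id, ts):
        """One record per user per day, keyed on 'userID:YYYYMMDD'."""
        return ('app', 'notifications', f'{user_id}:{ts:%Y%m%d}')

    # All writes land on today's record
    today = day_key('user-1234', datetime.now(timezone.utc))
    client.operate(today, [mo.map_put('n', epoch_minutes(created_at),
                                      ['comment', 987, 'other data'])])

    # Purging an expired day is a single record delete
    client.remove(day_key('user-1234', datetime(2020, 10, 2, tzinfo=timezone.utc)))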
Now, to address your comment about secondary indexes in Aerospike: it's true that they're currently stored in memory; however, the memory cost isn't as high as you might expect.
First, Aerospike doesn't buffer data in memory the way the C* variants all do. In the classic Aerospike Database deployment you store all the data on SSD, and only the indexes consume memory. If you read my Medium posts, I go into why Aerospike is faster than ScyllaDB at retrieving a random record out of an arbitrarily large stack of records.
Every record has a 64 byte entry in the primary index. As for secondary indexes, the absolute worst case, a single record per distinct indexed value (a situation where you should never use a secondary index anyway), would cost you an extra 85 bytes per record.
A quick calculation for 1 billion records: 10^9 * (64 + 85) bytes ≈ 139GiB. On a 3 node Aerospike cluster that's about 46GiB of memory per node. Considering that you don't need memory for the data, and that there's very little overhead on top of this, you end up with quite affordable RAM consumption that fits on a small cluster.
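If you want to sanity-check that arithmetic:

    records = 10**9
    per_record = 64 + 85                      # primary index entry + worst-case SI entry, in bytes
    total_gib = records * per_record / 2**30  # ~139 GiB across the cluster
    per_node_gib = total_gib / 3              # ~46 GiB on each of 3 nodes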
Going forward, quite a bit of change is coming to secondary indexes in Aerospike, but I still suggest modeling in a way similar to the one described above. Map and List operations on a single record will be faster and consume fewer resources than a secondary-index query in a C* database.