  • We store data – some kind of metadata, such as car manufacturers or types of computer parts – in one of our applications.
  • This data changes rarely. Let's say it changes once every two weeks.
  • This data is also relevant for other applications in other domains.
  • We now want to make this data available to these other applications in our distributed system.

What is the best way to make such rarely changing data available to other applications?

Is using a messaging system overkill for rarely changing data? Would a REST interface enforce too much coupling onto other applications?

2 Answers


In this answer, I'm going to use SLA to mean the maximum amount of time that a downstream system can be behind the primary source.

There are basically two options here:

  • Downstream systems poll for data at an interval shorter than the SLA (e.g. 2 times per SLA period)
  • You detect changes and alert the downstream systems

The first option is definitely the simpler one. It's also the easiest to make robust. Unless you have a really large number of downstream systems, I can't recommend doing anything else without some other mitigating circumstance that you haven't mentioned. As @DocBrown notes in the comments on @Hans' answer, you can implement a HEAD operation which provides the last time the data was updated. This will minimize the cost of the polling calls. The clients then just need to keep track of which update they have successfully processed. It's important that a client doesn't record the new last-update date-time until it has fully processed the corresponding update; this makes the updates more robust. Downstream applications should also account for being repeatedly unable to handle a given update.

I recommend at least 2 checks in each SLA time period. If you only check once, even a simple networking hiccup could result in missing the SLA.
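A minimal sketch of such a polling client, assuming a hypothetical /metadata endpoint on the provider that reports the last change through the standard Last-Modified header (the URL, interval, and processing step are placeholders, not a prescribed API):

```python
import time
import urllib.request

DATA_URL = "https://provider.example.com/metadata"  # hypothetical endpoint
POLL_INTERVAL = 7 * 24 * 60 * 60  # seconds: twice per two-week SLA period

last_processed = None  # Last-Modified value of the update we last handled

def process(payload: bytes) -> None:
    # Application-specific handling would go here.
    print(f"received {len(payload)} bytes of metadata")

def poll_once():
    global last_processed
    # Cheap HEAD request: only asks when the data last changed.
    head = urllib.request.Request(DATA_URL, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        last_modified = resp.headers.get("Last-Modified")
    if last_modified == last_processed:
        return  # nothing new since the update we last processed
    # Something changed: fetch the full payload.
    with urllib.request.urlopen(DATA_URL) as resp:
        payload = resp.read()
    process(payload)
    # Only record the new timestamp after processing succeeded.
    last_processed = last_modified

while True:
    try:
        poll_once()
    except OSError:
        pass  # transient failure; the second check per SLA period covers it
    time.sleep(POLL_INTERVAL)
```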

If the volume of these checks is too high (unlikely), then the only thing that really changes is that you need to notify clients. There are many options, and they aren't that hard to implement if you've done it before. The trick isn't getting them to work; it's making them robust to errors. The most likely failure mode is that clients stop receiving updates. I'm going to guess that you don't need this, and if you decide you do, the details are outside the scope of this question.

– JimmyJames

For low-frequency updates, a message broker indeed feels like overkill.

You can emulate a publish/subscribe behavior on top of REST endpoints:

  1. The information provider offers the data through an ordinary endpoint. Consumers can use that endpoint at any time to get the data, and if they don't need timely updates, that may be all they ever do.
  2. The information provider has an endpoint /subscriptions where information consumers can register and deregister interest using POST and DELETE requests. A subscription request includes the URL of an endpoint belonging to the consumer, which will be notified when new data arrives (see the provider-side sketch after this list).
  3. Consumers are notified about new data through the endpoint they registered with the provider. They can then use the provider's data endpoint to fetch the current data.
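A minimal provider-side sketch of steps 2 and 3, assuming subscription requests carry a JSON body of the form {"callback": "<consumer URL>"}; the endpoint names, port, and payload shape are illustrative assumptions, not a prescribed API:

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

subscriptions = set()  # callback URLs of registered consumers

class SubscriptionHandler(BaseHTTPRequestHandler):
    def _read_callback(self):
        length = int(self.headers.get("Content-Length", 0))
        return json.loads(self.rfile.read(length))["callback"]

    def do_POST(self):
        # Register interest: POST /subscriptions with a callback URL.
        if self.path != "/subscriptions":
            self.send_error(404)
            return
        subscriptions.add(self._read_callback())
        self.send_response(201)
        self.end_headers()

    def do_DELETE(self):
        # Deregister interest: DELETE /subscriptions with the same body.
        if self.path != "/subscriptions":
            self.send_error(404)
            return
        subscriptions.discard(self._read_callback())
        self.send_response(204)
        self.end_headers()

def notify_subscribers():
    # Called whenever the data changes: ping every registered callback.
    # Consumers then pull the current data from the ordinary data endpoint.
    for callback in list(subscriptions):
        req = urllib.request.Request(callback, data=b"{}", method="POST")
        try:
            urllib.request.urlopen(req)
        except OSError:
            pass  # dysfunctional subscriber; see the caveats below

if __name__ == "__main__":
    HTTPServer(("", 8080), SubscriptionHandler).serve_forever()
```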

By using a /subscriptions endpoint, you reduce coupling between provider and consumers, as the provider does not need to know in advance who its consumers will be. Consumers need to know about the provider anyway, so requiring them to subscribe seems reasonable.
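On the consumer side, subscribing and handling notifications could look like the following; again, the URLs and payload are assumptions made for illustration:

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PROVIDER = "http://provider.example.com:8080"  # hypothetical provider
MY_CALLBACK = "http://consumer.example.com:9090/"  # our notification endpoint

def subscribe():
    # Register our callback with the provider (step 2 above).
    body = json.dumps({"callback": MY_CALLBACK}).encode()
    req = urllib.request.Request(
        PROVIDER + "/subscriptions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

class NotificationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Step 3: the provider pinged us, so pull the current data.
        self.send_response(200)
        self.end_headers()
        with urllib.request.urlopen(PROVIDER + "/metadata") as resp:
            data = resp.read()
        print(f"refreshed metadata: {len(data)} bytes")

if __name__ == "__main__":
    subscribe()
    HTTPServer(("", 9090), NotificationHandler).serve_forever()
```

Note that the notification itself carries no data; it merely tells the consumer to pull, which keeps the ordinary data endpoint the single place the data is read from.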

Of course, the management of subscriptions introduces some additional overhead, including the handling of dysfunctional subscribers (should notifications be buffered? for how long? should alerts be sent? etc.). You also open some possible attack surfaces if these interfaces aren't entirely within your organization, so you might need authentication checks, DoS protection, and the like, which you probably have anyway but which make the system bulkier.