53

I’m finding it hard to avoid data duplication or a shared database for even the simplest microservices design, which makes me think I’m missing something. Here’s a basic example of the problem I’m facing. Assume someone is using a web application to manage an inventory. They would need two services: an inventory service that manages the items and the quantity in stock, and a users service that manages the user data. If we want an audit of who stocked each item, we could add the user’s ID to the inventory service’s database as a "last stocked by" value.

Using the application we may want to see all the items that are running low, and a list of who stocked them last time so we can ask them to restock it again. Using the architecture described above, a request would be made to the inventory service to retrieve the item details of all items where the quantity is less than 5. This would return a list including the user IDs. Then a separate request would be made to the users service to get the user name and contact details for the list of user IDs obtained from the inventory service.

This seems awfully inefficient, and it doesn’t take many more services before we’re making multiple requests to different services’ APIs, which in turn make multiple database queries. An alternative is to replicate the user details in the inventory data. When a user changes their contact details, we would then need to replicate the change through all other services. But this doesn’t seem to fit with the bounded context idea of microservices. We could also use a single database shared between the different services, and have all the problems of an integration database.

What’s the correct/best way to implement this?

5 Answers

32

I’m finding it hard to avoid data duplication....

According to the Microsoft ebook on microservice architecture, there is nothing wrong with data duplication. Basically, duplicating data increases the decoupling between the services and therefore strengthens their roles as a single authority. A relevant passage:

And finally (and this is where most of the issues arise when building microservices), if your initial microservice needs data that's originally owned by other microservices, do not rely on making synchronous requests for that data. Instead, replicate or propagate that data (only the attributes you need) into the initial service's database by using eventual consistency (typically by using integration events...

19

I completely missed where you're being required to duplicate.

A central principle of microservices is for the service to be the single authority. That means inventory and user management can be completely separate. I'd design the user management so that it doesn't even know the inventory system exists.

But I'd design the inventory system so that it never stores anything about users other than a user ID. That takes care of your problem of propagating user info changes.

As for things that need both inventory info and user info, such as logs, audits, and printouts, they don't get updated as info changes. They are a record of what was. Again, you don't propagate change.

So in every case, when you want the latest user info you ask the user info service.
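A minimal sketch of that split, under some assumptions: the class names (`Users`, `Inventory`, `AuditEntry`) are hypothetical in-memory stand-ins for the two services, and placing the audit list inside the inventory service is my choice for brevity, not something the design requires. The live inventory data holds only the user ID, while an audit entry freezes the name as it was when the entry was written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    item: str
    user_id: int
    user_name: str  # the name as it was when the entry was written


class Users:
    """Hypothetical stand-in for the user service (the single authority)."""

    def __init__(self):
        self.names = {}  # user ID -> current name

    def name_of(self, user_id):
        return self.names[user_id]


class Inventory:
    """Hypothetical stand-in for the inventory service."""

    def __init__(self, users):
        self.users = users
        self.last_stocked_by = {}  # item -> user ID, nothing else about users
        self.audit = []            # records of what was; never updated

    def restock(self, item, user_id):
        self.last_stocked_by[item] = user_id
        # Snapshot the name once; later changes are not propagated here.
        self.audit.append(AuditEntry(item, user_id, self.users.name_of(user_id)))

    def last_stocker_name(self, item):
        # Latest info always comes from the user service.
        return self.users.name_of(self.last_stocked_by[item])
```

If a user is renamed after restocking, `last_stocker_name` returns the new name while the audit entry keeps the old one, which is the behaviour described above.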

candied_orange
6

a request would be made to the inventory service to retrieve the item details of all items where the quantity is less than 5. This would return a list including the user IDs. Then a separate request would be made to the users service to get the user name and contact details for the list of user IDs obtained from the inventory service.

Indeed, yes.

Granted, in a monolith you could have an Inventory-model that you query for the relevant items, feed that into a User-model and get the same data.

Or you could take it further: if you have them in the same relational database, you can write SQL, and the database will take the inventory table and the user table, do some magic, and give you the data you are after.

Regardless of how you do it, somewhere there will be code that essentially fetches a list of user ids from the inventory system, feeds them into the user system and compiles a list of data.
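As a rough illustration of that code, here is a sketch with in-memory stand-ins for the two services. All names (`InventoryService`, `UserService`, `low_stock`, `get_many`, `low_stock_report`) are hypothetical, and a batched lookup endpoint on the user service is an assumption; in a real system these would be network calls.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    quantity: int
    last_stocked_by: int  # user ID only


@dataclass
class User:
    user_id: int
    name: str
    email: str


class InventoryService:
    """Stand-in for the inventory service's API (hypothetical)."""

    def __init__(self, items):
        self._items = items

    def low_stock(self, threshold):
        return [i for i in self._items if i.quantity < threshold]


class UserService:
    """Stand-in for the users service's API (hypothetical)."""

    def __init__(self, users):
        self._users = {u.user_id: u for u in users}

    def get_many(self, user_ids):
        # One batched lookup instead of one request per ID.
        return {uid: self._users[uid] for uid in user_ids if uid in self._users}


def low_stock_report(inventory, users, threshold=5):
    items = inventory.low_stock(threshold)       # request 1
    ids = {i.last_stocked_by for i in items}     # de-duplicate user IDs
    by_id = users.get_many(ids)                  # request 2
    # Merge in code; a user the service does not know comes back as None.
    return [(i.name, i.quantity, by_id.get(i.last_stocked_by)) for i in items]
```

The whole report costs two requests regardless of how many items are low, provided the user service can answer for a list of IDs at once.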

The question you need to answer is about performance and maintenance and other "soft" qualities.

The main benefit of microservices is scaling. If you have ten thousand users on one machine and it is a bit sluggish, you can add another machine and the system becomes twice as fast. Add eight more and it's ten times as fast. (Linear scaling is probably optimistic, but it is the ideal and not that unreasonable to hope for.)

And this is per service. If the inventory system is the bottleneck because it is used for more than reports about users, you can add more machines to just that service. The machines can also be specialised; this service needs a lot of memory, that service does heavy calculations and needs more CPU.

If you don't need the scaling, there is one other benefit of microservices: they are modular. Of course, monolithic apps can also be modular, and you have a normalised database and... but in practice the walls between modules are like glass walls in the best case, and lines in the sand in the worst. Microservices are separated by solid steel.

If your user system literally catches fire, that won't affect your inventory system in the slightest. You won't be able to print pretty reports about who stocked what, but customers will be able to place orders safe in the knowledge that the stocked items are there.

And you don't duplicate data in microservices, any more than you do in a relational database(*). In a relational database you can do a join, and the equivalent is to merge the lists in code as described.

You could also add a view; the equivalent is to add a new service that does the merge for you. That would result in three requests: one to the new service, which then makes the original two. Relational databases have fancy stuff that optimises views; with microservices that has to be implemented at the service level. You don't get it "for free".

Caching is different from data duplication in that if two values mismatch you know which one is wrong. It is often used in microservices to bring availability up at the expense of consistency (CAP theorem). Since relational databases completely butcher availability on the altar of consistency it is less common in them. I'd say there is nothing inherent about microservices that makes caching easier, but in practice caching is a primary concern and that makes caching easier in microservices.
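To make the availability-over-consistency trade concrete, here is a small sketch of a cache in front of the user service. `CachedUserLookup`, the `fetch` callable, and the use of `ConnectionError` to signal an unreachable service are all assumptions for illustration. Fresh entries are served locally, expired entries are refetched, and if the user service is unreachable the stale value is returned rather than failing.

```python
import time


class CachedUserLookup:
    """Cache in front of a (hypothetical) user service client."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable: user_id -> user data
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries = {}       # user_id -> (value, stored_at)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]      # fresh enough: no request made
        try:
            value = self._fetch(user_id)
        except ConnectionError:
            if entry is not None:
                return entry[0]  # stale, but available
            raise                # nothing cached: the failure surfaces
        self._entries[user_id] = (value, now)
        return value
```

If the cached value and the user service ever disagree, the user service is right by definition; the cache only decides how long a possibly outdated answer is acceptable.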

(*) If it makes sense to duplicate data in a microservice swarm, then it probably would make sense in the equivalent relational database too.

Odalrick
0

I think the inventory service does not need all the user info. Since it does need some user data, it should consume the events (create, update, delete) from the user service and maintain only the required user data in its own user database. That way there will be data duplication; however, your services won't be tightly coupled.
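A minimal sketch of such a consumer, assuming events arrive as dictionaries with `type`, `user_id`, and `data` keys. The event names, the payload shape, and the `InventoryUserReplica` class are hypothetical, and the message broker that would deliver the events is left out.

```python
class InventoryUserReplica:
    """Local copy of only the user attributes the inventory service needs."""

    FIELDS = ("name", "email")  # assumed subset; everything else is dropped

    def __init__(self):
        self.rows = {}  # user_id -> {"name": ..., "email": ...}

    def handle(self, event):
        kind, user_id = event["type"], event["user_id"]
        if kind in ("user_created", "user_updated"):
            row = self.rows.setdefault(user_id, {})
            for field in self.FIELDS:
                if field in event["data"]:
                    row[field] = event["data"][field]
        elif kind == "user_deleted":
            self.rows.pop(user_id, None)
        # Any other event type is ignored.
```

The replica is only eventually consistent with the user service: between a change and the arrival of its event, the inventory service sees the old values.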

0

This is indeed super inefficient.

Therefore splitting up your monolith into microservices shouldn’t be taken lightly.

It is always going to be a trade-off so you have to make sure the trade-off is worth it.

On a very large project spanning 8 years, I’ve found myself multiple times splitting off a microservice and then figuring out later that I should actually merge it with other microservices for better maintenance and performance.

More microservices definitely doesn’t automatically mean the system will be easier to scale or maintain.

Dirk Boer