6

Let's say we have multiple (somewhat autonomous) (micro-)services, and when entities are created, their IDs (UUIDs or similar) can be set externally. How can we ensure that an ID remains unique across all services while protecting against someone maliciously trying to create ID collisions in separate services? Ideally this should preserve the concept of externally generated IDs without sacrificing too much performance or becoming overly verbose, leaky, or complex.

The only solutions I've come up with so far are:

  1. Centralized ID checks (at least if IDs are generated externally): This approach, however, would break the decentralized nature of the system and add a single point of failure. The simplest implementation might be a shared database table where services can reserve IDs.
  2. Prefixing/namespacing IDs: This approach is somewhat verbose and may leak information. It would allow a service to ensure uniqueness within its own namespace, for instance through a DB unique index, which may require an additional table storing all IDs reserved within the service.
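To make option 2 concrete, here is a minimal sketch, assuming a hypothetical per-service namespace string (`SERVICE_NAMESPACE` and `namespaced_id` are illustrative names, not from the question):

```python
# Sketch of option 2: each service owns a namespace prefix, so a
# client-supplied ID only needs to be unique within that namespace.
SERVICE_NAMESPACE = "billing"  # assumed per-service constant

def namespaced_id(external_id: str) -> str:
    """Combine the service's namespace with a client-supplied ID."""
    return f"{SERVICE_NAMESPACE}:{external_id}"

# A DB unique index on the namespaced value then guarantees uniqueness
# within the service; collisions across services cannot occur as long
# as the prefixes themselves are distinct.
```

The remaining centralised element is agreeing on distinct prefixes, which is exactly the trade-off discussed in the answers below.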

This question is not about how to safely generate unique IDs in a distributed system in the absence of malicious intent; the answer to that could be using UUIDv4.

The question is also not about how to prevent ID collisions/inconsistencies within a service; the answer to that might be unique indexes when using an SQL DB. However, the same problem and solutions may apply within a service, across multiple tables, or in high-performance scenarios perhaps even within a single table.

Lastly, a workaround might be to not allow IDs to be set externally at all, or to only allow setting intermediary/reference IDs that are translated/mapped to the actually unique IDs and returned to the client. But it is not my immediate goal to find a workaround that breaks the initial design.

Thank you!

4 Answers

10

There is no such thing as uniqueness without context. You may think your 42 is special but I can assure you other people have used 42 before.

This may lead you to think that a large UUID or a centralized context (1) is required, but you can have distributed contexts, as you mention in (2). However, making each context send a self-identifying prefix is only one way to do that. Let me ask: do you know who is talking to you?

The prefix only needs to be sent to you when you don't. And if you don't know who is talking, you're trusting the prefix not to be a lie. And weren't we trying to stop trusting these numbers?

An alternative way to do (2) is use the fact that you know who is talking to you and put the prefix on it yourself. Now you can trust it.

Well, to some extent. This puts each context in a sandbox. Now the only IDs they can collide with are their own.
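A minimal sketch of this idea, assuming the receiving service has already authenticated the caller (the names `store_entity`, `authenticated_service`, and the dict-based store are illustrative stand-ins, not a real API):

```python
# Sketch: the receiving service derives the prefix from the verified
# caller identity instead of trusting a prefix inside the payload.

def store_entity(authenticated_service: str, external_id: str, store: dict) -> str:
    """Prefix the client-supplied ID with the authenticated caller's
    identity, then persist it; duplicates can only occur inside the
    caller's own sandbox."""
    full_id = f"{authenticated_service}/{external_id}"
    if full_id in store:
        raise ValueError("duplicate ID within this caller's sandbox")
    store[full_id] = True
    return full_id
```

Two different callers can now submit the same external ID without colliding, because the trusted prefix is applied server-side.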

candied_orange
3

Fundamentally, a central registry is required for uniqueness.

The system of assigning a unique prefix to different allocators still has a centralised element: assigning the unique prefix itself to each allocator.

Generalised, the prefix method is basically a pattern whereby an allocator reserves a block from the central registry, then doles out individual items from that pre-allocated block. The only question is how often coordination with the central registry needs to occur.
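The block-reservation pattern can be sketched like this; the in-process counter stands in for a real registry service, and all class and method names here are hypothetical:

```python
import threading

class CentralRegistry:
    """Stand-in for the central registry: hands out disjoint ID ranges."""
    def __init__(self, block_size: int = 1000):
        self._next = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def reserve_block(self) -> range:
        with self._lock:  # coordination point; only hit once per block
            start = self._next
            self._next += self._block_size
        return range(start, start + self._block_size)

class LocalAllocator:
    """A service-local allocator that dispenses IDs from its reserved block."""
    def __init__(self, registry: CentralRegistry):
        self._registry = registry
        self._ids = iter(())  # empty; forces a reservation on first use

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._registry.reserve_block())
            return next(self._ids)
```

The block size directly controls how often the registry is consulted, which is the latency/availability trade-off discussed below.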

So the strict answer to how you prevent malicious duplicates in a truly distributed environment is: you don't.

Instead, the question should be how often the central registry is consulted in your design, and what latency can be tolerated by that design (if the central registry is slow to respond, or not always available).

Steve
1

You need to specify more information about the distributed service. Most systems have some central components.

For example, you could shard on the ID: when an ID comes in, you take its hash to find out which shard is going to deal with it.

This will ensure that you still hit the duplicate-key error. Even though you don't have a central index, the same ID will go to the same node and a duplicate error will be raised.
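The routing step can be sketched as follows, assuming a fixed shard count (`NUM_SHARDS` and `shard_for` are illustrative names):

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size

def shard_for(entity_id: str) -> int:
    """Deterministically map an ID to a shard by hashing it, so the
    same ID always lands on the same node and that node's unique
    index can catch duplicates."""
    digest = hashlib.sha256(entity_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Because the mapping is deterministic, no central index is needed; duplicate detection is delegated to whichever shard owns the ID.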

Maybe you can make it so duplicates don't matter: if I have an ObjectA with ID 1 and an ObjectB with ID 1, and I always know whether I want A or B, the collision isn't critical.

Maybe you can do eventual-consistency audits to catch the duplicates after the fact. If you are just worried about malicious creation, this might be enough.

Ewan
0

Well, the quick answer is "you can't". If you have no control over what happens, there might be malicious IDs and there is no way to prevent it.

The most minimalistic and well-performing solution I can think of is to have a central service that generates IDs. However, instead of keeping a database remembering which IDs are used and which are not, it generates a random ID (such as a UUID, which has an almost-zero chance of duplication), puts it inside a JWT, and the JWT must then be presented when sending the object for persistence. You can check that the ID sent in the resource and the ID in the JWT are the same (and, obviously, that it is a valid JWT created by your service).
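A minimal sketch of the issue-then-verify flow, using a stdlib HMAC-signed token as a simplified stand-in for a real JWT library (the secret, function names, and token format are all assumptions for illustration):

```python
import base64
import hashlib
import hmac
import json
import uuid

SECRET = b"shared-service-secret"  # assumed; a real deployment manages this securely

def issue_token() -> tuple[str, str]:
    """Central service: generate a random ID and sign it.
    Simplified JWT stand-in: '<base64 payload>.<hmac signature>'."""
    new_id = str(uuid.uuid4())
    payload = base64.urlsafe_b64encode(json.dumps({"id": new_id}).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return new_id, payload.decode() + "." + sig

def verify_token(token: str, claimed_id: str) -> bool:
    """Persisting service: check the signature, then check that the
    ID in the resource matches the ID inside the token."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    return json.loads(base64.urlsafe_b64decode(payload))["id"] == claimed_id
```

The central service stays stateless: it never stores issued IDs, yet no client can persist an ID the service didn't sign.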

However, I would personally not abuse the system that much. If you need unique IDs, it means it's important to eventually create resources with these unique IDs somewhere. The client application should be able to handle objects without IDs (or with its own internal IDs) and assign the "real" ID when the resource is created in some core component.

libik