What are the pros and cons of using a reference/pointer vs an ID

Question

I'm writing in C++, but this problem applies to any language without a GC, and to some extent to languages with one.

I have a structure in memory in which I create/add objects. The structure takes ownership of those objects. I should never need to use an object after it's removed from the structure.

When I first implemented this data structure, it seemed natural to use an ID/key/name/handle for the objects stored in it. I'm using it like this:

id1 = structure.addObj(new Square());
id2 = structure.addObj(new Square());
id3 = structure.addObj(new Circle());
obj3 = structure.getObj(id3);
obj3.addFriend( id1 );
obj3.addFriend( id2 );
idMax = structure.findObjWithMostFriends();
objMax = structure.getObj(idMax);
print(objMax.name);

After using it for a while, I'm thinking that it would be better to forget about the IDs and always use references to the objects instead. This way I wouldn't need to pass a reference to the structure around every time.

I'm thinking about refactoring everything to only use references, but I'm afraid of regretting it. I'd like to know more about the pros and cons of using IDs so I can decide whether to proceed.

Memory details:

The objects are allocated on the heap and their address never changes.

The structure deallocates those objects when they're removed (they could be released to the caller instead, but I don't need this at the moment).

I'm not supposed to ever use objects that don't belong to the structure. If my program is correct, I should never end up with a dangling ID or pointer. But it could happen if the program has bugs.

What are your experiences switching from IDs to references for similar problems? Which solution should I use?

score 5 · Answer 1 · answered Aug 23 '20 at 15:57

IDs or handles are generally preferable in the following cases:

memory locations might not be fixed, e.g. when pointing into C++ standard library containers or when transferring objects between processes
you know you'll have few IDs compared to the pointer's address space (can significantly reduce memory requirements)
objects are managed via reference counting and the object graph might have cycles
you need indirection but the language doesn't support first-class pointers (e.g. Python, Java)
you are concerned about object lifetimes, e.g. deterministic deallocation of the entire object graph or use-after-free vulnerabilities

The point with lifetimes is important. In C/C++ it is your responsibility to know whether a pointed-to object is still alive so that you're allowed to dereference the pointer. There are two strategies to address this: use reference counting or GC to keep the object alive as long as you have a pointer, or carefully think about lifetimes like the Rust compiler does (which incidentally requires the use of IDs for complex object graphs).

IDs are a partial solution to the lifetime problem because an ID alone cannot be dereferenced, but needs some context that contains the actual object graph. The lifetime of this context is usually easier to reason about, especially when the context is represented by a stack-allocated object and never directly referenced from heap-allocated objects.

But this is not airtight, e.g. you might dereference an ID in the wrong context. Again, there are two approaches: you can expect that ID resolution might fail and therefore return a nullable pointer from the resolution function, or you try to detect this error. Detection can be made more likely by assigning a short ID to each context and encoding it into each object ID/handle.
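Both approaches from the paragraph above can be combined. The following is only a minimal sketch: `Shape`, `TaggedHandle`, `Store` and the 16-bit context tag are hypothetical names and choices made for illustration, not something taken from the question. The resolution function returns a nullable pointer, and the context ID stored in the handle lets it reject handles that belong to another store.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload type, for illustration only.
struct Shape { std::string name; };

// A handle carries the short ID of the store that issued it.
struct TaggedHandle {
    std::uint16_t context;  // ID of the owning store
    std::uint32_t index;    // slot inside that store
};

class Store {
public:
    explicit Store(std::uint16_t contextId) : contextId_(contextId) {}

    // Takes ownership and returns a handle tagged with this store's ID.
    TaggedHandle add(std::unique_ptr<Shape> obj) {
        slots_.push_back(std::move(obj));
        return TaggedHandle{contextId_,
                            static_cast<std::uint32_t>(slots_.size() - 1)};
    }

    // Frees the object. The slot is never reused in this sketch,
    // so an old handle keeps resolving to nullptr.
    void remove(TaggedHandle h) {
        if (get(h) != nullptr) slots_[h.index].reset();
    }

    // Returns nullptr for a handle from another store, an out-of-range
    // index, or an object that has already been removed.
    Shape* get(TaggedHandle h) const {
        if (h.context != contextId_ || h.index >= slots_.size()) return nullptr;
        return slots_[h.index].get();
    }

private:
    std::uint16_t contextId_;
    std::vector<std::unique_ptr<Shape>> slots_;
};
```

Callers have to check the result of `get` before using it. That is the price of treating a failed resolution as an expected outcome rather than a crash.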

I'm currently thinking about moving a smart-pointer-based system to an ID-based system because this makes richer queries through the object graph more feasible and can do away with reference counting overhead. However, potential reuse of IDs could lead to hard-to-detect bugs (also a kind of use-after-free).
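One common way to make ID reuse detectable is a generation counter per slot (sometimes called a generational index). This is a swapped-in technique, not something the answer prescribes, and the names `Item`, `GenHandle` and `GenerationalStore` below are hypothetical. A rough sketch, assuming 32-bit counters are wide enough that wraparound can be ignored:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload type, for illustration only.
struct Item { std::string name; };

struct GenHandle {
    std::uint32_t index;       // slot number, may be reused
    std::uint32_t generation;  // value of the slot's counter when issued
};

class GenerationalStore {
public:
    // Reuses a free slot if one exists, otherwise appends a new one.
    GenHandle add(std::unique_ptr<Item> obj) {
        std::uint32_t index;
        if (!freeList_.empty()) {
            index = freeList_.back();
            freeList_.pop_back();
        } else {
            index = static_cast<std::uint32_t>(slots_.size());
            slots_.emplace_back();
        }
        slots_[index].object = std::move(obj);
        return GenHandle{index, slots_[index].generation};
    }

    // Frees the object and bumps the generation, which invalidates
    // every handle issued for the previous occupant of the slot.
    void remove(GenHandle h) {
        if (get(h) == nullptr) return;
        slots_[h.index].object.reset();
        ++slots_[h.index].generation;
        freeList_.push_back(h.index);
    }

    // Returns nullptr if the handle is out of range or stale.
    Item* get(GenHandle h) const {
        if (h.index >= slots_.size()) return nullptr;
        const Slot& slot = slots_[h.index];
        return slot.generation == h.generation ? slot.object.get() : nullptr;
    }

private:
    struct Slot {
        std::unique_ptr<Item> object;
        std::uint32_t generation = 0;
    };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> freeList_;
};
```

A stale handle then fails to resolve instead of silently pointing at whatever object took over the slot.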

score 2 · Accepted Answer · answered Aug 24 '20 at 08:09

The pros and cons of pointers vs ids depend on the context in which they are used. A general recommendation is therefore not possible.

Typically, ids make sense if they are associated with a "container", such as a repository, or an object that acts as an owning aggregate. In this case, the id lets you abstract from the memory layout, define more encapsulated interfaces, and serialize the container easily. In the context of graphs, a whole graph would be a good candidate for being the "container" for its nodes and its edges.

However, if you start to use ids globally, by assuming a global default container, you are building code which will be tightly coupled to the underlying global structure, difficult to reuse, and, I suspect, difficult to maintain in the long run. The first impact of this problem is the kind of mixed API you are using:

you have to use the reference of an object to change it, but you have to use ids to create them or pass them as arguments.
you leak the reference associated with an id, so you are bound to the memory layout you wanted to abstract from, and the calling context may keep the reference around, which forbids any moving of the object (and risks a dangling pointer in case of bugs).
if you made a temporary copy of an object, it would end up in the global structure.

All this seems to me very error-prone as it is. Especially if you add the risk of confusion between references and ids when declaring variables using the modern auto style.

Conclusion: Either completely reengineer your API to make it fully id-based, with the container (i.e. your currently global structure) made explicit, or switch to a consistent shared_ptr-based API instead of ids. In the latter case the id is only an element that helps to find the shared_ptr when it is not known (and to serialize your data).
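To illustrate the first option, here is a rough sketch of what a fully id-based API with an explicit container could look like for the example in the question. The names `Graph`, `NodeId`, `addNode`, `nameOf` and the bool-plus-out-parameter query are assumptions made for this sketch. Every operation goes through the container, and no reference to a stored object ever leaves it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using NodeId = std::size_t;  // hypothetical id type

class Graph {
public:
    // Creates a node inside the container and returns its id.
    NodeId addNode(std::string name) {
        NodeId id = nextId_++;
        nodes_.emplace(id, Node{std::move(name), {}});
        return id;
    }

    // Records friendId as a friend of `of`; false if either id is unknown.
    bool addFriend(NodeId of, NodeId friendId) {
        auto it = nodes_.find(of);
        if (it == nodes_.end() || nodes_.count(friendId) == 0) return false;
        it->second.friends.push_back(friendId);
        return true;
    }

    // Returns a copy of the name, or an empty string for an unknown id.
    std::string nameOf(NodeId id) const {
        auto it = nodes_.find(id);
        return it == nodes_.end() ? std::string() : it->second.name;
    }

    // Writes the id of a node with the most friends to `out`.
    // Returns false if the graph is empty; ties are broken arbitrarily.
    bool findNodeWithMostFriends(NodeId& out) const {
        bool found = false;
        std::size_t best = 0;
        for (const auto& entry : nodes_) {
            if (!found || entry.second.friends.size() > best) {
                found = true;
                best = entry.second.friends.size();
                out = entry.first;
            }
        }
        return found;
    }

private:
    struct Node {
        std::string name;
        std::vector<NodeId> friends;
    };
    std::unordered_map<NodeId, Node> nodes_;
    NodeId nextId_ = 0;
};
```

Because callers only ever see ids and copies, the container stays free to change how and where it stores its nodes.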

score 0 · Answer 3 · gnasher729 · 2020-08-25T12:47:48.960

Ids work very well for persistence. So you store an id in the database, and you have an API for giving you the object given an id. Very much preferable if it is the same object for multiple calls. You mostly use this when reading items from persistent storage.

Once in memory, using a reference counted pointer (shared pointer in C++) is a lot easier and a lot more efficient.
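As a small sketch of the in-memory side, assuming some other part of the program holds the owning `std::shared_ptr`s: friend links can be stored as `std::weak_ptr` so that mutual friendships do not form a reference cycle. `Person` and `liveFriendCount` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object type, for illustration only.
struct Person {
    explicit Person(std::string n) : name(std::move(n)) {}
    std::string name;
    // Non-owning links: two people who befriend each other
    // do not keep each other alive.
    std::vector<std::weak_ptr<Person>> friends;
};

// Counts the friends whose objects have not been destroyed yet.
std::size_t liveFriendCount(const Person& p) {
    std::size_t n = 0;
    for (const auto& f : p.friends) {
        if (!f.expired()) ++n;
    }
    return n;
}
```

To actually use a friend, call `lock()` on the `std::weak_ptr` and check that the resulting `std::shared_ptr` is not empty.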

edited Aug 25 '20 at 12:47

answered Aug 24 '20 at 11:26
