Confusion about the meaning of the word aggregate in domain driven design

Question

In a discussion about domain driven design I have learned that different people seem to mean different things when they use the word aggregate. The main difficulty is that some people use the word aggregate for what other people call an aggregate type.

It is quite difficult to have a discussion if people assume different meanings for the same words. For this reason I set out to clarify what most people and the literature agree on. If you answer this question, I would be very happy if you could provide a reference to the literature.

For one person an aggregate is the boundary that groups a collection of entities. It is more of a conceptual clustering boundary.

For another person an aggregate is a collection of entities loaded from a database repository (with transactional consistency). So an aggregate is something real and not just a concept. If, for example, I load two users from a database, then I have loaded two aggregates of the same aggregate type.

Yet another person also thinks that an aggregate is a collection of entities that are transactionally consistent, but thinks that if you load data of a given aggregate type you can also load it partially (with some data just null, for example) and still call the whole thing one aggregate, while others would see this as two aggregates (with eventual consistency, meaning that consistency is reached after both aggregates are saved).

To find the true meaning of the word aggregate myself, I had a look at Martin Fowler's definition. There an aggregate is something real, and there can be two aggregates of the same aggregate type. But when reading something like this article from Vaughn Vernon, I get the impression that he calls aggregate what, according to my reading of Martin Fowler, should be called aggregate type.

score 8 · Accepted Answer · answered Dec 17 '15 at 18:31

For terminology in Domain Driven Design, start from "the blue book" -- Domain Driven Design by Eric Evans.

AGGREGATE A cluster of associated objects that are treated as a unit for the purpose of data changes. External references are restricted to one member of the aggregate, designated as the root. A set of consistency rules applies within the aggregate's boundaries.

That last sentence, I think you can turn around -- the boundaries of the aggregate are defined by the consistency rules.

It's definitely the case that an aggregate has state. Each time the domain model changes, an aggregate is taken from one consistent state to another. The data that we persist is used to reconstruct this state. So in that sense, it is a real thing.

But the aggregate itself doesn't necessarily have a word in the ubiquitous language. It's a derived concept.

Broadly, we could put the entire domain model under a single aggregate that enforces all of the consistency rules. We don't, because that design doesn't scale: we can't change the domain model two different ways at the same time, even when the changes we are making don't share any consistency rules. It's a poor way to model a business that can do more than one thing at a time.

Instead, we decompose the consistency rules into sets, subject to the constraint that two rules that reference the same data must be part of the same set. (In doing this, we are also working with the ubiquitous language and the domain experts to determine if we are correctly describing the consistency rules).

To update the model, we identify the aggregate responsible for a piece of data and propose the change. If the aggregate verifies all of its local consistency rules, we know that the change is globally valid, and we can apply the change. This restores our ability to do more than one thing at a time - changes to data in different aggregates can't possibly conflict with each other, by construction.
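As a rough illustration of "propose the change, check the local rules, then apply", here is a minimal Python sketch. The `Account` aggregate, its "balance never goes negative" rule, and the `RuleViolation` exception are all invented for this example; they are not taken from Evans or from any library.

```python
from dataclasses import dataclass


class RuleViolation(Exception):
    """Raised when a proposed change would break a consistency rule."""


@dataclass
class Account:
    """Hypothetical aggregate guarding one rule: the balance never goes negative."""
    account_id: str
    balance: int = 0

    def withdraw(self, amount: int) -> None:
        # Check every local consistency rule first...
        if amount <= 0:
            raise RuleViolation("amount must be positive")
        if amount > self.balance:
            raise RuleViolation("insufficient funds")
        # ...and only then apply the change.
        self.balance -= amount


# Two instances of the same aggregate type; each one guards only its own data.
a = Account("a", balance=100)
b = Account("b", balance=50)
a.withdraw(30)  # touches only a's data
b.withdraw(50)  # touches only b's data, so it cannot conflict with the change to a
```

Because no rule in this sketch spans both accounts, the two withdrawals could run in either order, or concurrently, without being able to invalidate each other.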

Best practices suggest that most aggregates should contain only the root entity. So you can conflate the aggregate with the entity without too much risk. But my guess is there won't usually be anything in the ubiquitous language to hang on the cluster when it includes more than one entity; so you end up with the ShoppingCart aggregate maintaining the consistency rules for the ShoppingCart entity and the CartItems entity collection and....
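To make the ShoppingCart case concrete, here is a small Python sketch of an aggregate whose root owns a collection of item entities. The `MAX_TOTAL_QUANTITY` rule and all method names are assumptions made up for illustration; the only point is that every change goes through the root, which checks the rules over the whole cluster.

```python
from dataclasses import dataclass


@dataclass
class CartItem:
    """Entity inside the aggregate; outside code never changes it directly."""
    product_id: str
    quantity: int


class ShoppingCart:
    """Aggregate root: the single entry point for changes to the cart and its items."""

    MAX_TOTAL_QUANTITY = 10  # hypothetical consistency rule, invented for this sketch

    def __init__(self, cart_id):
        self.cart_id = cart_id
        self._items = {}  # product_id -> CartItem

    def total_quantity(self):
        return sum(item.quantity for item in self._items.values())

    def add_item(self, product_id, quantity):
        # The rule spans the root and all items, so the root is the one to check it.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.total_quantity() + quantity > self.MAX_TOTAL_QUANTITY:
            raise ValueError("cart would exceed the maximum total quantity")
        item = self._items.get(product_id)
        if item is None:
            self._items[product_id] = CartItem(product_id, quantity)
        else:
            item.quantity += quantity


cart = ShoppingCart("c1")
cart.add_item("book", 2)
cart.add_item("book", 3)  # merges into the existing line instead of adding a second one
```

Note that the class is still just called `ShoppingCart`: nothing in the code needs a separate name for "the cluster", which matches the point that the aggregate is a derived concept.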

Partial loading of an aggregate is broken when trying to apply a change -- how could a well designed aggregate possibly validate all of its consistency rules with a subset of the data? It's certainly the case that, if you have a requirement where this makes sense, your modeling is broken somewhere.

But if you are doing a read, loading only some of the data guarded by the aggregate can make sense. Command Query Responsibility Segregation (CQRS) takes this a step further; once the model has verified that the data satisfies the consistency rules, you can completely rearrange that data into whatever read-only form makes your life easiest. Put another way, if you aren't concerned with data changes, you don't need to worry about the aggregate boundary at all.
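A minimal sketch of that read side, in Python. The function name `project_cart_summary` and the shape of the summary are assumptions for illustration only; the input is taken to be data that the write model has already validated.

```python
from types import MappingProxyType


def project_cart_summary(cart_id, items):
    """Rearrange already-validated cart data into a flat, read-only summary.

    `items` is a list of (product_id, quantity) pairs. No consistency rules
    are checked here, because nothing is being changed.
    """
    return MappingProxyType({
        "cart_id": cart_id,
        "line_count": len(items),
        "total_quantity": sum(quantity for _, quantity in items),
    })


summary = project_cart_summary("c1", [("book", 5), ("pen", 2)])
```

The summary ignores the aggregate boundary entirely: it could just as well combine data guarded by several aggregates, since a read cannot violate any of their rules.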

score 7 · Answer 2 · edited Dec 21 '20 at 21:39

There is no "database" or "transaction" in DDD. DDD is completely agnostic to databases, transactions or "eventual consistency". So any definition that includes those is not valid for DDD. That basically leaves only your first definition, which I feel is the correct one. Also, I don't see how you can feel Martin Fowler's description fits anything other than the first definition.

But I can see how confusion can creep in when you think about "practical DDD". That means DDD plus the infrastructure built on top of it. For example, it might not make sense to materialize the whole aggregate from the database if you are working with just part of it. But then, I would question whether the aggregate is really properly designed. Maybe it should be split into multiple aggregates.

Either way, if you start bringing infrastructure into it, it stops being a concept of DDD and becomes a different concept. The only thing you can do is to realize this and make sure everyone on the team agrees on what "aggregate" means and what properties it has. Remember, names are primarily for efficient communication, and communication with your team has priority over communication with the outside world.

score 4 · Answer 3 · answered Dec 17 '15 at 19:25

It looks to me like Vaughn is struggling with the practical problems of monolithic aggregate roots in 'line of business' style software.

Martin, on the other hand, likes monolithic OOP software, which works well when you can fit your whole process into memory.

I don't think there is a contradiction between the two though. You have to choose where you split your domain objects into contexts and aggregates in DDD. On one hand you want to have aggregate roots which encompass an entire real-life process; on the other you have practical difficulties if these get too big, so you may have to split them into smaller pieces and join them up with domain events or other trickery.
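One possible shape for "join them up with domain events", sketched in Python. `Order`, `Inventory` and `OrderPlaced` are hypothetical names chosen for this example, and the event is passed by hand here rather than through any real messaging infrastructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Domain event: a fact that the Order aggregate has already committed."""
    product_id: str
    quantity: int


class Order:
    """Small aggregate that knows nothing about stock levels."""

    def __init__(self):
        self.placed = False

    def place(self, product_id, quantity):
        self.placed = True
        # Return an event instead of reaching into another aggregate.
        return OrderPlaced(product_id, quantity)


class Inventory:
    """Separate aggregate that reacts to the event in its own step."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def handle(self, event):
        self.stock[event.product_id] -= event.quantity


inventory = Inventory({"book": 5})
event = Order().place("book", 2)
inventory.handle(event)  # applied after the order change, not as part of it
```

The trade-off is that the two aggregates are only consistent with each other once the event has been handled, which is the price paid for keeping each one small.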
