Why are data classes considered a code smell?

Question

This article claims that a data class is a "code smell". The reason:

It's a normal thing when a newly created class contains only a few public fields (and maybe even a handful of getters/setters). But the true power of objects is that they can contain behavior types or operations on their data.

Why is it wrong for an object to contain only data? If the core responsibility of the class is to represent data, wouldn't adding methods that operate on the data break the Single Responsibility Principle?

David Arno · Accepted Answer · 2016-12-15T11:29:59.997

There is absolutely nothing wrong with having pure data objects. The author of the piece quite frankly doesn't know what he's talking about.

Such thinking stems from an old, failed, idea that "true OO" is the best way to program and that "true OO" is all about "rich data models" where one mixes data and functionality.

Reality has shown us that the opposite is true, especially in this world of multi-threaded solutions. Pure functions combined with immutable data objects are a demonstrably better way to code.
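As a rough illustration of that style, here is a minimal Java sketch. The `Money` and `MoneyMath` names are hypothetical and do not come from the article under discussion. The data object has only final fields, and the function returns a new value instead of modifying its arguments, which is what lets such objects be shared between threads without locking.

```java
// Hypothetical example: an immutable data object plus a pure function.
final class Money {
    // Public final fields: the data is visible but can never change.
    public final String currency;
    public final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }
}

final class MoneyMath {
    private MoneyMath() {}

    // Pure function: the result depends only on the arguments,
    // and neither argument is modified.
    static Money add(Money a, Money b) {
        if (!a.currency.equals(b.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(a.currency, a.cents + b.cents);
    }
}

class ImmutableDemo {
    public static void main(String[] args) {
        Money price = new Money("EUR", 1250);
        Money shipping = new Money("EUR", 499);
        Money total = MoneyMath.add(price, shipping);
        System.out.println(total.cents + " " + total.currency); // prints "1749 EUR"
    }
}
```

On Java 16 or later, a record would express the same data object more briefly.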

score 16 · Answer 2 · edited Mar 29 '21 at 17:23

There is absolutely nothing wrong with having pure data objects. The author has an opinion not shared by the software developers I know.

For database mapping in particular, you generally have entity classes that contain only the fields stored in the database, plus getters and setters. See Wikipedia on the Hibernate framework.

The whole idea of JavaBeans, used by a lot of tools and frameworks, is based on data classes called beans that contain only fields and the related getters and setters. See Wikipedia on JavaBeans.

Bottom line:
If someone claims that something is 'bad' or 'a code smell', you should always look for the reasons given. If the reasons do not convince you, ask someone else for better reasons or a different opinion. (Like you did here.)

score 5 · Answer 3 · answered Mar 25 '19 at 23:30

A good argument why by Martin Fowler:

"Tell-Don't-Ask is a principle that helps people remember that object-orientation is about bundling data with the functions that operate on that data. It reminds us that rather than asking an object for data and acting on that data, we should instead tell an object what to do. This encourages to move behavior into an object to go with the data."

https://martinfowler.com/bliki/TellDontAsk.html
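To make the quoted principle concrete, here is a small Java sketch. The `Account` class and its method names are assumptions made up for illustration; they are not taken from Fowler's article. The overdraft rule sits inside the object, so callers tell it to withdraw instead of reading the balance and deciding for themselves.

```java
// Hypothetical account class used only to illustrate Tell-Don't-Ask.
class Account {
    private long balanceCents;

    Account(long openingBalanceCents) {
        this.balanceCents = openingBalanceCents;
    }

    long balanceCents() {
        return balanceCents;
    }

    // "Tell": the overdraft rule lives next to the data it protects.
    void withdraw(long amountCents) {
        if (amountCents <= 0 || amountCents > balanceCents) {
            throw new IllegalArgumentException("invalid withdrawal: " + amountCents);
        }
        balanceCents -= amountCents;
    }
}

class TellDontAskDemo {
    public static void main(String[] args) {
        Account account = new Account(10_000);

        // An "ask" style caller would read the balance, compare it to the
        // amount, and write a new balance back. Every caller would have to
        // repeat that check correctly.

        account.withdraw(2_500);
        System.out.println(account.balanceCents()); // prints 7500
    }
}
```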

score 5 · Answer 4 · answered Mar 27 '19 at 14:06

What you need to understand is that there are two kinds of objects:

Objects that have behavior. These should refrain from giving public access to most/any of their data members. I expect only very few accessor methods defined for these.

An example would be a compiled regex: The object is created to provide a certain behavior (to match a string against a specific regex, and to report the (partial) matches), but how the compiled regex does its work is none of the user's business.

Most classes that I write are in this category.

Objects that are really just data. These should just declare all of their members public (or provide the full set of accessors for them).

An example would be a class Point2D. There is absolutely no invariant that needs to be ensured for the members of this class, and users should be able to just access the data via myPoint.x and myPoint.y.

Personally, I don't use such classes much, but I guess there is no larger piece of code that I've written that doesn't use such a class somewhere.

Becoming proficient with object orientation includes realizing that this distinction exists, and learning to classify a class' function into one of these two categories.
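The two categories can be sketched side by side in Java. This is only an illustration: `ZipCodeMatcher` is a hypothetical wrapper, and the pattern it uses is an arbitrary choice.

```java
import java.util.regex.Pattern;

// Second category: plain data. There is no invariant to protect,
// so the fields are simply public.
final class Point2D {
    public double x;
    public double y;

    Point2D(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// First category: behavior. Callers can ask whether a string matches,
// but cannot see or touch the compiled pattern.
// (Hypothetical wrapper; java.util.regex.Pattern is itself such an object.)
final class ZipCodeMatcher {
    private final Pattern pattern = Pattern.compile("\\d{5}");

    boolean matches(String candidate) {
        return pattern.matcher(candidate).matches();
    }
}

class TwoKindsDemo {
    public static void main(String[] args) {
        Point2D p = new Point2D(1.0, 2.0);
        p.x += 3.0;                                    // direct field access is fine here
        System.out.println(p.x + ", " + p.y);          // prints "4.0, 2.0"

        ZipCodeMatcher matcher = new ZipCodeMatcher();
        System.out.println(matcher.matches("90210"));  // prints "true"
        System.out.println(matcher.matches("9021"));   // prints "false"
    }
}
```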

If you code in C++, you can make this distinction explicit by using class for the first category of objects, and struct for the second. Of course, the two are equivalent, except that class means that all members are private by default, while struct declares all members public by default. Which is exactly the sort of information you want to communicate.

score 5 · Answer 5 · answered Dec 16 '22 at 20:25

In Robert Martin's (Uncle Bob's) book "Clean Code", he provides a great argument supporting data classes. He argues that "Data Structure" objects and "Data Transfer Objects" can be a good thing. They have data only and no functions.

Objects: hide their data (be private) and have functions to operate on that data.

Data Structures: show their data (be public) and have no functions.

The two concepts are opposites:

Procedural code (code using data structures)

Makes it easy to add new functions without changing the existing data structures.

Makes it hard to add new data structures because all the functions must change.

OO code (code using objects)

Makes it hard to add new functions because all the existing classes must change.

Makes it easy to add new classes without changing existing functions.
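The trade-off above can be sketched with shapes. The class names here are illustrative assumptions, not a reproduction of the book's listing. The procedural half keeps data public and puts behavior in a separate function; the OO half hides the data behind a polymorphic method.

```java
// Procedural style: data structures with public fields, behavior elsewhere.
final class Square {
    public double side;
    Square(double side) { this.side = side; }
}

final class Circle {
    public double radius;
    Circle(double radius) { this.radius = radius; }
}

final class Geometry {
    private Geometry() {}

    // Adding perimeter() means adding one function here; Square and Circle
    // stay untouched. Adding a Triangle means editing every such function.
    static double area(Object shape) {
        if (shape instanceof Square) {
            Square s = (Square) shape;
            return s.side * s.side;
        }
        if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return Math.PI * c.radius * c.radius;
        }
        throw new IllegalArgumentException("unknown shape");
    }
}

// OO style: data is hidden, behavior is polymorphic.
interface Shape {
    double area();
}

final class OoSquare implements Shape {
    private final double side;
    OoSquare(double side) { this.side = side; }

    // Adding an OoTriangle is just a new class; adding perimeter() to the
    // interface would force a change in every existing Shape.
    public double area() { return side * side; }
}

class ShapesDemo {
    public static void main(String[] args) {
        System.out.println(Geometry.area(new Square(3.0)));  // prints 9.0
        System.out.println(new OoSquare(3.0).area());        // prints 9.0
    }
}
```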

The Law of Demeter (LoD)

A method M of class C should only call methods on C itself, on M's parameters, on objects that M creates, and on objects held in C's fields. It should not call parameter1.getSubItem().getSubSubItem(), because it should not know about the inner workings of its parameter classes.
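A short Java sketch of the difference, using hypothetical `Wallet`, `Customer`, and `Cashier` classes invented for this illustration:

```java
// Hypothetical domain classes for a Law of Demeter illustration.
final class Wallet {
    private long cents;
    Wallet(long cents) { this.cents = cents; }

    void take(long amount) {
        if (amount > cents) {
            throw new IllegalStateException("insufficient funds");
        }
        cents -= amount;
    }

    long cents() { return cents; }
}

final class Customer {
    private final Wallet wallet;
    Customer(Wallet wallet) { this.wallet = wallet; }

    // The customer talks to its own field; callers never need the Wallet.
    void pay(long amount) { wallet.take(amount); }

    long funds() { return wallet.cents(); }

    // Exposed only so the violation below can be demonstrated.
    Wallet getWallet() { return wallet; }
}

final class Cashier {
    // Violation: reaches through the parameter into an object it holds.
    void chargeReachingIn(Customer customer, long amount) {
        customer.getWallet().take(amount);
    }

    // Conforming: calls a method on the parameter and nothing deeper.
    void charge(Customer customer, long amount) {
        customer.pay(amount);
    }
}

class DemeterDemo {
    public static void main(String[] args) {
        Customer customer = new Customer(new Wallet(1_000));
        new Cashier().charge(customer, 300);
        System.out.println(customer.funds()); // prints 700
    }
}
```

As I read the chapter, the rule is aimed at objects with behavior; chained access into plain data structures, which expose their fields by design, is not treated as a violation.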

Data Transfer Objects

A Data Transfer Object (DTO) is a form of data structure: a class with public variables and no functions. DTOs are very useful structures, especially when communicating with databases or parsing messages from sockets and so on.
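For instance, a parser might turn one line of a text protocol into a DTO. The semicolon-separated message format, `UserDto`, and `UserMessageParser` below are all assumptions made up for this sketch.

```java
// Hypothetical DTO for a line such as "42;alice;alice@example.com".
final class UserDto {
    public int id;
    public String name;
    public String email;
}

final class UserMessageParser {
    private UserMessageParser() {}

    // The parsing logic lives outside the DTO, which stays pure data.
    static UserDto parse(String line) {
        // Limit -1 keeps trailing empty fields, so "1;bob;" still has 3 parts.
        String[] parts = line.split(";", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
        }
        UserDto dto = new UserDto();
        dto.id = Integer.parseInt(parts[0].trim());
        dto.name = parts[1].trim();
        dto.email = parts[2].trim();
        return dto;
    }
}

class DtoDemo {
    public static void main(String[] args) {
        UserDto user = UserMessageParser.parse("42;alice;alice@example.com");
        System.out.println(user.id + " " + user.name); // prints "42 alice"
    }
}
```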

Source: Clean Code | Chapter(6) | Objects and Data Structures

Source: https://www.linkedin.com/pulse/clean-code-chapter6-objects-data-structures-mahmoud-ibrahim

Uncle Bob requires that you not mix data classes and OO classes. So if your class has any logic, it becomes an OO class, and if it also exposes its internals via getters/setters, that is bad.

An interesting case of "Data Structure" classes is static inner classes. I regularly use "Data Structure" static-inner classes which are only accessible by the containing class. They are used to construct the data structures for my class. For example HashNode, ListNode, Pair, Tuple.
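A minimal sketch of that pattern, assuming a hypothetical `IntStack` class. In Java's own terminology this is a static nested class. The node is pure data with directly accessed fields, and because it is private, nothing outside `IntStack` can see it.

```java
// Hypothetical stack whose node type is a private data structure.
final class IntStack {
    // Pure data: no methods, fields accessed directly,
    // and invisible outside IntStack.
    private static final class ListNode {
        int value;
        ListNode next;
    }

    private ListNode head;
    private int size;

    void push(int value) {
        ListNode node = new ListNode();
        node.value = value;
        node.next = head;
        head = node;
        size++;
    }

    int pop() {
        if (head == null) {
            throw new IllegalStateException("stack is empty");
        }
        int value = head.value;
        head = head.next;
        size--;
        return value;
    }

    int size() {
        return size;
    }
}

class IntStackDemo {
    public static void main(String[] args) {
        IntStack stack = new IntStack();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop());  // prints 2
        System.out.println(stack.size()); // prints 1
    }
}
```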

I would potentially even extend this static-inner-class argument to a module. There might be some "module-private" data-structure classes. They are not part of the module's public API; they are only for internal use inside the module (by the function-classes or services that do have a public API). But another developer reading the code, particularly one who is averse to data-structure classes, might not notice that the class is "module-private" (not reachable through the module API) and just see a plain class, among many classes in the repo, that exposes all its internals publicly (this happened to me once in a PR review). So this kind of design can be slightly controversial/problematic.

I often program MVC code-bases. We have model objects or database ORM data objects. Should these be data-structures? Or should they be OO classes and have all their internals hidden? I find this is a common, difficult situation that a lot of people hit. And these ORM classes commonly have both OO methods and getters/setters to access the database data. I don't have a great, definite answer to this. I don't necessarily think that all model objects should be locked down with zero getters, with a zealous fanaticism. But Tell-Don't-Ask can be a good principle to try to follow. Whatever you do, I really believe in simple, readable code. I feel like ORM classes are a special example because they are like the API for accessing the database. (Note also that the database can be thought of as a store of globals!)

What is definitely bad is when every object everywhere can reach into any other object without constraints, across a huge code-base. Particularly if there are lots of globals/singletons/globally-injected-services. This descends into a spaghetti mess. You want to reduce the scope of what-depends-on-what. If I change the structure of this class, what will break? Nothing outside the module should break if you have not changed the module's public API. More importantly, when you are troubleshooting, it is hard to reason about code if an object's internals can be fiddled with all across the code-base.

score 4 · Answer 6 · answered Mar 29 '21 at 15:41

Here's what Martin Fowler has to say about data classes in his book "Refactoring":

Such classes are dumb data holders and are often being manipulated in far too much detail by other classes.

Data classes are often a sign of behavior in the wrong place, ...

Note the use of the word "often". Data classes are not always problematic. The way I understand it, such classes are a "code smell" in the sense that you should find out if there is any behavior related to them that is implemented somewhere outside the classes. If there is no such behavior, you are good. Continue using them. Otherwise, refactor them by moving the behavior to those classes.

Mike Robinson · Answer 7 · 2021-03-29T22:45:46.437

In my experience, "data-only classes" are often used to ensure that the object will never accept an incorrect value, nor deliver one. The class is brimming with "trip wires," at least in development mode, which will throw exceptions if any of the values they're being assigned, or that they are asked to produce, are wrong.

And, in my humble experience, "that is a life-saver!"

The benefit of this strategy is simply that it gives you a way to detect the problem – to realize that the boat is sinking in the first place – and do so the moment that it happens. "Gotcha!! There's the culprit ... line #2 in the traceback." Without this, you would never have known that the problem existed. And you certainly wouldn't [yet ...] know where.

Conversely: "if the exception didn't go off, the bug that you're looking for isn't right here."
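A sketch of such a trip wire in Java, assuming a hypothetical `Temperature` class and an arbitrary validity rule (no negative or NaN kelvin values):

```java
// Hypothetical data class that refuses invalid values at the point of assignment.
final class Temperature {
    private double kelvin;

    Temperature(double kelvin) {
        setKelvin(kelvin);
    }

    double getKelvin() {
        return kelvin;
    }

    // Trip wire: fail immediately, so the stack trace points at the bad caller.
    void setKelvin(double kelvin) {
        if (Double.isNaN(kelvin) || kelvin < 0.0) {
            throw new IllegalArgumentException("kelvin must be >= 0, got " + kelvin);
        }
        this.kelvin = kelvin;
    }
}

class TripWireDemo {
    public static void main(String[] args) {
        Temperature t = new Temperature(293.15);
        try {
            t.setKelvin(-5.0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "kelvin must be >= 0, got -5.0"
        }
        System.out.println(t.getKelvin());      // prints 293.15, the bad value never got in
    }
}
```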

"Always write code that is suspicious – looking for trouble."
