Is there any reason to use "plain old data" classes?

Question

In legacy code I occasionally see classes that are nothing but wrappers for data. something like:

class Bottle {
   int height;
   int diameter;
   Cap capType;

   getters/setters, maybe a constructor
}

My understanding of OO is that classes are structures for data and the methods of operating on that data. This seems to preclude objects of this type. To me they are nothing more than structs and kind of defeat the purpose of OO. I don't think it's necessarily evil, though it may be a code smell.

Is there a case where such objects would be necessary? If this is used often, does it make the design suspect?

score 76 · Accepted Answer · answered Dec 27 '10 at 14:49

Definitely not evil and not a code smell in my mind. Data containers are a valid OO citizen. Sometimes you want to encapsulate related information together. It's a lot better to have a method like

public void DoStuffWithBottle(Bottle b)
{
    // do something that doesn't modify Bottle, so the method doesn't belong
    // on that class
}

than

public void DoStuffWithBottle(int bottleHeight, int bottleDiameter, Cap capType)
{
}

Using a class also allows you to add an additional parameter to Bottle without modifying every caller of DoStuffWithBottle. And you can subclass Bottle and further increase the readability and organization of your code, if needed.

There are also plain data objects that can be returned as a result of a database query, for example. I believe the term for them in that case is "Data Transfer Object".

In some languages there are other considerations as well. For example, in C# classes and structs behave differently, since structs are a value type and classes are reference types.

score 31 · Answer 2 · answered Feb 28 '12 at 17:09

Data classes are valid in some cases. DTO's are one good example mentioned by Anna Lear. In general though, you should regard them as the seed of a class whose methods haven't yet sprouted. And if you are running into a lot of them in old code, treat them as a strong code smell. They are often used by old C/C++ programmers who have never made the transision to OO programming and are a sign of procedural programming. Relying on getters and setters all the time (or worse yet, on direct access of non-private members) can get you into trouble before you know it. Consider an example of an external method that needs information from Bottle.

Here Bottle is a data class):

void selectShippingContainer(Bottle bottle) {
    if (bottle.getDiameter() > MAX_DIMENSION || bottle.getHeight() > MAX_DIMENSION ||
            bottle.getCapType() == Cap.FANCY_CAP ) {
        shippingContainer = WOODEN_CRATE;
    } else {
        shippingContainer = CARDBOARD_BOX;
    }
}

Here we've given Bottle some behavior):

void selectShippingContainer(Bottle bottle) {
    if (bottle.isBiggerThan(MAX_DIMENSION) || bottle.isFragile()) {
        shippingContainer = WOODEN_CRATE;
    } else {
        shippingContainer = CARDBOARD_BOX;
    }
}

The first method violates the Tell-Don't-Ask principle, and by keeping Bottle dumb we have let implicit knowledge about bottles, such as what makes one fragle (the Cap), slip into logic that is outside the Bottle class. You have to be on your toes to prevent this sort of 'leakage' when you habitually rely on getters.

The second method asks Bottle only for what it needs to do its job, and leaves Bottle to decide whether it is fragile, or larger than a given size. The result is a much looser coupling between the method and Bottle's implementation. A pleasant side-effect is that the method is cleaner and more expressive.

You'll rarely make use of this many fields of an object without writing some logic that ought to reside in the class with those fields. Figure out what that logic is and then move it to where it belongs.

score 9 · Answer 3 · answered Dec 27 '10 at 18:29

If this is the sort of thing you need, that's fine, but please, please, please do it like

public class Bottle {
    public int height;
    public int diameter;
    public Cap capType;

    public Bottle(int height, int diameter, Cap capType) {
        this.height = height;
        this.diameter = diameter;
        this.capType = capType;
    }
}

instead of something like

public class Bottle {
    private int height;
    private int diameter;
    private Cap capType;

    public Bottle(int height, int diameter, Cap capType) {
        this.height = height;
        this.diameter = diameter;
        this.capType = capType;
    }

    public int getHeight() {
        return height;
    }

    public void setHeight(int height) {
        this.height = height;
    }

    public int getDiameter() {
        return diameter;
    }

    public void setDiameter(int diameter) {
        this.diameter = diameter;
    }

    public Cap getCapType() {
        return capType;
    }

    public void setCapType(Cap capType) {
        this.capType = capType;
    }
}

Please.

Mike Dunlavey · Answer 4 · 2010-12-27T18:35:37.673

As @Anna said, definitely not evil. Sure you can put operations (methods) into classes, but only if you want to. You don't have to.

Permit me a small gripe about the idea that you have to put operations into classes, along with the idea that classes are abstractions. In practice, this encourages programmers to

Make more classes than they need to (redundant data structure). When a data structure contains more components than minimally necessary, it is un-normalized, and therefore contains inconsistent states. In other words, when it is altered, it needs to be altered in more than one place in order to remain consistent. Failure to perform all coordinated changes makes it inconsistent, and is a bug.
Resolve problem 1 by putting in notification methods, so that if part A is modified, it tries to propagate necessary changes to parts B and C. This is the primary reason why it is recommended to have get-and-set accessor methods. Since this is recommended practice, it appears to excuse problem 1, causing more of problem 1 and more of solution 2. This results not only in bugs due to incompletely implementing the notifications, but to a performance-sapping problem of runaway notifications. These are not infinite computations, merely very long ones.

These concepts are taught as good things, generally by teachers who haven't had to work within million-line monster apps riddled with these issues.

Here's what I try to do:

Keep data as normalized as possible, so that when a change is made to the data, it is done at as few code points as possible, to minimize the likelihood of entering an inconsistent state.
When data must be un-normalized, and redundancy is unavoidable, do not use notifications in an attempt to keep it consistent. Rather, tolerate temporary inconsistency. Resolve inconsistency with periodic sweeps through the data by a process that does only that. This centralizes the responsibility for maintaining consistency, while avoiding the performance and correctness problems that notifications are prone to. This results in code that is much smaller, error-free, and efficient.

score 5 · Answer 5 · answered Feb 28 '12 at 13:33

Agree with the Anna Lear,

Definitely not evil and not a code smell in my mind. Data containers are a valid OO citizen. Sometimes you want to encapsulate related information together. It's a lot better to have a method like...

Sometimes people forget to read the 1999 Java Coding Conventions which make it very plain that this kind of programming is perfectly fine. In fact if you avoid it, then your code will smell! (too many getters/setters)

From Java Code Conventions 1999: One example of appropriate public instance variables is the case where the class is essentially a data structure, with no behavior. In other words, if you would have used a struct instead of a class (if Java supported struct), then it's appropriate to make the class's instance variables public. http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-137265.html#177

When used correctly, PODs (plain old data structures) are better than POJOs just like POJOs are often better than EJBs.
http://en.wikipedia.org/wiki/Plain_Old_Data_Structures

Eva · Answer 6 · 2013-03-05T06:04:07.910

Structs have their place, even in Java. You should only use them if the following two things are true:

You just need to aggregate data that doesn't have any behavior, e.g. to pass as a parameter
It doesn't matter one bit what sort of values that aggregate data has

If this is the case, then you should make the fields public and skip the getters/setters. Getters and setters are clunky anyway, and Java is silly for not having properties like a useful language. Since your struct-like object shouldn't have any methods anyway, public fields make the most sense.

However, if either one of those do not apply, you're dealing with a real class. That means all fields should be private. (If you absolutely need a field at a more accessible scope, use a getter/setter.)

To check if your supposed-struct has behavior, look at when the fields are used. If it seems to violate tell, don't ask, then you need to move that behavior into your class.

If some of your data shouldn't change, then you need to make all those fields final. You might consider making your class immutable. If you need to validate your data, then provide validation in the setters and constructors. (A useful trick is to define a private setter and modify your field within your class using only that setter.)

Your Bottle example would most likely fail both tests. You could have (contrived) code that looks like this:

public double calculateVolumeAsCylinder(Bottle bottle) {
    return bottle.height * (bottle.diameter / 2.0) * Math.PI);
}

Instead it should be

double volume = bottle.calculateVolumeAsCylinder();

If you changed the height and diameter, would it be the same bottle? Probably not. Those should be final. Is a negative value ok for the diameter? Must your Bottle be taller than it is wide? Can the Cap be null? No? How are you validating this? Assume the client is either stupid or evil. (It's impossible to tell the difference.) You need to check these values.

This is what your new Bottle class might look like:

public class Bottle {

    private final int height, diameter;

    private Cap capType;

    public Bottle(final int height, final int diameter, final Cap capType) {
        if (diameter < 1) throw new IllegalArgumentException("diameter must be positive");
        if (height < diameter) throw new IllegalArgumentException("bottle must be taller than its diameter");

        setCapType(capType);
        this.height = height;
        this.diameter = diameter;
    }

    public double getVolumeAsCylinder() {
        return height * (diameter / 2.0) * Math.PI;
    }

    public void setCapType(final Cap capType) {
        if (capType == null) throw new NullPointerException("capType cannot be null");
        this.capType = capType;
    }

    // potentially more methods...

}

score 3 · Answer 7 · answered Dec 27 '10 at 15:12

This kind of classes are quite useful when you are dealing with mid-size/big applications, for some reasons:

it's quite easy to create some test cases and ensure that data is consistent.
it holds all kind of behaviors that involve that information, so data bug tracking time is reduce
Using them should keep method args lightweight.
When using ORMs , this classes gives flexybility and consistency. Adding a complex attribute that's calculated based on simple information already in the class, resutls in writing one simple method. This is quite more agile and productive that having to check your database and ensure all databases are patched with new modification.

So to sum up, in my experience they usually are more useful than annoying.

score 3 · Answer 8 · answered Dec 27 '10 at 20:15

3

With game design, the overhead of 1000's of function calls, and event listeners can sometimes make it worth it to have classes that only store data, and have other classes that loop through all the data only classes to perform the logic.

answered Dec 27 '10 at 20:15

AttackingHobo

199

score 0 · Answer 9 · 2017-12-16T09:42:59.477

IMHO there often aren't enough classes like this in heavily object-oriented systems. I need to carefully qualify that.

Of course if the data fields have wide scope and visibility, that can be extremely undesirable if there are hundreds or thousands of places in your codebase tampering with such data. That's asking for trouble and difficulties maintaining invariants. Yet at the same time, that doesn't mean that every single class in the entire codebase benefits from information hiding.

But there are many cases where such data fields will have very narrow scope. A very straightforward example is a private Node class of a data structure. It can often simplify the code a great deal by reducing the number of object interactions going on if said Node could simply consist of raw data. That serves as a decoupling mechanism since the alternative version might require bidirectional coupling from, say, Tree->Node and Node->Tree as opposed to simply Tree->Node Data.

A more complex example would be entity-component systems as used often in game engines. In those cases, the components are often just raw data and classes like the one you've shown. However, their scope/visibility tends to be limited since there are typically only one or two systems that might access that particular type of component. As a result you still tend to find it quite easy to maintain invariants in those systems and, moreover, such systems have very few object->object interactions, making it very easy to comprehend what's going on at the bird's eye level.

In such cases you can end up with something more like this as far as interactions go (this diagram indicates interactions, not coupling, since a coupling diagram might include abstract interfaces for the second image below):

... as opposed to this:

... and the former type of system tends to be a lot easier to maintain and reason about in terms of correctness in spite of the fact that the dependencies are actually flowing towards data. You get much less coupling primarily because many things can be turned into raw data instead of objects interacting with each other forming a very complex graph of interactions.

score -1 · Answer 10 · answered Dec 27 '10 at 15:06

There are even much strange creatures exist in the OOP world. I've once created a class, which is abstract, and contains nothing, it just for to be a common parent of other two classes... it's the Port class:

SceneMember 
Message extends SceneMember
Port extends SceneMember
InputPort extends Port
OutputPort extends Port

So SceneMember is the base class, it does all the job, which required to appear on the scene: add, delete, get-set ID, etc. Message is the connection between ports, it has its own life. InputPort and OutputPort contain the i/o specific functions of their own.

Port is empty. It is just used to group InputPort and OutputPort together for, say, a portlist. It's empty because SceneMember contains all stuff which are for acting as a member, and InputPort and OutputPort contain the port-specified tasks. InputPort and OutputPort are such different, that they have no common (except SceneMember) functions. So, Port is empty. But it's legal.

Maybe it's an antipattern, but I like it.

(It's like the word "mate", which is used for both "wife" and "husband". You never use the "mate" word for a concrete person, because you know the sex of him/her, and it does not matter if he/she is married or not, so you use "someone" instead, but it still exists, and is used in rare, abstract situations, e.g. a law text.)

Is there any reason to use "plain old data" classes?

10 Answers10

Linked