42

According to When is primitive obsession not a code smell?, I should create a ZipCode object to represent a zip code instead of a String object.

However, in my experience, I prefer to see

public class Address{
    public String zipCode;
}

instead of

public class Address{
    public ZipCode zipCode;
}

because I think the latter one requires me to move to the ZipCode class to understand the program.

And I believe I need to move between many classes to see the definition if every primitive data fields were replaced by a class, which feels as if suffering from the yo-yo problem (an anti-pattern).

So I would like to move the ZipCode methods into a new class, for example:

Old:

public class ZipCode{
    public boolean validate(String zipCode){
    }
}

New:

public class ZipCodeHelper{
    public static boolean validate(String zipCode){
    }
}

so that only the one who needs to validate the zip code would depend on the ZipCodeHelper class. And I found another "benefit" of keeping the primitive obsession: it keeps the class looks like its serialized form, if any, for example: an address table with string column zipCode.

My question is, is "avoiding the yo-yo problem" (move between class definitions) a valid reason to allow the "primitive obsession"?

ocomfd
  • 5,750

9 Answers9

120

The assumption is that you don't need to yo-yo to the ZipCode class to understand the Address class. If ZipCode is well-designed it should be obvious what it does just by reading the Address class.

Programs are not read end-to-end - typically programs are far too complex to make this possible. You cannot keep all the code in a program in your mind at the same time. So we use abstractions and encapsulations to "chunk" the program into meaningful units, so you can look at one part of the program (say the Address class) without having to read all code it depends on.

For example I'm sure you don't yo-yo into reading the source code for String every time you encounter String in code.

Renaming the class from ZipCode to ZipCodeHelper suggest there now is two separate concepts: a zip code and a zip code helper. So twice as complex. And now the type system cannot help you distinguish between an arbitrary string and a valid zip code since they have the same type. This is where "obsession" is appropriate: You are suggesting a more complex and less safe alternative just because you want to avoid a simple wrapper type around a primitive.

Using a primitive is IMHO justified in the cases where there is no validation or other logic depending on this particular type. But as soon as you add any logic, it is much simpler if this logic is encapsulated with the type.

As for serialization I think it sounds like a limitation in the framework you are using. Surely you should be able to serialize a ZipCode to a string or map it to a column in a database.

Rik D
  • 4,975
JacquesB
  • 61,955
  • 21
  • 135
  • 189
56

If can do:

new ZipCode("totally invalid zip code");

And the constructor for ZipCode does:

ZipCodeHelper.validate("totally invalid zip code");

Then you've broken encapsulation, and added a pretty silly dependency to the ZipCode class. If the constructor doesn't call ZipCodeHelper.validate(...) then you have isolated logic in its own island without actually enforcing it. You can create invalid zip codes.

The validate method should be a static method on the ZipCode class. Now the knowledge of a "valid" zip code is bundled together with the ZipCode class. Given that your code examples look like Java, the constructor of ZipCode should throw an exception if an incorrect format is given:

public class ZipCode {
    private String zipCode;

    public ZipCode(string zipCode) {
        if (!validate(zipCode))
            throw new IllegalFormatException("Invalid zip code");

        this.zipCode = zipCode;
    }

    public static bool validate(String zipCode) {
        // logic to check format
    }

    @Override
    public String toString() {
        return zipCode;
    }
}

The constructor checks the format and throws an exception, thereby preventing invalid zip codes from being created, and the static validate method is available to other code so the logic of checking the format is encapsulated in the ZipCode class.

There is no "yo-yo" in this variant of the ZipCode class. It's just called proper Object Oriented Programming.


We are also going to ignore internationalization here, which may necessitate another class called ZipCodeFormat or PostalService (e.g. PostalService.isValidPostalCode(...), PostalService.parsePostalCode(...), etc.).

11

If you wrestle a lot with this question, perhaps the language you use is not the right tool for the job? This kind of "domain-typed primitives" are trivially easy to express in, for example, F#.

There you could, for example, write:

type ZipCode = ZipCode of string
type Town = Town of string

type Adress = {
  zipCode: ZipCode
  town: Town
  //etc
}

let adress1 = {
  zipCode = ZipCode "90210"
  town = Town "Beverly Hills"
}

let faultyAdress = {
  zipCode = "12345"  // <-Compiler error
  town = adress1.zipCode // <- Compiler error
}

This is really useful for avoiding common mistakes, like comparing id's of different entities. And since these typed primitives are much more lightweight than a C# or Java-class, you'll end up actually use them.

Guran
  • 545
6

The answer depends entirely on what you actually want to do with the ZIP codes. Here are two extreme possibilities:

(1) All addresses are guaranteed to be in a single country. No exceptions at all. (E.g. no foreign customers, or no employees whose private address is abroad while they are working for a foreign customer.) This country has ZIP codes and they can be expected to never be seriously problematic (i.e. they don't require free-form input such as "currently D4B 6N2, but this changes every 2 weeks"). The ZIP codes are used not just for addressing, but for validation of payment information or similar purposes. - Under these circumstances, a ZIP code class makes a lot of sense.

(2) Addresses can be in almost every country, so dozens or hundreds of addressing schemes with or without ZIP codes (and with thousands of weird exceptions and special cases) are relevant. A "ZIP" code is really only asked for to remind people from countries where ZIP codes are used not to forget to provide theirs. The addresses are only used so that if someone loses access to their account and they can prove their name and address, access will be restored. - Under these circumstances, ZIP code classes for all relevant countries would be an enormous effort. Fortunately they are not needed at all.

3

The other answers have talked about OO domain modelling and using a richer type to represent your value.

I don't disagree, especially given the example code you posted.

But I also wonder if that actually answers the title of your question.

Consider the following scenario (pulled from an actual project I'm working on):

You have a remote application on a field device that talks to your central server. One of the DB fields for the device entry is a zip code for the address that the field device is at. You don't care about the zip code (or any of the rest of the address for that matter). All of the people who care about it are on the other side of an HTTP boundary: you just happen to be the single source of truth for the data. It has no place in your domain modeling. You just record it, validate it, store it, and on request shuffle it off in a JSON blob to points elsewhere.

In this scenario, doing much of anything beyond validating the insert with an SQL regex constraint (or its ORM equivalent) is probably overkill of the YAGNI variety.

Jared Smith
  • 1,935
2

The ZipCode abstraction could only make sense if your Address class did not also have a TownName property. Otherwise, you have half an abstraction: the zip code designates the town, but these two related bits of information are found in different classes. It doesn't quite make sense.

However, even then, it's still not a correct application (or rather solution to) primitive obsession; which, as I understand it, mainly focuses on two things:

  1. Using primitives as the input (or even output) values of a method, especially when a collection of primitives is needed.
  2. Classes that grow extra properties over time without ever reconsidering whether some of these should be grouped into a subclass of their own.

Your case is neither. An address is a well-defined concept with clearly necessary properties (street, number, zip, town, state, country, ...). There is little to no reason to break up this data as it has a single responsibility: designate a location on Earth. An address requires all of these fields in order to be meaningful. Half an address is pointless.

This is how you know that you don't need to subdivide any further: breaking it down any further would detract from the functional intention of the Address class. Similarly, you don't need a Name subclass to be used in the Personclass, unless Name (without a person attached) is a meaningful concept in your domain. Which it (usually) isn't. Names are used for identifying people, they usually have no value on their own.

Flater
  • 58,824
1

From the article:

More generally, the yo-yo problem can also refer to any situation where a person must keep flipping between different sources of information in order to understand a concept.

Source code is read far more often than it is written. Thus, the yo-yo problem, of having to switch between many files is a concern.

However, no, the yo-yo problem feels much more relevant when dealing with deeply interdependent modules or classes (which call back and forth between each other). Those are a special kind of nightmare to read, and is likely what the coiner of the yo-yo problem had in mind.

However - yes, avoiding too many layers of abstraction is important!

All non-trivial abstractions, to some degree, are leaky. - the Law of Leaky Abstractions.

For example, I disagree with the assumption made in mmmaaa's answer that "you don't need to yo-yo to [(visit)] the ZipCode class to understand the Address class". My experience has been that you do - at least the first few times you read the code. However, as others have noted, there are times when a ZipCode class is appropriate.

YAGNI (Ya Ain't Gonna Need It) is a better pattern to follow to avoid Lasagna code (code with too many layers) - abstractions, such as types and classes are there to aid the programmer, and should not be used unless they are an aid.

I personally aim to "save lines of code" (and of course the related "save files/modules/classes", etc). I'm confident there are some who would apply to me the epithet of "primitive obsessed" - I find it more important to have code which is easy to reason about than to worry about labels, patterns, and anti-patterns. The correct choice of when to create a function, a module/file/class, or put a function in a common location is very situational. I aim roughly for 3-100 line functions, 80-500 line files, and "1, 2, n" for reusable library code (SLOC - not including comments or boilerplate; I typically want at least 1 additional SLOC minimum per line of mandatory boilerplate). The important point is to keep in mind and respect the limits of human cognition when writing code.

Most positive patterns have arisen from developers doing exactly that, when they needed them. It is much more important to learn how to write readable code than to try to apply patterns without the same problem to solve. Any good developer can implement the factory pattern without having seen it before in the uncommon case where it is the right fit for their problem. I have used the factory pattern, the observer pattern, and probably hundreds besides, without knowing their name (ie, is there a "variable assignment pattern"?). For a fun experiment - see how many GoF patterns are built into the JS language - I stopped counting after about 12-15 back in 2009. The Factory pattern is as simple as returning an object from a JS constructor, for example - no need for a WidgetFactory.

So - yes, sometimes ZipCode is a good class. However, no, the yo-yo problem is not strictly relevant.

Iiridayn
  • 257
0

The yo-yo problem is only relevant if you have to flip back and forth. That is caused by one or two things (sometimes both):

  1. Bad naming. ZipCode seem fine, ZoneImprovementPlanCode is going to require a look by most people (and the few that don’t won’t be impressed).
  2. Inappropriate coupling. Say you ZipCode class has an area code lookup. You might think it makes sense because it’s handy, but it’s not really related to the ZipCode, and shoeing it into it means that now people don’t know where to go for things.

If you can look at the name and have a reasonable idea of what it does, and the methods and properties do reasonably obvious things, you don’t need to go look at the code, you can just use it. That is the whole point of classes in the first place—-they are modular pieces of code that can be used and developed in isolation. If you have to look at anything other than the API for the class to see what it does, it is at best a partial failure.

jmoreno
  • 11,238
-1

Remember, there is no silver bullet. If you are writing an extremely simple app which needs to be crawled through fast, then a simple string may do the job. However in 98% of the times, a Value Object as described by Eric Evans in DDD, would be the perfect fit. You can easily see all the benefits Value objects provide by reading around.