Should we define types for everything?

Question

Recently I got into a problem with the readability of my code.
I had a function that did an operation and returned a string representing the ID of this operation for future reference (a bit like OpenFile in Windows returning a handle). The user would use this ID later to start the operation and to monitor its finish.

The ID had to be a random string because of interoperability concerns. This created a method that had a very unclear signature like so:

public string CreateNewThing()

This makes the intent of the return type unclear. I thought to wrap this string in another type that makes its meaning clearer like thus:

public OperationIdentifier CreateNewThing()

The type will contain only the string and be used whenever this string is used.

It's obvious that the advantage of this way of operation is more type safety and clearer intent, but it also creates a lot more code and code that isn't very idiomatic. On the one hand I like the added safety, but it also creates a lot of clutter.

Do you think it is a good practice to wrap simple types in a class for safety reasons?

score 162 · Answer 1 · edited Apr 12 '17 at 07:31

Primitives, such as string or int, have no meaning in a business domain. A direct consequence of this is that you may mistakenly use an URL when a product ID is expected, or use quantity when expecting price.

This is also why Object Calisthenics challenge features primitives wrapping as one of its rules:

Rule 3: Wrap all primitives and Strings

In the Java language, int is a primitive, not a real object, so it obeys different rules than objects. It is used with a syntax that isn’t object-oriented. More importantly, an int on its own is just a scalar, so it has no meaning. When a method takes an int as a parameter, the method name needs to do all of the work of expressing the intent. If the same method takes an Hour as a parameter, it’s much easier to see what’s going on.

The same document explains that there is an additional benefit:

Small objects like Hour or Money also give us an obvious place to put behavior that would otherwise have been littered around other classes.

Indeed, when primitives are used, it's usually extremely difficult to track the exact location of the code related to those types, often leading to severe code duplication. If there is Price: Money class, it is natural to find the range checking inside. If, instead, a int (worse, a double) is used to store product prices, who should validate the range? The product? The rebate? The cart?

Finally, a third benefit not mentioned in the document is the ability to change relatively easy the underlying type. If today my ProductId has short as its underlying type and later I need to use int instead, chances are the code to change will not span the entire code base.

The drawback—and the same argument applies to every rule of Object Calisthenics exercise—is that if quickly becomes too overwhelming to create a class for everything. If Product contains ProductPrice which inherits from PositivePrice which inherits from Price which in turn inherits from Money, this is not clean architecture, but rather a complete mess where in order to find a single thing, a maintainer should open a few dozen files every time.

Another point to consider is the cost (in terms of lines of code) of creating additional classes. If the wrappers are immutable (as they should be, usually), it means that, if we take C#, you have to have, within the wrapper at least:

The property getter,
Its backing field,
A constructor which assigns the value to the backing field,
A custom ToString(),
XML documentation comments (which makes a lot of lines),
A Equals and a GetHashCode overrides (also a lot of LOC).

and eventually, when relevant:

A DebuggerDisplay attribute,
An override of == and != operators,
Eventually an overload of the implicit conversion operator to seamlessly convert to and from the encapsulated type,
Code contracts (including the invariant, which is a rather long method, with its three attributes),
Several converters which will be used during XML serialization, JSON serialization or storing/loading a value to/from a database.

A hundred LOC for a simple wrapper makes it quite prohibitive, which is also why you may be completely sure of the long-term profitability of such wrapper. The notion of scope explained by Thomas Junk is particularly relevant here. Writing a hundred LOCs to represent a ProductId used all over your code base looks quite useful. Writing a class of this size for a piece of code which makes three lines within a single method is much more questionable.

Conclusion:

Do wrap primitives in classes which have a meaning in a business domain of the application when (1) it helps reducing mistakes, (2) reduces the risk of code duplication or (3) helps changing the underlying type later.
Don't wrap automatically every primitive you find in your code: there are many cases where using string or int is perfectly fine.

In practice, in public string CreateNewThing(), returning an instance of ThingId class instead of string might help, but you may also:

Return an instance of Id<string> class, that is an object of generic type indicating that the underlying type is a string. You have your benefit of readability, without the drawback of having to maintain a lot of types.
Return an instance of Thing class. If the user only needs the ID, this can easily be done with:
```
var thing = this.CreateNewThing();
var id = thing.Id;
```

Thomas Junk · Answer 2 · 2015-05-03T18:40:52.183

I would use the scope as a rule of thumb: The narrower the scope of generating and consuming such values is, the less likely you have to create an object representing this value.

Say you have the following pseudocode

id = generateProcessId();
doFancyOtherstuff();
job.do(id);

then the scope is very limited and I would see no sense in making it a type. But say, you generate this value in one layer and pass it to another layer (or even another object), then it would make perfect sense to create a type for that.

score 35 · Answer 3 · edited Jun 16 '20 at 10:01

Static type systems are all about preventing incorrect uses of data.

There are obvious examples of types doing this:

you can't get the month of a UUID
you can't multiply two strings.

There are more subtle examples

you can't pay for something using the length of desk
you can't make an HTTP request using someone's name as the URL.

We may be tempted to use double for both price and length, or use a string for both a name and a URL. But doing that subverts our awesome type system, and allows these misuses to pass the language's static checks.

Getting pound-seconds confused with Newton-seconds can have poor results at runtime.

This is particularly a problem with strings. They frequently become the "universal data type".

We are used to primarily text interfaces with computers, and we often extend these human interfaces (UIs) to programming interfaces (APIs). We think of 34.25 as the characters 34.25. We think of a date as the characters 05-03-2015. We think of a UUID as the characters 75e945ee-f1e9-11e4-b9b2-1697f925ec7b.

But this mental model harms API abstraction.

The words or the language, as they are written or spoken, do not seem to play any role in my mechanism of thought.

Albert Einstein

Similarly, textual representations should not play any role in designing types and APIs. Beware the string! (and other overly generic "primitive" types)

Types communicate "what operations make sense."

For example, I once worked on a client to an HTTP REST API. REST, properly done, use hypermedia entities, which have hyperlinks pointing to related entities. In this client, not only were the entities typed (e.g. User, Account, Subscription), the links to those entities were also typed (UserLink, AccountLink, SubscriptionLink). The links were little more than wrappers around Uri, but the separate types made it impossible to try to use an AccountLink to fetch a User. Had everything been plain Uris -- or even worse, string -- these mistakes would only be found at runtime.

Likewise, in your situation, you have data that is used for only one purpose: to identify an Operation. It shouldn't be used for anything else, and we shouldn't be trying to identify Operations with random strings we've made up. Creating a separate class adds readability and safety to your code.

Of course, all good things can be used in excess. Consider

how much clarity it adds to your code
how often it is used

If a "type" of data (in the abstract sense) is used frequently, for distinct purposes, and between code interfaces, it is a very good candidate for being a separate class, at the expense of verbosity.

score 11 · Answer 4 · answered May 03 '15 at 16:41

Do you think it is a good practice to wrap simple types in a class for safety reasons?

Sometimes.

This is one of those cases where you need to weigh the problems that may arise from using a string instead of a more concrete OperationIdentifier. What are their severity? What is their likeliness?

Then you need to consider the cost of using the other type. How painful is it to use? How much work is it to make?

In some cases, you'll save yourself time and effort by having a nice concrete type to work against. In others, it won't be worth the trouble.

In general, I think that this sort of thing should be done more than it is today. If you have an entity that means something in your domain, it is good to have that as its own type, since that entity is more likely to change/grow with the business.

ytoledano · Answer 5 · 2015-05-04T20:36:19.403

I generally agree that many times you should create a type for primitives and strings, but because the above answers recommend creating a type in most cases, I'll list a few reasons why/when not to:

Performance. I must refer to actual languages here. In C#, if a client ID is a short and you wrap it in a class - you've just created a lot of overhead: memory, since it's now 8 bytes on a 64 bit system and speed since now it's allocated on the heap.
When you make assumptions on the type. If a client ID is a short and you have some logic that packs it in some manner - you usually already make assumptions on the type. In all those places you now have to break the abstraction. If it's one spot, it's not a big deal, if it's all over the place, you might find that half the time you're using the primitive.
Because not every language has typedef. And for languages that don't and code that's already written, making such a change could be a big task that might even introduce bugs (but everyone has a test suite with good coverage, right?).
In some cases it reduces readability. How do I print this? How do I validate this? Should I be checking for null? Are all questions that requires you drill into the definition of the type.

score 9 · Answer 6 · edited Apr 12 '17 at 07:31

No, you should not define types (classes) for "everything".

But, as other answers state, it often is useful to do so. You should develop – consciously, if possible – a feeling or sense of too-much-friction due to the absence of a suitable type or class, as you write, test, and maintain your code. For me, the onset of too-much-friction is when I want to consolidate multiple primitive values as a single value or when I need to validate the values (i.e. determine which of all possible values of the primitive type correspond to valid values of the 'implied type').

I've found considerations such as what you've posed in your question to be responsible for too much design in my code. I've developed a habit of deliberately avoiding writing any more code than necessary. Hopefully you're writing good (automated) tests for your code – if you are, then you can easily refactor your code and add types or classes if doing so provides net benefits for the ongoing development and maintenance of your code.

Telastyn's answer and Thomas Junk's answer both make a very good point about the context and usage of the relevant code. If you're using a value inside a single block of code (e.g. method, loop, using block) then it's fine to just use a primitive type. It's even fine to use a primitive type if you're using a set of values repeatedly and in many other places. But the more frequently and widely you're using a set of values, and the less closely that set of values corresponds to the values represented by the primitive type, the more you should consider encapsulating the values in a class or type.

score 5 · Answer 7 · edited Jul 23 '17 at 13:42

Idea behind wrapping primitive types,

To establish domain specific language
Limit mistakes done by users by passing incorrect values

Obviously it would be very difficult and not practical to do it everywhere, but it's important to wrap the types where required,

For example if you have the class Order,

Order
{
   long OrderId { get; set; }
   string InvoiceNumber { get; set; }
   string Description { get; set; }
   double Amount { get; set; }
   string CurrencyCode { get; set; }
}

The important properties to look up an Order is mostly OrderId and InvoiceNumber. And the Amount and CurrencyCode are closely related, where if someone change the CurrencyCode without changing the Amount the Order no longer can be considered as valid.

So, only wrapping OrderId, InvoiceNumber and introducing a composite for currency makes sense in this scenario and probably wrapping description makes no sense. So preferred result could look like,

    Order
    {
       OrderIdType OrderId { get; set; }
       InvoiceNumberType InvoiceNumber { get; set; }
       string Description { get; set; }
       Currency Amount { get; set; }
    }

So there is no reason to wrap everything, but things that actually matters.

null · Answer 8 · 2015-05-05T14:16:26.583

On the surface it looks like all you need to do is identify an operation.

Additionally you say what an operation is supposed to do:

to start the operation [and] to monitor its finish

You say it in a way as if this is "just how the identification is used", but I would say these are properties describing an operation. That sounds like a type definition to me, there's even a pattern called the "command pattern" that fits very well..

It says

the command pattern [...] is used to encapsulate all information needed to perform an action or trigger an event at a later time.

I think this is very similar compared to what you want to do with your operations. (Compare the phrases I made bold in both quotes) Instead of returning a string, return an identifier for an Operation in the abstract sense, a pointer to an object of that class in oop for example.

Regarding the comment

It would be wtf-y to call this class a command and then not have the logic inside it.

No it wouldn't. Remember that patterns are very abstract, so abstract in fact that they are somewhat meta. That is, they often abstract the programming itself and not some real world concept. The command pattern is an abstraction of a function call. (or method) As if somebody hit the pause button right after passing the values of the parameters and right before execution, to be resumed later.

The following is considering oop, but the motivation behind it should hold true for any paradigm. I like to give some reasons why placing the logic into the command can be considered a bad thing.

If you had all that logic in the command, it would be bloated and messy. You would be thinking "oh well, all this functionality should be in separate classes." You would refactor the logic out of the command.
If you had all that logic in the command, it would be hard to test and get in the way when testing. "Holy crap, why do I have to use this command nonsense? I only want to test if this function spits out 1 or not, I don't want to call it later, I want to test it now!"
If you had all that logic in the command, it wouldn't make too much sense to report back when finished. If you think about a function to be executed synchronously by default, it makes no sense to be notified when it finishes executing. Maybe you are starting another thread because your operation is actually to render avatar to cinema format (may need additional thread on raspberry pi), maybe you have to fetch some data from a server, ...whatever it is, if there's a reason for reporting back when finished, it could be because there's some asynchronousity (is that a word?) going on. I think running a thread or contacting a server is logic that should not be in your command. This is somewhat of a meta argument and maybe controversial. If you think it makes no sense, please tell me in the comments.

To recap: the command pattern allows you to wrap functionality into an object, to be executed later. For the sake of modularity (functionality exists regardless of being executed via command or not), testability (functionality should be testable without command) and all those other buzz words that essentially express the demand to write good code, you would not put the actual logic into the command.

Because patterns are abstract, it can be hard to come up with good real world metaphors. Here's an attempt:

"Hey grandma, could you please hit the record button on the TV at 12 so I don't miss the simpsons on channel 1?"

My grandma does not know what happens technically when she hits that record button. The logic is elsewhere (in the TV). And that's a good thing. The functionality is encapsulated and information is hidden from the command, it is a user of the API, not necessarily part of the logic ...oh jeez I'm babbling buzz words again, I better finish this edit now.

score 4 · Answer 9 · answered May 06 '15 at 08:29

Unpopular opinion:

You should usually not define a new type!

Defining a new type to wrap a primitive or a basic class is sometimes called defining a pseudo-type. IBM describes this as a bad practice here (they focus specifically on misuse with generics in this case).

Pseudo types make common library functionality useless.

Java's Math functions can work with all number primitives. But if you define a new class Percentage (wrapping a double that can be in the range 0~1) all these functions are useless and need to be wrapped by classes that (even worse) need to know about the internal representation of the Percentage class.

Pseudo types go viral

When making multiple libraries you will often find that these pseudotypes go viral. If you use the, above mentioned, Percentage class in one library you will either need to convert it at the library boundary (losing all safety/meaning/logic/other reason you had for creating that type) or you will have to make these classes accessible to the other library as well. Infecting the new library with your types where a simple double might have sufficed.

Take away message

As long as the type you wrap does not need a lot of business logic I would advice against wrapping it in a pseudo class. You should only wrap a class if there are serious business constraints. In other cases naming your variables correctly should go a long way in conveying meaning.

An example:

A uint can perfectly represent a UserId, we can keep using Java's built in operators for uints (such as ==) and we do not need business logic to protect the 'internal state' of the user id.

score 3 · Answer 10 · edited May 04 '15 at 20:33

As with all tips, knowing when to apply the rules is the skill. If you create your own types in a type-driven language you get type checking. So generally that is going to be a good plan.

NamingConvention is key to readability. The two combined can convey intention clearly.
But /// is still useful.

So yes, I would say create many own types when their lifetime is beyond a class boundary. Also consider the use of both Struct and Class, not always Class.

Should we define types for everything?

10 Answers10

Linked

Related