54

Why is this OK and mostly expected:

abstract type Shape
{
   abstract number Area();
}

concrete type Triangle : Shape
{
   concrete number Area()
   {
      //...
   }
}

...while this is not OK and nobody complains:

concrete type Name : string
{
}

concrete type Index : int
{
}

concrete type Quantity : int
{
}
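
For concreteness, here is roughly how that second snippet would look in C# (a sketch only; the compiler rejects it, because string and int are sealed types):

class Name : string    // compile error: cannot derive from sealed type 'string'
{
}

class Index : int      // compile error: cannot derive from sealed type 'int'
{
}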

My motivation is maximising the use of the type system for compile-time correctness verification.

PS: yes, I have read this and wrapping is a hacky work-around.

Den

10 Answers

83

I assume you are thinking of languages like Java and C#?

In those languages, primitives (like int) are basically a compromise for performance. They don't support all the features of objects, but they are faster and have less overhead.

In order for objects to support inheritance, each instance needs to "know" at runtime which class it is an instance of; otherwise overridden methods cannot be resolved at runtime. For objects this means the instance data is stored in memory along with a pointer to the class object. If such info also had to be stored along with primitive values, the memory requirements would balloon: a 16-bit integer value would require 16 bits for the value plus an additional 32 or 64 bits for a pointer to its class.

Apart from the memory overhead, you would also expect to be able to override common operations on primitives, like the arithmetic operators. Without subtyping, an operator like + can be compiled down to a simple machine code instruction. If it could be overridden, you would need to resolve the method at runtime, a much more costly operation. (You may know that C# supports operator overloading - but this is not the same. Operator overloading is resolved at compile time, so there is no default runtime penalty.)
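
For illustration, a minimal C# sketch (with a made-up Money type) of what that parenthetical means: the + call below is bound at compile time to a static method, so no virtual dispatch happens at runtime.

using System;

public readonly struct Money
{
    public decimal Amount { get; }
    public Money(decimal amount) => Amount = amount;

    // Compiled to a static op_Addition method; the call in Main is bound at
    // compile time, so there is no runtime dispatch involved.
    public static Money operator +(Money a, Money b) => new Money(a.Amount + b.Amount);
}

public static class OperatorDemo
{
    public static void Main()
    {
        Money total = new Money(2m) + new Money(3m); // resolved to Money.op_Addition at compile time
        Console.WriteLine(total.Amount);             // prints 5
    }
}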

Strings are not primitives, but they are still "special" in how they are represented in memory. For example they are "interned", which means two string literals that are equal can be optimized into the same reference. This would not be possible (or at least a lot less effective) if string instances also had to keep track of their class.
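
A small sketch of that interning behaviour in C# terms (illustrative demo class; the same idea exists in Java):

using System;

public static class InternDemo
{
    public static void Main()
    {
        string a = "hello";
        string b = "hello";
        // True: equal literals are collapsed into one interned instance.
        Console.WriteLine(object.ReferenceEquals(a, b));

        string c = new string("hello".ToCharArray());
        // False: a string constructed at runtime is a separate object...
        Console.WriteLine(object.ReferenceEquals(a, c));

        // ...until it is explicitly interned.
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True
    }
}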

What you describe would certainly be useful, but supporting it would require a performance overhead for every use of primitives and strings, even when they don't take advantage of inheritance.

The language Smalltalk does (I believe) allow subclassing of integers. But when Java was designed, Smalltalk was considered too slow, and the overhead of having everything be an object was considered one of the main reasons. Java sacrificed some elegance and conceptual purity to get better performance.

JacquesB
21

What some languages propose is not subclassing, but subtyping. For example, Ada lets you create derived types or subtypes. The Ada Programming/Type System section is worth reading to understand all the details. You can restrict the range of values, which is what you want most of the time:

 type Angle is range -10 .. 10;
 type Hours is range 0 .. 23; 

You can use both types as Integers if you convert them explicitly. Note also that you can't use one in place of the other, even when the ranges are structurally equivalent (types are checked by name).

 type Reference is new Integer;
 type Count is new Integer;

The above types are incompatible, even though they represent the same range of values.

(But you can use Unchecked_Conversion; don't tell people I told you that)

coredump
17

I think this might very well be an X/Y question. Salient points, from the question...

My motivation is maximising the use of type system for compile-time correctness verification.

...and from your comment elaborating:

I don't want to be able to substitute one for another implicitly.

Excuse me if I'm missing something, but... If these are your aims, then why on Earth are you talking about inheritance? Implicit substitutability is... like... its entire thing. Y'know, the Liskov Substitution Principle?

What you seem to want, in reality, is the concept of a 'strong typedef' - whereby something 'is' e.g. an int in terms of range and representation but cannot be substituted into contexts that expect an int and vice-versa. I'd suggest searching for info on this term and whatever your chosen language(s) might call it. Again, it's pretty much literally the opposite of inheritance.
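
C# has no built-in strong typedef, but a rough sketch of the idea, using thin readonly struct wrappers (hypothetical Index and Quantity types) so the compiler keeps the two apart, might look like this:

public readonly struct Index
{
    public int Value { get; }
    public Index(int value) => Value = value;
}

public readonly struct Quantity
{
    public int Value { get; }
    public Quantity(int value) => Value = value;
}

public static class StrongTypedefDemo
{
    public static void Main()
    {
        Index i = new Index(3);
        Quantity q = new Quantity(3);
        // i = q;        // compile error: no conversion from Quantity to Index
        // int n = i;    // compile error: no implicit conversion to int unless you define one
        System.Console.WriteLine(i.Value + q.Value); // mixing them means going through .Value explicitly
    }
}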

And for those who might not like an X/Y answer, I think the title might still be answerable with reference to the LSP. Primitive types are primitive because they do something very simple, and that's all they do. Allowing them to be inherited, and thus making their possible effects unbounded, would lead to great surprise at best and fatal LSP violations at worst. If I may optimistically assume Thales Pereira won't mind me quoting this phenomenal comment:

There is the added problem that if someone was able to inherit from Int, you would have innocent code like "int x = y + 2" (where y is of the derived class) that now writes a log to the database, opens a URL and somehow resurrects Elvis. Primitive types are supposed to be safe, with more or less guaranteed, well-defined behavior.

If someone sees a primitive type, in a sane language, they rightly presume it will always just do its one little thing, very well, without surprises. Primitive types have no class declarations available that signal whether they may or may not be inherited and have their methods overridden. If they were, it would be very surprising indeed (and totally break backwards compatibility, but I'm aware that's a backwards answer to 'why was X not designed with Y').

...although, as Mooing Duck pointed out in response, languages that allow operator overloading enable the user to confuse themselves to a similar or equal extent if they really want, so it's dubious whether this last argument holds. And I'll stop summarising other people's comments now, heh.

4

In mainstream strong static OOP languages, sub-typing is seen primarily as a way to extend a type and to override the type's current methods.

To do so, 'objects' contain a pointer to their type. This is an overhead: the code in a method that uses a Shape instance first has to access the type information of that instance before it knows the correct Area() method to call.

A primitive tends to allow only operations on it that translate into single machine-language instructions, and it does not carry any type information with it. Making integers slower so that someone could subclass them was unappealing enough to stop any language that did so from becoming mainstream.

So the answer to:

Why do mainstream strong static OOP languages prevent inheriting primitives?

Is:

  • There was little demand
  • And it would have made the language too slow
  • Subtyping was primarily seen as a way to extend a type, rather than a way to get better (user-defined) static type checking.

However, we are starting to get languages that allow static checking based on properties of variables other than 'type'; for example F# has units of measure, so that you can't, say, add a length to an area.

There are also languages that allow 'user-defined types' that don't change (or exchange) what a type does, but just help with static type checking; see coredump's answer.

Ian
4

In order to allow inheritance with virtual dispatch (which is often considered quite desirable in application design), one needs runtime type information. For every object, some data regarding the type of the object has to be stored. A primitive, by definition, lacks this information.

There are two mainstream (managed, run on a VM) OOP languages that feature primitives: C# and Java. Many other languages do not have primitives in the first place, or use similar reasoning for including them.

Primitives are a compromise for performance. For each object, you need space for its object header (in Java, typically 2*8 bytes on 64-bit VMs), plus its fields, plus any padding (in HotSpot, every object occupies a number of bytes that is a multiple of 8). So an int as an object would need at least 24 bytes of memory, instead of only 4 bytes (in Java).

Thus, primitive types were added to improve performance. They make a whole lot of things easier. What does a + b mean if both are subtypes of int? Some kind of dispatching has to be added to choose the correct addition, which means virtual dispatch. Being able to use a very simple opcode for the addition is much, much faster, and allows for compile-time optimizations.

String is another case. Both in Java and C#, String is an object. But in C# it's sealed, and in Java it's final. That's because the Java and C# standard libraries both require Strings to be immutable, and subclassing them would break this immutability.

In the case of Java, the VM can (and does) intern Strings and "pool" them, allowing for better performance. This only works when Strings are truly immutable.

Plus, one rarely needs to subclass primitive types. As long as primitives cannot be subclassed, there are a whole lot of neat things that maths tells us about them. For example, we can be sure that addition is commutative and associative; that's something the mathematical definition of integers tells us. Furthermore, we can often easily prove invariants over loops via induction. If we allow subclassing of int, we lose those tools that maths gives us, because we can no longer be guaranteed that certain properties hold. Thus, I'd say not being able to subclass primitive types is actually a good thing: fewer things someone can break, plus the compiler can often prove that it is allowed to perform certain optimizations.

Polygnome
3

I'm not sure if I'm overlooking something here, but the answer is rather simple:

  1. The definition of primitives is: primitive values are not objects, primitive types are not object types, primitives are not part of the object system.
  2. Inheritance is a feature of the object system.
  3. Ergo, primitives cannot take part in inheritance.

Note that there are really only two strong static OOP languages which even have primitives, AFAIK: Java and C++. (Actually, I'm not even sure about the latter, I don't know much about C++, and what I found when searching was confusing.)

In C++, primitives are basically a legacy inherited (pun intended) from C. So, they don't take part in the object system (and thus inheritance) because C has neither an object system nor inheritance.

In Java, primitives are the result of a misguided attempt at improving performance. Primitives are also the only value types in the system; it is, in fact, impossible to write your own value types in Java, and it is impossible for objects to be value types. So, apart from the fact that primitives don't take part in the object system, and thus the idea of "inheritance" doesn't even make sense for them, even if you could inherit from them, you wouldn't be able to maintain the "value-ness". This is different from e.g. C♯, which does have value types (structs) that are nonetheless objects.

Another thing is that not being able to be inherited from is actually not unique to primitives, either. In C♯, structs implicitly inherit from System.Object and can implement interfaces, but they can neither inherit from nor be inherited by classes or structs. Also, sealed classes cannot be inherited from. In Java, final classes cannot be inherited from.
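
A quick C# sketch of those rules, with illustrative type names:

interface IMeasure { int Value { get; } }

// A struct implicitly derives from System.ValueType/System.Object and may implement interfaces...
readonly struct Meters : IMeasure
{
    public int Value { get; }
    public Meters(int value) => Value = value;
}

// ...but it cannot be used as a base type:
// struct MoreMeters : Meters { }    // compile error: only interfaces may appear in a struct's base list
// class FancyMeters : Meters { }    // compile error: cannot derive from the (implicitly sealed) type 'Meters'

sealed class Widget { }
// class SpecialWidget : Widget { }  // compile error: cannot derive from sealed type 'Widget'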

tl;dr:

Why do mainstream strong static OOP languages prevent inheriting primitives?

  1. primitives are not part of the object system (by definition, if they were, they wouldn't be primitive), the idea of inheritance is tied to the object system, ergo primitive inheritance is a contradiction in terms
  2. primitives are not unique, lots of other types cannot be inherited as well (final or sealed in Java or C♯, structs in C♯, case classes in Scala)
Jörg W Mittag
2

Joshua Bloch in “Effective Java” recommends designing explicitly for inheritance or prohibiting it. Primitive classes are not designed for inheritance because they are designed to be immutable, and allowing inheritance could change that in subclasses, thus breaking the Liskov substitution principle and becoming a source of many bugs.

Anyway, why is this a hacky workaround? You should really prefer composition over inheritance. If the reason is performance, then you have a point, and the answer to your question is that it is not possible to put every feature into Java, because it takes time to analyze all the different aspects of adding a feature. For example, Java didn't have generics before 1.5.

If you have a lot of patience, you are in luck, because there is a plan to add value classes to Java, which will allow you to create your own value classes; this will help you increase performance and at the same time give you more flexibility.

2

At the abstract level, you can include anything you want in a language you're designing.

At the implementation level, it's inevitable that some of those things will be simpler to implement, some will be complicated, some can be made fast, some are bound to be slower, and so on. To account for this, designers often have to make hard decisions and compromises.

At the implementation level, one of the fastest ways we have come up with for accessing a variable is finding out its address and loading the contents of that address. Most CPUs have specific instructions for loading data from addresses, and those instructions usually need to know how many bytes they need to load (one, two, four, eight, etc.) and where to put the data they load (single register, register pair, extended register, other memory, etc.). By knowing the size of a variable, the compiler knows exactly which instruction to emit for usages of that variable. By not knowing the size of a variable, the compiler would need to resort to something more complicated and probably slower.

At the abstract level, the point of subtyping is to be able to use instances of one type where an equal or more general type is expected. In other words, code can be written that expects an object of a particular type or anything more derived, without knowing ahead of time what exactly this would be. And clearly, as more derived types can add more data members, a derived type does not necessarily have the same memory requirements as its base types.

At the implementation level, there's no simple way for a variable of a predetermined size to hold an instance of unknown size and be accessed in a way you'd normally call efficient. But there is a way to move things around a little and use a variable not to store the object, but to identify the object and let that object be stored somewhere else. That way is a reference (e.g. a memory address) -- an extra level of indirection that ensures that a variable only needs to hold some kind of fixed-size information, as long as we can find the object through that information. To achieve that, we just need to load the address (fixed-size) and then we can work as usual using those offsets of the object that we know are valid, even if that object has more data at offsets we don't know. We can do that because we don't concern ourselves with its storage requirements when accessing it anymore.

At the abstract level, this method allows you to store a (reference to a) string into an object variable without losing the information that makes it a string. It's fine for all types to work like this and you might also say it's elegant in many respects.

Still, at the implementation level, the extra level of indirection involves more instructions, and on most architectures it makes each access to the object somewhat slower. You can allow the compiler to squeeze more performance out of a program if your language includes some commonly used types that don't have that extra level of indirection (the reference). But by removing that level of indirection, the compiler can no longer allow you to subtype in a memory-safe way. That's because if you add more data members to your type and then assign to a variable of a more general type, any extra data members that don't fit in the space allocated for the target variable will be sliced away.
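
In C# terms the trade-off looks roughly like this (a sketch with illustrative types): a class variable holds a reference and can therefore refer to any derived instance, while a struct variable holds its value inline at a fixed size, which is one reason C# forbids deriving from structs at all.

using System;

class Shape
{
    public virtual double Area() => 0;
}

class Circle : Shape
{
    public double Radius;
    public override double Area() => Math.PI * Radius * Radius;
}

// A struct value is stored inline at a fixed size; a hypothetical derived struct
// with extra fields could not fit in that space, which is one reason deriving
// from structs is forbidden altogether.
struct Point
{
    public int X, Y;
}

static class IndirectionDemo
{
    static void Main()
    {
        Shape s = new Circle { Radius = 2 };   // the variable holds a reference; the Circle's data lives on the heap
        Point p = new Point { X = 1, Y = 2 };  // the variable holds the Point's 8 bytes directly
        Console.WriteLine(s.Area());           // 12.566...
        Console.WriteLine(p.X + p.Y);          // 3
    }
}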

1

In general

If a class is abstract (metaphor: a box with holes), it's OK (even required, to have something usable!) to "fill the holes"; that's why we subclass abstract classes.

If a class is concrete (metaphor: a full box), it's not OK to alter what's already there, because if it's full, it's full. There is no room to add anything more inside the box; that's why we shouldn't subclass concrete classes.

With primitives

Primitives are concrete classes by design. They represent something that is well-known and fully defined (I've never seen a primitive type with anything abstract about it, otherwise it wouldn't be a primitive anymore) and widely used throughout the system. Allowing a primitive type to be subclassed, with your own implementation handed to code that relies on the designed behaviour of primitives, could cause a lot of side effects and huge damage!

Spotted
1

Usually inheritance is not the semantics you want, because your special type can't meaningfully be substituted everywhere a primitive is expected. To borrow from your example, Quantity + Index makes no sense semantically, so an inheritance relationship is the wrong relationship.

However, several languages have the concept of a value type that does express the kind of relationship you are describing. Scala is one example. A value type uses a primitive as the underlying representation, but has a different class identity and different operations on the outside. That has the effect of extending a primitive type, but it's more of a composition relationship than an inheritance one.

Karl Bielefeldt