41

I just noticed that every modern OO programming language that I am at least somewhat familiar with (which is basically just Java, C# and D) allows covariant arrays. That is, a string array is an object array:

Object[] arr = new String[2];   // Java, C# and D allow this

Covariant arrays are a hole in the static type system. They make type errors possible that cannot be detected at compile-time, so every write to an array must be checked at runtime:

arr[0] = "hello";        // ok
arr[1] = new Object();   // ArrayStoreException

This seems like a terrible performance hit if I do lots of array stores.

C++ does not have covariant arrays, so there is no need to do such a runtime check, which means there is no performance penalty.

Is there any analysis done to reduce the number of runtime checks necessary? For example, if I say:

arr[1] = arr[0];

one could argue that the store cannot possibly fail. I'm sure there are lots of other possible optimizations I haven't thought of.

Do modern compilers actually do these kinds of optimizations, or do I have to live with the fact that, for example, a Quicksort always does O(n log n) unnecessary runtime checks?

Can modern OO languages avoid the overhead created by supporting co-variant arrays?

fredoverflow
  • 6,954

8 Answers8

35

D doesn't have covariant arrays. It allowed them prior to the most recent release (dmd 2.057), but that bug has been fixed.

An array in D is effectively just a struct with a pointer and a length:

struct A(T)
{
    T* ptr;
    size_t length;
}

Bounds checking is done normally when indexing an array, but it's removed when you compile with -release. So, in release mode, there's no real performance difference between arrays in C/C++ and those in D.

Jonathan M Davis
  • 1,435
  • 10
  • 9
22

Yes, one crucial optimization is this:

sealed class Foo
{
}

In C#, this class can't be a supertype for any type, so you can avoid the check for an array of type Foo.

And to the second question, in F# co-variant arrays are not allowed (but I guess the check will remain in the CLR unless it's found unnecessary in optimizations at runtime)

let a = [| "st" |]
let b : System.Object[] = a // Fails

https://stackoverflow.com/questions/7339013/array-covariance-in-f

A somewhat related problem is array bound-checking. This might be an interesting (but old) read about optimizations done in the CLR (covariance is also mentioned 1 place): Link

Glorfindel
  • 3,167
14

Java answer:

I take it you haven't actually benchmarked the code, have you? In general 90% of all dynamic casts in Java are free because the JIT can elide them (quicksort should be a good example for this) and the rest are one ld/cmp/brsequence which is absolutely predictable (if not, well why the hell is your code throwing all those dynamic cast exceptions?).

We do the load much earlier than the actual compare, the branch is correctly predicted in 99.9999% (made up statistic!) of all cases so we don't stall the pipeline (assuming we don't hit the memory with the load, if not well that will be noticeable, but then the load is necessary anyhow). Hence the cost is 1 clock cycle IF the JIT cannot avoid the check at all.

Some overhead? Sure, but I doubt you'll ever notice it..


To help support my answer, please see this Dr. Cliff Click blogpost discussing Java vs. C performance.

Voo
  • 763
10

D does not allow covariant arrays.

void main()
{
    class Foo {}
    Object[] a = new Foo[10];
}  

/* Error: cannot implicitly convert expression (new Foo[](10LU)) of type Foo[]
to Object[] */

As you say, it would be a hole in the type system to allow this.

You can be forgiven for the mistake, as this bug was only just fixed in the lastest DMD, released on December 13th.

Array access in D is just as fast as in C or C++.

5

From test I have done on a cheap laptop, the difference between using int[] and Integer[] is about 1.0 ns. The difference is likely to be due to the extra check for the type.

Generally Objects are only used for higher level logic when not every ns counts. If you need to save every ns, I would avoid using higher level constructs like Objects. Assignments alone are likely to be very small factor in any real program. e.g. creating a new Object on the same machine is 5 ns.

Calls to compareTo are likely to be much more expensive, esp if you using a complex object like String.

2

You asked about other modern OO languages? Well, Delphi avoids this problem entirely by having string be a primitive, not an object. So an array of strings is exactly an array of strings and nothing else, and any operations on them are as fast as native code can be, with no type checking overhead.

However, string arrays are not used very often; Delphi programmers tend to favor the TStringList class. For a minimal amount of overhead it provides a set of string-group methods that are useful in so many situations that the class has been compared to a Swiss Army Knife. So it's idiomatic to use a string list object instead of a string array.

As for other objects in general, the problem does not exist because in Delphi, like in C++, arrays are not covariant, in order to prevent the kind of type system holes described here.

Mason Wheeler
  • 83,213
1

or do I have to live with the fact that, for example, a Quicksort always does O(n log n) unnecessary runtime checks?

CPU performance is not monotonic, which means that longer programs can be faster than shorter ones (this is CPU dependant, and it is true for the common x86 and amd64 architectures). So it is possible that a program doing bound checking on arrays is actually faster than the program deduced from the previous by removing these bound checking.

The reason of this behaviour is that bound checking modifies the alignment of code in memory, will modify the frequency of cache hits, etc.

So yes, live with the fact that Quicksort always does O(n log n) spurious checks and optimise after profiling.

user40989
  • 2,940
1

Scala is an OO language that has invariant, rather than covariant, arrays. It targets the JVM, so there's no performance win there, but it avoids a misfeature common to both Java and C# that compromises their type-safety for reasons of backwards compatibility.

Pillsy
  • 111