Do you get the benefits of a B-Tree in a managed language?

Question

My understanding is that one of the key features of a B-Tree (and a B+Tree) is that it is designed so that the size of its nodes is some multiple of the block size of whatever medium the data is stored on.
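To make the block-size relationship concrete, here is a small worked calculation of how many keys fit in one 4096-byte block. It assumes a hypothetical node layout chosen only for illustration: a 4-byte key count, then `n` 8-byte keys, then `n + 1` 8-byte child offsets. The class and constant names are made up and do not come from any particular B-Tree implementation.

```java
/**
 * Hypothetical on-disk node layout, assumed for illustration only:
 *   4-byte key-count header, n 8-byte keys, then n + 1 8-byte child offsets.
 */
public class NodeSizing {
    static final int HEADER_BYTES = 4;  // int: number of keys in use
    static final int KEY_BYTES = 8;     // long key
    static final int CHILD_BYTES = 8;   // long file offset of a child node

    /** Largest n such that HEADER + n * KEY + (n + 1) * CHILD fits in one block. */
    static int maxKeysPerNode(int blockSize) {
        return (blockSize - HEADER_BYTES - CHILD_BYTES) / (KEY_BYTES + CHILD_BYTES);
    }

    public static void main(String[] args) {
        int block = 4096;
        int n = maxKeysPerNode(block);
        int used = HEADER_BYTES + n * KEY_BYTES + (n + 1) * CHILD_BYTES;
        // prints "keys per node: 255, bytes used: 4092 of 4096"
        System.out.println("keys per node: " + n + ", bytes used: " + used + " of " + block);
    }
}
```

With a fan-out of around 256 children per node, three levels already index roughly 16 million keys, so a lookup needs only a handful of block reads. That is the payoff of sizing nodes to the block.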

Considering that, in a memory-managed language like Java/C#, we don't really have control over how, when, and in what order data is accessed from the drive, can we still predictably benefit from the major advantage of this data structure?

mikera · Accepted Answer · 2012-01-09T23:28:46.353

Yes, B-Trees still make good sense in managed languages.

A few points of explanation:

If you're using the B-Tree as an on-disk data structure, then I can absolutely guarantee that disk IO will be your bottleneck, not the fact that you are using a managed language.
If you are using a B-Tree in memory, then you can still have considerable control over memory layout from a caching perspective. For example, you can use large arrays for data storage in Java/C# and store tree nodes/data in the arrays using offsets rather than having a separate object to represent each tree node.
The advantages of a data structure are largely independent of language, at least up to a constant factor. So if a B-Tree makes sense for your algorithm or access pattern, it will probably do so regardless of which language you are using.
On top of all that, it is generally the case that Java/C# can be nearly as fast as C/C++ if well optimised.
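The array-with-offsets approach from the second point can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions. The class name `FlatBTree`, the per-node slot layout, and the tiny `MAX_KEYS` are all hypothetical, and only node allocation and search are shown; insertion with node splits is omitted. Each node's keys sit in consecutive `int` slots of one shared array, and children are referenced by integer node ids rather than by object references.

```java
import java.util.Arrays;

/**
 * Sketch: all B-Tree nodes live in one flat int[] instead of one object per node.
 * Hypothetical per-node layout, starting at slot id * STRIDE:
 *   [0] key count, [1] leaf flag (1 = leaf),
 *   [KEYS_OFF ..) up to MAX_KEYS sorted keys, [CHILD_OFF ..) up to MAX_KEYS + 1 child ids.
 * Only allocation and search are shown; insertion with splits is omitted.
 */
public class FlatBTree {
    static final int MAX_KEYS = 4;
    static final int KEYS_OFF = 2;
    static final int CHILD_OFF = KEYS_OFF + MAX_KEYS;
    static final int STRIDE = CHILD_OFF + MAX_KEYS + 1;

    private int[] pool = new int[STRIDE * 16];
    private int nodeCount = 0;

    /** Allocates a node holding the given sorted keys and returns its integer id. */
    int newNode(boolean leaf, int[] keys, int[] children) {
        if (keys.length > MAX_KEYS) throw new IllegalArgumentException("too many keys");
        if ((nodeCount + 1) * STRIDE > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2); // grow the backing array
        }
        int id = nodeCount++;
        int base = id * STRIDE;
        pool[base] = keys.length;
        pool[base + 1] = leaf ? 1 : 0;
        System.arraycopy(keys, 0, pool, base + KEYS_OFF, keys.length);
        if (!leaf) {
            System.arraycopy(children, 0, pool, base + CHILD_OFF, children.length);
        }
        return id;
    }

    /** Returns true if key occurs in the subtree rooted at node id. */
    boolean contains(int id, int key) {
        while (true) {
            int base = id * STRIDE;
            int n = pool[base];
            int i = 0;
            // Scan the node's keys, which are contiguous in the array.
            while (i < n && key > pool[base + KEYS_OFF + i]) i++;
            if (i < n && pool[base + KEYS_OFF + i] == key) return true;
            if (pool[base + 1] == 1) return false; // leaf: nowhere else to look
            id = pool[base + CHILD_OFF + i];        // descend via an integer offset
        }
    }

    public static void main(String[] args) {
        FlatBTree t = new FlatBTree();
        int left = t.newNode(true, new int[]{1, 3, 5}, null);
        int mid = t.newNode(true, new int[]{12, 15}, null);
        int right = t.newNode(true, new int[]{25, 30, 40}, null);
        int root = t.newNode(false, new int[]{10, 20}, new int[]{left, mid, right});
        System.out.println(t.contains(root, 15)); // true
        System.out.println(t.contains(root, 7));  // false
    }
}
```

Because a node's keys are adjacent in one primitive array, a search touches a few neighbouring cache lines per level instead of chasing references to separately allocated objects. The trade-off is that you now manage slot allocation and reuse yourself.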

Mike Nakis · Answer 2 · 2012-01-08T15:58:40.993

The use of a managed language like Java, C#, etc. has absolutely nothing to do with the way data is accessed from the drive, and in any case it certainly does not deprive developers of one iota of control over precisely how, when, and in what order data will be accessed from the drive.
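As an illustration of that control, the following minimal sketch stores each node in its own fixed-size block and reads exactly one block at offset `nodeId * BLOCK_SIZE` using a positional `FileChannel.read`. The class and method names and the 4096-byte node size are assumptions for the example. As in C or C++, the operating system's page cache still sits underneath these calls.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BlockIO {
    static final int BLOCK_SIZE = 4096; // assumed node size: one block per node

    /** Reads exactly one node; node nodeId lives at byte offset nodeId * BLOCK_SIZE. */
    static ByteBuffer readNode(FileChannel ch, long nodeId) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
        long pos = nodeId * BLOCK_SIZE;
        long done = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + done); // positional read; channel position untouched
            if (n < 0) throw new IOException("unexpected EOF in node " + nodeId);
            done += n;
        }
        buf.flip(); // prepare the buffer for reading the node's fields
        return buf;
    }

    /** Writes all remaining bytes of 'node' at that node's block offset. */
    static void writeNode(FileChannel ch, long nodeId, ByteBuffer node) throws IOException {
        long pos = nodeId * BLOCK_SIZE;
        long done = 0;
        while (node.hasRemaining()) {
            done += ch.write(node, pos + done);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("btree", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (int id = 0; id < 3; id++) {
                ByteBuffer node = ByteBuffer.allocate(BLOCK_SIZE);
                node.putInt(0, id * 100); // absolute put: position stays at 0
                writeNode(ch, id, node);
            }
            System.out.println(readNode(ch, 2).getInt(0)); // 200
        } finally {
            Files.delete(file);
        }
    }
}
```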

The problem is elsewhere. Managed languages suffer from the overhead of managed-to-native and native-to-managed transitions, where data often needs to be copied from native buffers into managed buffers. They also do not offer quick and easy support for inherently unsafe operations such as taking four bytes from within a byte array and interpreting them as an integer. So when you want to do something like that, you have to invoke a function that performs the conversion for you, whereas in C++ you would just use a single machine instruction that dereferences a pointer.
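For reference, these are common ways to read a 4-byte integer out of a `byte[]` in Java: manual shifts, `ByteBuffer.getInt`, and (Java 9+) a byte-array view `VarHandle` obtained from `MethodHandles.byteArrayViewVarHandle`. The class name and the big-endian byte order are assumptions for the example. Whether any of these calls ends up as the single load that a C++ pointer dereference would produce depends on the runtime's JIT compiler.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReadIntFromBytes {
    // Manual decode: four byte loads combined with shifts and ORs (big-endian order).
    static int readIntManual(byte[] a, int off) {
        return ((a[off] & 0xFF) << 24)
             | ((a[off + 1] & 0xFF) << 16)
             | ((a[off + 2] & 0xFF) << 8)
             |  (a[off + 3] & 0xFF);
    }

    // Library call: wrap the array (no copy) and read an int at an absolute byte index.
    static int readIntBuffer(byte[] a, int off) {
        return ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN).getInt(off);
    }

    // Java 9+: a VarHandle that views a byte[] as big-endian ints at any byte offset.
    private static final VarHandle INT_VIEW =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    static int readIntVarHandle(byte[] a, int off) {
        return (int) INT_VIEW.get(a, off);
    }

    public static void main(String[] args) {
        byte[] page = {0, 0, 0, 0, 0x12, 0x34, 0x56, 0x78};
        System.out.println(Integer.toHexString(readIntManual(page, 4)));    // 12345678
        System.out.println(Integer.toHexString(readIntBuffer(page, 4)));    // 12345678
        System.out.println(Integer.toHexString(readIntVarHandle(page, 4))); // 12345678
    }
}
```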

Therefore, an implementation of a B-Tree in a managed language will suffer, but it will not be due to lack of control over precisely how, when, and in what order data is accessed from the drive.
