23

I'm learning Haskell and as an exercise I'm making binary trees. Having made a regular binary tree, I want to adapt it to be self balancing. So:

  • Which is most efficient?
  • Which is easiest to implement?
  • Which is most often used?

But crucially, which do you recommend?

I assume this belongs here because it's open to debate.

6 Answers

17

I would recommend you start with either a Red-Black tree, or an AVL tree.

The red-black tree is faster for inserting, but the AVL tree has a slight edge for lookups. The AVL tree is probably a little easier to implement, but not by all that much based on my own experience.

The AVL tree ensures that the tree is balanced after each insert or delete (no sub-tree has a balance factor outside the range -1 to 1), while the red-black tree only ensures that the tree is reasonably balanced at any time.
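To make the balance-factor rule concrete, here is a minimal AVL insertion sketch in Haskell. It is only one way to do it: the names (`AVL`, `node`, `rebalance`, `isBalanced`) are made up for this example, and it assumes each node caches its subtree height and that duplicate keys are ignored.

```haskell
-- Minimal AVL sketch: every node caches the height of its subtree.
data AVL a = Leaf | Node Int (AVL a) a (AVL a)

height :: AVL a -> Int
height Leaf = 0
height (Node h _ _ _) = h

-- Smart constructor that recomputes the cached height.
node :: AVL a -> a -> AVL a -> AVL a
node l x r = Node (1 + max (height l) (height r)) l x r

-- Balance factor: left height minus right height.
balanceFactor :: AVL a -> Int
balanceFactor Leaf = 0
balanceFactor (Node _ l _ r) = height l - height r

rotateRight, rotateLeft :: AVL a -> AVL a
rotateRight (Node _ (Node _ ll lx lr) x r) = node ll lx (node lr x r)
rotateRight t = t
rotateLeft (Node _ l x (Node _ rl rx rr)) = node (node l x rl) rx rr
rotateLeft t = t

-- Restore the invariant at a node whose children are already valid AVL
-- trees and whose heights differ by at most 2.
rebalance :: AVL a -> AVL a
rebalance Leaf = Leaf
rebalance t@(Node _ l x r)
  | bf > 1 && balanceFactor l < 0  = rotateRight (node (rotateLeft l) x r) -- left-right
  | bf > 1                         = rotateRight t                         -- left-left
  | bf < -1 && balanceFactor r > 0 = rotateLeft (node l x (rotateRight r)) -- right-left
  | bf < -1                        = rotateLeft t                          -- right-right
  | otherwise                      = t
  where
    bf = balanceFactor t

-- Insert a key, rebalancing on the way back up; duplicates are ignored.
insert :: Ord a => a -> AVL a -> AVL a
insert x Leaf = node Leaf x Leaf
insert x t@(Node _ l y r) = case compare x y of
  LT -> rebalance (node (insert x l) y r)
  GT -> rebalance (node l y (insert x r))
  EQ -> t

-- Check the AVL invariant: every balance factor is -1, 0 or 1.
isBalanced :: AVL a -> Bool
isBalanced Leaf = True
isBalanced t@(Node _ l _ r) =
  abs (balanceFactor t) <= 1 && isBalanced l && isBalanced r

-- In-order traversal, handy for checking that key order is preserved.
toList :: AVL a -> [a]
toList Leaf = []
toList (Node _ l x r) = toList l ++ [x] ++ toList r
```

Building a tree with something like `foldr insert Leaf [1 .. 1000 :: Int]` and then calling `isBalanced` is a quick sanity check. Deletion is left out; it reuses `rebalance` but has more cases.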

12

I would consider an alternative if you are fine with randomized data structures: Skip Lists.

From a high-level point of view, it's a tree structure, except that it's not implemented as a tree but as a list with multiple layers of links.

You'll get expected O(log N) insertions / searches / deletes, and you won't have to deal with all those tricky rebalancing cases.

I've never considered implementing them in a functional language though, and the Wikipedia page does not show any such implementation, so it may not be easy (with respect to immutability).

user541686
Matthieu M.
8

If you want a relatively easy structure to start with (both AVL trees and red-black trees are fiddly), one option is a treap - named as a combination of "tree" and "heap".

Each node gets a "priority" value, often randomly assigned as the node is created. Nodes are positioned in the tree so that key ordering is respected, and so that heap-like ordering of priority values is respected. Heap-like ordering means that both children of a parent have lower priorities than the parent.

EDIT deleted "within key values" above - the priority and key ordering apply together, so priority is significant even for unique keys.

It's an interesting combination. If keys are unique and priorities are unique, there is a unique tree structure for any set of nodes. Even so, inserts and deletes are efficient. Strictly speaking, the tree can be unbalanced to the point where it is effectively a linked list, but with random priorities this is extremely unlikely. That holds even for common cases such as keys inserted in order, which would degenerate a standard (unbalanced) binary tree.
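A rough Haskell sketch of treap insertion, to show how the two orderings interact. All names here (`Treap`, `insertWith`, `fixLeft`, `fixRight`, `nextPriority`) are invented for the example. The priority source is a simple linear congruential generator, used only as a stand-in so the code needs nothing outside base; a real implementation would more likely draw priorities from a proper random generator.

```haskell
-- Treap sketch: a binary search tree on keys, a max-heap on priorities.
data Treap k = Empty | Fork (Treap k) k Int (Treap k)

-- Insert a key with a caller-supplied priority; duplicate keys are ignored.
insertWith :: Ord k => k -> Int -> Treap k -> Treap k
insertWith k p Empty = Fork Empty k p Empty
insertWith k p t@(Fork l k' p' r) = case compare k k' of
  LT -> fixLeft (insertWith k p l) k' p' r
  GT -> fixRight l k' p' (insertWith k p r)
  EQ -> t

-- If the left child outranks its parent, pull it up with a right rotation.
fixLeft :: Treap k -> k -> Int -> Treap k -> Treap k
fixLeft (Fork ll lk lp lr) k p r
  | lp > p = Fork ll lk lp (Fork lr k p r)
fixLeft l k p r = Fork l k p r

-- If the right child outranks its parent, pull it up with a left rotation.
fixRight :: Treap k -> k -> Int -> Treap k -> Treap k
fixRight l k p (Fork rl rk rp rr)
  | rp > p = Fork (Fork l k p rl) rk rp rr
fixRight l k p r = Fork l k p r

-- Stand-in priority source (an assumption of this sketch, not part of the
-- treap idea itself): a linear congruential generator modulo 2^31.
nextPriority :: Int -> Int
nextPriority s = (s * 1103515245 + 12345) `mod` 2147483648

-- Build a treap, giving each key the next pseudo-random priority.
fromList :: Ord k => [k] -> Treap k
fromList = go 42 Empty
  where
    go _ t [] = t
    go s t (k : ks) = let s' = nextPriority s in go s' (insertWith k s' t) ks

-- In-order traversal of the keys.
keys :: Treap k -> [k]
keys Empty = []
keys (Fork l k _ r) = keys l ++ [k] ++ keys r

-- Check that no child has a higher priority than its parent.
heapOrdered :: Treap k -> Bool
heapOrdered Empty = True
heapOrdered (Fork l _ p r) = below l && below r && heapOrdered l && heapOrdered r
  where
    below Empty = True
    below (Fork _ _ q _) = q <= p
```

Note that there is no balance bookkeeping at all: a single rotation per level on the way back up is enough, because the new node only ever needs to rise past ancestors with lower priority.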

6

Which is most efficient?

Vague and difficult to answer. The computational complexities are all well-defined. If that's what you mean by efficiency, there's no real debate. Indeed, all good algorithms come with proofs and complexity factors.

If you mean "run time" or "memory use" then you'll need to compare actual implementations. Then language, run-time, OS and other factors come into play, making the question difficult to answer.

Which is easiest to implement?

Vague and difficult to answer. Some algorithms may appear complex to you, but trivial to me.

Which is most often used?

Vague and difficult to answer. First, there's the "by whom?" part of this. Haskell only? What about C or C++? Second, there's the proprietary software problem, where we don't have access to the source to do a survey.

But crucially, which do you recommend?

I assume this belongs here because it's open to debate.

Correct. Since your other criteria aren't very helpful, this is all you're going to get.

You can get source for a large number of tree algorithms. If you want to learn something, you might simply implement every one you can find. Rather than ask for a "recommendation", just collect every algorithm you can find.

Here's the list:

http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree

There are six popular ones defined. Start with those.

S.Lott
4

A very simple balanced tree is an AA tree. Its invariant is simpler and thus easier to implement. Because of its simplicity, its performance is still good.
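For a feel of how little code that takes, here is a sketch of AA tree insertion in Haskell, following the usual `skew`/`split` formulation. The type and function names are my own choices for the example, and it assumes each node stores its level explicitly and that duplicate keys are ignored.

```haskell
-- AA tree sketch: each node stores its level; an empty tree has level 0.
data AA a = Nil | Node Int (AA a) a (AA a)

level :: AA a -> Int
level Nil = 0
level (Node lv _ _ _) = lv

-- skew: a left child on the same level as its parent is not allowed,
-- so rotate right to turn it into a right horizontal link.
skew :: AA a -> AA a
skew (Node lv (Node llv a x b) y r)
  | llv == lv = Node llv a x (Node lv b y r)
skew t = t

-- split: two consecutive right horizontal links are not allowed,
-- so rotate left and raise the level of the middle node.
split :: AA a -> AA a
split (Node lv a x (Node rlv b y r@(Node rrlv _ _ _)))
  | lv == rrlv = Node (rlv + 1) (Node lv a x b) y r
split t = t

-- Insert as in a plain binary search tree, then skew and split on the
-- way back up; duplicates are ignored.
insert :: Ord a => a -> AA a -> AA a
insert x Nil = Node 1 Nil x Nil
insert x t@(Node lv l y r) = case compare x y of
  LT -> split (skew (Node lv (insert x l) y r))
  GT -> split (skew (Node lv l y (insert x r)))
  EQ -> t

-- Check the level rules: the left child is exactly one level down, the
-- right child is on the same level or one down, and a right grandchild
-- is strictly below its grandparent.
valid :: AA a -> Bool
valid Nil = True
valid (Node lv l _ r) =
  level l == lv - 1
    && (level r == lv || level r == lv - 1)
    && rightLevel r < lv
    && valid l
    && valid r
  where
    rightLevel Nil = 0
    rightLevel (Node _ _ _ rr) = level rr

-- In-order traversal.
toList :: AA a -> [a]
toList Nil = []
toList (Node _ l x r) = toList l ++ [x] ++ toList r
```

The whole rebalancing logic is the two small functions `skew` and `split`, compared with the several rotation cases an AVL or red-black insert needs.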

As an advanced exercise, you can try to use GADTs to implement one of the variants of balanced trees whose invariant is enforced by the type system.

Petr
4

If you're interested in Splay trees, there is a simpler version of those which I believe was first described in a paper by Allen and Munro. It doesn't have the same performance guarantees, but avoids complications in dealing with "zig-zig" vs. "zig-zag" rebalancing.

Basically, when searching (including searches for an insert point or a node to delete), the node you find gets rotated directly towards the root, bottom up (e.g. as a recursive search function exits). At each step you apply a single rotation: a left rotation if the node you are pulling up is a right child, or a right rotation if it is a left child.
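Here is a small Haskell sketch of that bottom-up rotation, written as a pure function that returns the rearranged tree. The names (`Tree`, `moveToRoot`, `insertPlain`, `root`) are invented for the example, and it assumes the caller keeps using the returned tree instead of the old one.

```haskell
-- A plain binary search tree with no balance information.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- Search for x. If it is found, rotate it up one level at each step as
-- the recursion unwinds, so it ends up at the root. If it is not found,
-- the tree comes back with the same shape.
moveToRoot :: Ord a => a -> Tree a -> Tree a
moveToRoot _ Leaf = Leaf
moveToRoot x t@(Node l y r) = case compare x y of
  EQ -> t
  LT -> case moveToRoot x l of
    Node a z b | z == x -> Node a z (Node b y r) -- left child pulled up: right rotation
    l' -> Node l' y r
  GT -> case moveToRoot x r of
    Node a z b | z == x -> Node (Node l y a) z b -- right child pulled up: left rotation
    r' -> Node l y r'

-- Ordinary unbalanced insertion, only used to build example trees.
insertPlain :: Ord a => a -> Tree a -> Tree a
insertPlain x Leaf = Node Leaf x Leaf
insertPlain x t@(Node l y r) = case compare x y of
  LT -> Node (insertPlain x l) y r
  GT -> Node l y (insertPlain x r)
  EQ -> t

-- The key stored at the root, if any.
root :: Tree a -> Maybe a
root Leaf = Nothing
root (Node _ x _) = Just x

-- In-order traversal; rotations must never change this.
toList :: Tree a -> [a]
toList Leaf = []
toList (Node l x r) = toList l ++ [x] ++ toList r
```

For example, with `t = foldl (flip insertPlain) Leaf [4, 2, 6, 1, 3, 5, 7]`, `root (moveToRoot 1 t)` is `Just 1` while `toList` still gives the keys in order.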

Like Splay trees, the idea is that recently accessed items are always near the root of the tree, so quick to access again. Being simpler, these Allen-Munro rotate-to-root trees (what I call them - don't know the official name) can be faster, but they don't have the same amortized performance guarantee.

One thing - since this data structure by definition mutates even for find operations, it would probably need to be implemented monadically. IOW it's maybe not a good fit for functional programming.