
This is kind of a multi-part question.

Is it possible to do a binary search tree if the data does not possess a natural ordering? Would you be forced to impose an artificial ordering on such data? Like images? Or executable files? Or video? Or sound files? These are items that do not possess an obvious alphabetic or numerical order (my idea of a 'natural' total ordering).

Or would it just be better to use a hashmap at that point?

adv

3 Answers


Is it possible to do a binary search tree if the data does not possess natural ordering?

I don't know what the word "natural" means in your context; it seems vague.

Moreover, images, videos, executables and sound files all seem perfectly obviously orderable to me. Order them by comparing their bytes one at a time by ordinal value; if one file runs out of bytes before any difference is found, the shorter file is smaller. Why do you think this is not a natural way to put something in order? That's how you put strings in order, so why not sound files? A sound file is just a string of bytes, so order it as a string of bytes.
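To make the byte-wise rule concrete, here is a minimal Python sketch. It assumes each file has already been read into a `bytes` value (for example with `pathlib.Path.read_bytes()`); `compare_bytes` is an illustrative name, and the sample byte strings are made up, loosely modeled on WAV ("RIFF") and ELF executable headers.

```python
def compare_bytes(a: bytes, b: bytes) -> int:
    """Three-way byte-wise comparison: -1 if a < b, 0 if equal, 1 if a > b.

    Byte ordinals are compared position by position; if one sequence is a
    prefix of the other, the shorter one sorts first.
    """
    for x, y in zip(a, b):  # iterating over bytes yields ints 0..255
        if x != y:
            return -1 if x < y else 1
    # Every shared position tied: fall back to length.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    return 0


# Python's built-in bytes comparison implements the same rule,
# so the hand-written version can be checked against it.
samples = [b"RIFF\x00", b"RIFF", b"\x7fELF", b"", b"RIFF\x01"]
for a in samples:
    for b in samples:
        builtin = (a > b) - (a < b)
        assert compare_bytes(a, b) == builtin
```

In practice you would simply rely on the built-in `<` for `bytes`; the explicit loop only spells out the rule described above.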

Let me answer a question I know how to answer instead:

Is it possible to do a binary search tree if the data does not possess a total ordering?

No. Binary search trees require a total ordering.

What is a total order?

A total order simply means that you must provide a comparison operator that can determine, for every pair of elements, whether they are equal or which one is greater. Moreover, all the rules that you think of as obviously true for ordering must be met, such as:

  • A always equals A (reflexivity)
  • If A = B then B = A (symmetry of equality)
  • If A = B and B = C then A = C (transitivity)
  • If A < B and B < C then A < C (transitivity)
  • If A < B then B > A (antisymmetry of inequality)
  • ...

and so on. If you cannot meet these rules then you cannot do a binary search and consistently get correct results. If you can meet these rules, then you can.
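A minimal sketch of an unbalanced binary search tree in Python, driven by a three-way compare function (`BST` and `three_way` are illustrative names, not a library API). With integer keys the rules listed above hold and lookups are correct. Float NaN breaks them: every comparison involving NaN is false, so `three_way` reports NaN as "equal" to everything, which violates transitivity of equality, and the tree silently gives wrong answers.

```python
import math
from typing import Callable, Optional


class Node:
    def __init__(self, key):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class BST:
    """Unbalanced binary search tree ordered by a three-way compare function."""

    def __init__(self, compare: Callable[[object, object], int]):
        self.compare = compare
        self.root: Optional[Node] = None
        self.size = 0

    def insert(self, key) -> None:
        if self.root is None:
            self.root = Node(key)
            self.size = 1
            return
        node = self.root
        while True:
            c = self.compare(key, node.key)
            if c == 0:
                return  # treated as a duplicate: not stored again
            child = node.left if c < 0 else node.right
            if child is None:
                if c < 0:
                    node.left = Node(key)
                else:
                    node.right = Node(key)
                self.size += 1
                return
            node = child

    def contains(self, key) -> bool:
        node = self.root
        while node is not None:
            c = self.compare(key, node.key)
            if c == 0:
                return True
            node = node.left if c < 0 else node.right
        return False


def three_way(a, b) -> int:
    # -1, 0 or 1 built from < and >; only a valid total order if the
    # values themselves are totally ordered.
    return (a > b) - (a < b)


# Integers are totally ordered, so the tree behaves correctly.
good = BST(three_way)
for k in [5, 2, 8, 1, 9]:
    good.insert(k)
print(good.contains(8), good.contains(7))  # True False

# With NaN: three_way(1.0, nan) == 0 and three_way(nan, 2.0) == 0,
# yet three_way(1.0, 2.0) != 0, so "equality" is not transitive.
bad = BST(three_way)
for k in [math.nan, 1.0, 2.0]:
    bad.insert(k)
print(bad.size)            # 1: 1.0 and 2.0 were discarded as "duplicates"
print(bad.contains(42.0))  # True, although 42.0 was never inserted
```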

Or would it just be better to use a hashmap at that point?

Better by what criterion? You haven't said what operations you intend to perform on this data structure other than searching. Hash maps are very good at some tasks that binary trees are bad at, and vice versa.
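As a rough illustration of "better by what criterion", the sketch below uses made-up file names and sizes. A plain `dict` (Python's hash map) answers exact-key lookups using only hashing and equality. A sorted list searched with the standard `bisect` module stands in here for an ordered structure such as a balanced BST: both rely on an ordering and support range and minimum queries that a hash map could only answer by scanning every entry.

```python
import bisect

# Made-up example data: file name -> size in bytes.
sizes = {"a.wav": 5_120, "b.exe": 40_960, "c.png": 2_048, "d.mp4": 1_048_576}

# Hash map: exact-key lookup needs only hashing and equality on the key.
print(sizes["b.exe"])  # 40960

# Ordered view: (size, name) pairs kept sorted by size.
by_size = sorted((size, name) for name, size in sizes.items())
size_keys = [size for size, _ in by_size]


def names_with_size_between(lo: int, hi: int) -> list[str]:
    """Names of files whose size lies in [lo, hi], found with two binary searches."""
    start = bisect.bisect_left(size_keys, lo)
    end = bisect.bisect_right(size_keys, hi)
    return [name for _, name in by_size[start:end]]


print(names_with_size_between(2_000, 50_000))  # ['c.png', 'a.wav', 'b.exe']
print(by_size[0][1])  # smallest file: c.png
```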

Eric Lippert

An order is a particular kind of relation between elements of a set. The relation is not always "is greater than". For example, the relation "is a descendant of" is, mathematically speaking, an order (a strict partial order). It is not the order you need for a binary search, because it is not defined for every pair of elements: what if I pick two siblings? This relation holds only for some pairs of elements of a given set.
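A small Python sketch of such a partial order, using "is a descendant of" (the transitive closure of the parent relation) over a made-up family tree. `PARENT`, `is_descendant` and `compare_family` are hypothetical names. The comparison returns `None` when two people are incomparable, which is exactly the case a binary search cannot handle.

```python
from typing import Optional

# Made-up family tree: child -> parent.
PARENT = {"bob": "alice", "carol": "alice", "dave": "bob"}


def is_descendant(a: str, b: str) -> bool:
    """True if a is a strict descendant of b, following parent links upward."""
    node = PARENT.get(a)
    while node is not None:
        if node == b:
            return True
        node = PARENT.get(node)
    return False


def compare_family(a: str, b: str) -> Optional[int]:
    """-1/0/1 when a and b are comparable, None when neither descends from the other.

    By an arbitrary convention here, a descendant counts as "less" than its ancestor.
    """
    if a == b:
        return 0
    if is_descendant(a, b):
        return -1
    if is_descendant(b, a):
        return 1
    return None  # e.g. two siblings: the relation says nothing about them


print(compare_family("dave", "alice"))  # -1 (transitive: dave -> bob -> alice)
print(compare_family("bob", "carol"))   # None (siblings are incomparable)
```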

By contrast, the usual order on the natural numbers works because for every pair of numbers it is possible to state which one is greater. As long as the relation you define is a total order, any relation will do: for images, "is greener than" can work fine, provided images with equal greenness are ordered by some consistent tie-breaker. A total order is all you need for a binary search; it's up to your creativity and needs to find the right ordering relation.

For more information, see the documentation of Java's Comparable interface, which a TreeSet uses to order its elements when no Comparator is supplied. The description of the compareTo method is particularly interesting.
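Python has no `Comparable` interface, but `functools.cmp_to_key` lets a `compareTo`-style function (negative, zero or positive) drive sorting. The sketch below assumes a hypothetical image representation (a tuple of `(r, g, b)` pixels) and a made-up greenness metric. Ties in greenness are broken by comparing the pixels themselves, so only identical images compare as equal and the result stays a total order.

```python
from functools import cmp_to_key

# Hypothetical image representation: a tuple of (r, g, b) pixels, 0-255 each.
Image = tuple[tuple[int, int, int], ...]


def greenness(img: Image) -> float:
    """Illustrative metric: average of g - (r + b) / 2 over all pixels."""
    if not img:
        return 0.0
    return sum(g - (r + b) / 2 for r, g, b in img) / len(img)


def compare_images(a: Image, b: Image) -> int:
    """compareTo-style result: negative, zero or positive.

    Primary key: greenness. Images with equal greenness are then compared
    pixel by pixel, so the result is 0 only when the images are identical.
    """
    ga, gb = greenness(a), greenness(b)
    if ga != gb:
        return -1 if ga < gb else 1
    return (a > b) - (a < b)  # tuples compare lexicographically


red = ((255, 0, 0),)
green = ((0, 255, 0),)
grey1 = ((10, 10, 10),)
grey2 = ((20, 20, 20),)  # same greenness (0.0) as grey1

ordered = sorted([green, grey2, red, grey1], key=cmp_to_key(compare_images))
# ordered == [red, grey1, grey2, green]
```

Without the tie-breaker, `grey1` and `grey2` would compare as 0 even though they are different images, so a tree keyed this way would treat one as a duplicate of the other.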

Robert Harvey

There is orderable data, which may currently be unsorted but can be sorted by any of a variety of sorting algorithms. All of these sorting algorithms depend on the ability to perform a basic comparison between any two given elements. Such a comparison must return the relative order of the two elements, usually as -1 for less, 0 for equal, and 1 for greater. There is only one possible answer for the sorted result as a whole (barring duplicates).
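A minimal sketch of both points: a comparison sort only ever needs the three-way comparison, and a valid total order yields a single sorted result regardless of algorithm or input order. `compare` and `insertion_sort` are illustrative names; Python's built-in `sorted` is driven by the same function through `functools.cmp_to_key`.

```python
from functools import cmp_to_key
from itertools import permutations


def compare(a: int, b: int) -> int:
    """Basic three-way comparison: -1 for less, 0 for equal, 1 for greater."""
    return -1 if a < b else (1 if a > b else 0)


def insertion_sort(items: list, cmp) -> list:
    """A simple comparison sort that only ever calls cmp(x, y)."""
    result = []
    for x in items:
        i = len(result)
        # Walk left past every element that compares greater than x.
        while i > 0 and cmp(result[i - 1], x) > 0:
            i -= 1
        result.insert(i, x)
    return result


data = [3, 1, 4, 1, 5]
reference = sorted(data, key=cmp_to_key(compare))  # [1, 1, 3, 4, 5]

# Two different algorithms, every possible input order: one sorted result.
for perm in permutations(data):
    assert insertion_sort(list(perm), compare) == reference
    assert sorted(perm, key=cmp_to_key(compare)) == reference
```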

There is also related data, as you would have in a graph, possibly a directed graph but not necessarily so. Given a graph, we know something about the relative positions of the nodes via their connecting edges. However, there is no total ordering of all the nodes, only the knowledge that some nodes come before or after certain other nodes. We can perform a topological sort to order the nodes, but the bottom line is that there are often many correct answers, so a topological sort will usually just pick one. A graph that contains cycles has no topological order at all, so any ordering of it is (even more) arbitrary. Often we'll then look to other properties, for example to choose the head element of a cycle, to determine a good ordering for the domain.
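Python's standard library (3.9 and later) includes `graphlib.TopologicalSorter`, which can illustrate both points: it produces one of possibly many valid orders, and it raises `graphlib.CycleError` when the graph has a cycle. The dependency graph below is made up, and `respects` and `topo_order_or_none` are illustrative helpers.

```python
from graphlib import CycleError, TopologicalSorter

# Made-up dependency graph: each node maps to the nodes that must come before it.
deps = {
    "compile": {"fetch"},
    "test": {"compile"},
    "docs": {"fetch"},
}


def respects(order: list, graph: dict) -> bool:
    """True if every node appears after all of its predecessors."""
    pos = {node: i for i, node in enumerate(order)}
    return all(pos[p] < pos[n] for n, preds in graph.items() for p in preds)


def topo_order_or_none(graph: dict):
    """Return one topological order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


order = topo_order_or_none(deps)
print(order, respects(order, deps))  # one valid order, True

# A different, equally valid order that the sorter need not pick:
print(respects(["fetch", "docs", "compile", "test"], deps))  # True

cyclic = {"a": {"b"}, "b": {"a"}}
print(topo_order_or_none(cyclic))  # None: no topological order exists
```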

Then there is unrelated data, where all we can do is compare for equality. For such data, a hash table is a reasonable data structure for storage and retrieval, provided we can also compute a hash code that agrees with equality. There is no notion of sorting at all.
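A sketch of storing equality-only data in a hash table. `Clip` is a hypothetical record type: `@dataclass(frozen=True)` generates `__eq__` and `__hash__` but, without `order=True`, no `<` or `>`. The items can therefore be counted in a `collections.Counter` (a dict subclass) but cannot be sorted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)  # eq=True by default; frozen=True also makes it hashable
class Clip:
    """Hypothetical media item: comparable for equality, but not ordered."""
    codec: str
    payload: bytes


clips = [
    Clip("pcm", b"\x00\x01"),
    Clip("pcm", b"\x00\x01"),  # equal to the first clip
    Clip("mp3", b"\xff\xfb"),
]

# Counter is a hash table: lookups use only __hash__ and __eq__.
counts = Counter(clips)
print(counts[Clip("pcm", b"\x00\x01")])  # 2

# Sorting needs '<', which Clip does not define.
try:
    sorted(clips)
    orderable = True
except TypeError:
    orderable = False
print(orderable)  # False
```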

We should also consider that the same list of entities can be sorted by different properties or qualities. For example, files can be sorted by their size, which gives a total ordering of the sizes (though distinct files can tie on size). They could also be sorted by their timestamps. These may be useless for your domain, but the point is still to think about the various properties available for sorting, categorizing, and so on.
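A short sketch of sorting the same items by different properties. `FileInfo` is a hypothetical record; in practice the size and timestamp could come from `os.stat(path).st_size` and `os.stat(path).st_mtime`. The values below are made up.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical file metadata record (not tied to any particular API)."""
    name: str
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


files = [
    FileInfo("b.png", 2048, 1_700_000_300.0),
    FileInfo("a.wav", 2048, 1_700_000_100.0),
    FileInfo("c.exe", 512, 1_700_000_200.0),
]

# Same items, different orderings depending on the property chosen.
by_size = sorted(files, key=lambda f: f.size)  # ties keep input order (sorted is stable)
by_mtime = sorted(files, key=lambda f: f.mtime)

# Size alone ties for b.png and a.wav; adding the name as a tie-breaker
# makes every key distinct, so the records are totally ordered.
by_size_then_name = sorted(files, key=lambda f: (f.size, f.name))

print([f.name for f in by_size])            # ['c.exe', 'b.png', 'a.wav']
print([f.name for f in by_mtime])           # ['a.wav', 'c.exe', 'b.png']
print([f.name for f in by_size_then_name])  # ['c.exe', 'a.wav', 'b.png']
```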

Erik Eidt