39

Last night I was discussing with another programmer that even though something may be O(1), an operation which is O(n) may outperform it if there is a large constant in the O(1) algorithm. He disagreed, so I've brought it here.

Are there examples of algorithms which greatly outperform those in the class below it? For example, O(n) being faster than O(1) or O(n^2) being faster than O(n).

Mathematically this can be demonstrated for a function with an asymptotic upper bound, when you disregard constant factors, but do such algorithms exist in the wild? And where would I find examples of them? What types of situations are they used for?

KyleWpppd
  • 278

20 Answers

46

Lookups in very small, fixed data tables. An optimized hash table may be O(1) and yet slower than a binary search or even a linear search due to the cost of the hash calculation.
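
A rough sketch of that comparison in Ruby (the table contents and iteration count are arbitrary illustrative choices; which side wins depends on your runtime and hash function):

require 'benchmark'

# Tiny, fixed lookup table: day abbreviation -> index.
DAYS     = %w[mon tue wed thu fri sat sun]
DAY_HASH = DAYS.each_with_index.to_h   # "O(1)" lookups, but each one pays for the hash calculation

lookups = 1_000_000
Benchmark.bm(14) do |x|
  x.report("hash lookup:") { lookups.times { DAY_HASH["fri"] } }
  x.report("linear scan:") { lookups.times { DAYS.index("fri") } }
end

With only seven entries, the linear scan is at worst a handful of comparisons, so the asymptotic advantage of the hash has very little room to show up.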

25

Matrix multiplication. The naïve O(n^3) algorithm is often used in practice because it is faster than Strassen's O(n^2.8) algorithm for small-ish matrices; and Strassen's is used instead of the O(n^2.3) Coppersmith–Winograd algorithm for larger matrices.
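
For reference, the naïve algorithm being defended here is just the triple loop; a minimal Ruby sketch (square matrices as arrays of rows, no blocking or other tuning):

# Naive O(n^3) matrix multiplication: returns c = a * b for n x n matrices.
def naive_matmul(a, b)
  n = a.size
  c = Array.new(n) { Array.new(n, 0) }
  n.times do |i|
    n.times do |k|
      aik = a[i][k]                            # hoist a[i][k] out of the innermost loop
      n.times { |j| c[i][j] += aik * b[k][j] }
    end
  end
  c
end

p naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])   # => [[19, 22], [43, 50]]

Strassen trades some of those multiplications for extra additions and recursion bookkeeping, which only pays off once the matrices are large enough.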

Peter Taylor
  • 4,043
24

A simple example is the difference between various sorting algorithms. Mergesort, Heapsort, and some others are O(n log n). Quicksort is O(n^2) in the worst case, but it is often faster, and on average it performs like O(n log n).

Another example is the generation of a single Fibonacci number. The iterative algorithm is O(n), whereas the matrix-based algorithm is O(log n). Still, for the first couple of thousand Fibonacci numbers, the iterative algorithm is probably faster. This also depends on the implementation of course!
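
A hedged sketch of that comparison in Ruby, using fast doubling for the O(log n) variant (mathematically equivalent to the matrix-power approach); both finish almost instantly at this size, so measure with your own n to find the crossover on your machine:

require 'benchmark'

def fib_iterative(n)          # O(n) additions
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

def fib_fast_doubling(n)      # O(log n) steps; returns [F(n), F(n+1)]
  return [0, 1] if n.zero?
  a, b = fib_fast_doubling(n >> 1)
  c = a * (2 * b - a)
  d = a * a + b * b
  n.odd? ? [d, c + d] : [c, d]
end

n = 2_000
puts Benchmark.realtime { fib_iterative(n) }
puts Benchmark.realtime { fib_fast_doubling(n).first }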

Algorithms with a better asymptotic performance may contain costly operations that are not necessary with an algorithm with worse performance but simpler operations. In the end, the O-notation only tells us something about performance when the argument it operates on increases dramatically (approaches infinity).

molf
  • 221
18

Note: Please read the comments by @back2dos below and by the other gurus, as they are in fact more helpful than what I have written. Thanks to all contributors.

I think from the chart below (taken from: Big O notation, search for "The Pessimistic Nature of Algorithms:"), you can see that O(log n) is not always better than, say, O(n). So I guess your argument is valid.

[Chart from the "Big O notation" article comparing the growth of common complexity classes]

NoChance
  • 12,532
11

For practical values of n, yes. This comes up a lot in CS theory. Often there is a complicated algorithm that has technically better big-Oh performance, but the constant factors are so large as to make it impractical.

I once had my computational geometry professor describe an algorithm for triangulating a polygon in linear time, but he finished with "very complicated. I don't think anyone's actually implemented it" (!!).

Also, Fibonacci heaps have better characteristics than normal heaps, but are not very popular because they don't perform as well in practice as regular heaps. This can cascade to other algorithms that use heaps - for instance, Dijkstra's shortest-path algorithm is mathematically faster with a Fibonacci heap, but usually not in practice.

10

Compare inserting into a linked list and inserting into a resizable array.

The amount of data has to be fairly large for the linked list O(1) insertion to be worthwhile.

A linked list has extra overhead for next pointers and dereferences. A resizable array has to copy data around. That copying is O(n), but in practice very fast.
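
A rough Ruby sketch of that trade-off (the Node struct is a hand-rolled illustration, not a library class); it prepends n elements to each structure:

require 'benchmark'

Node = Struct.new(:value, :next_node)   # minimal singly linked list cell

n = 100_000
Benchmark.bm(14) do |x|
  x.report("array insert:") do
    arr = []
    n.times { |i| arr.insert(0, i) }            # may have to move existing elements
  end
  x.report("list prepend:") do
    head = nil
    n.times { |i| head = Node.new(i, head) }    # constant work: allocate a node and link it
  end
end

The array's copying is a tight memory move, while each list node costs an allocation and a pointer chase, so the break-even point is higher than the big-O alone suggests.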

Winston Ewert
  • 25,052
8

The Big-Oh notation is used to describe a function's growth-rate, so it is possible that an O(1) algorithm will be faster, but only up to a certain point (the constant factor).

Common notations:

O(1) - The number of iterations (sometimes you can refer to this as user-time spent by the function) is not dependent on the size of the input, and is in fact constant.

O(n) - The number of iterations grows in linear proportion to the size of the input. Meaning: if the algorithm iterates 2 * N times over an input of size N, it is still considered O(n).

O(n^2) (quadratic) - The number of iterations is the input size squared.
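
A toy illustration of those growth rates in Ruby (iteration counts only, no real work):

[10, 100, 1_000].each do |n|
  constant  = 1        # O(1): independent of the input size
  linear    = 2 * n    # O(n): the constant factor 2 is dropped by the notation
  quadratic = n * n    # O(n^2): grows with the square of the input size
  puts format("n=%5d  O(1)=%d  O(n)=%d  O(n^2)=%d", n, constant, linear, quadratic)
end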

Yam Marcovic
  • 9,390
6

Regex libraries are usually implemented with backtracking, which has worst-case exponential time, rather than DFA generation, which has a complexity of O(nm).

Naive backtracking can be a better performer when the input stays on the fast path or fails without the need to backtrack excessively.

(Although this decision isn't just performance based, it's also to allow back references.)
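
A hedged Ruby sketch of the slow path; the nested-quantifier pattern is a classic pathological case for backtracking engines (recent Ruby versions add memoization that blunts it, so the blowup may not reproduce everywhere):

require 'benchmark'

pattern = /\A(a+)+b\z/              # nested quantifiers plus a forced failure

[20, 22, 24].each do |len|
  input = "a" * len                 # no trailing "b", so every match attempt fails
  t = Benchmark.realtime { pattern.match?(input) }
  puts format("len=%d  %.4fs", len, t)
end

On a pure backtracking engine the time roughly doubles with each extra character, while typical real-world patterns and inputs stay on the fast path.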

5

An O(1) algorithm:

def constant_time_algorithm
  one_million = 1000 * 1000
  sleep(one_million) # seconds
end

An O(n) algorithm:

def linear_time_algorithm(n)
  sleep(n) # seconds
end

Clearly, for any value of n where n < one_million, the O(n) algorithm given in the example will be faster than the O(1) algorithm.

While this example is a bit facetious, it is equivalent in spirit to the following example:

def constant_time_algorithm
  do_a_truckload_of_work_that_takes_forever_and_a_day
end

def linear_time_algorithm(n)
  i = 0
  while i < n
    i += 1
    do_a_minute_amount_of_work_that_takes_nanoseconds
  end
end

You must know the constants and coefficients in your O expression, and you must know the expected range of n, in order to determine a priori which algorithm will end up being faster.

Otherwise, you must benchmark the two algorithms with values of n in the expected range in order to determine a posteriori which algorithm ended up being faster.

yfeldblum
  • 1,542
4

Sorting:

Insertion sort is O(n^2) but outperforms other O(n log n) sorting algorithms for small numbers of elements.

This is the reason why most sort implementations use a combination of two algorithms. E.g. use merge sort to break down large arrays until they reach a certain size, then use insertion sort to sort the smaller units, and merge them again with merge sort.

See Timsort, the current default sort implementation in Python and Java 7, which uses this technique.
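
A hedged Ruby sketch of that hybrid (the cutoff of 16 is an arbitrary illustrative choice; real implementations tune it carefully):

CUTOFF = 16

def insertion_sort(a)                 # O(n^2), but very cheap per element
  (1...a.size).each do |i|
    key = a[i]
    j = i - 1
    while j >= 0 && a[j] > key
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = key
  end
  a
end

def hybrid_merge_sort(a)              # O(n log n) overall
  return insertion_sort(a.dup) if a.size <= CUTOFF
  mid   = a.size / 2
  left  = hybrid_merge_sort(a[0...mid])
  right = hybrid_merge_sort(a[mid..-1])
  merged = []
  merged << (left.first <= right.first ? left.shift : right.shift) until left.empty? || right.empty?
  merged + left + right
end

data = (1..1_000).to_a.shuffle
p hybrid_merge_sort(data) == (1..1_000).to_a   # => true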

OliverS
  • 1,285
4

The unification algorithm used in practice is exponential in the worst case, for some pathological inputs.

There is a polynomial unification algorithm, but it is too slow in practice.

starblue
  • 631
3

Bubblesort in memory can outperform quicksort when the program is being swapped to disk or needs to read every item from disk when comparing.

This should be an example he can relate to.

3

Often the more advanced algorithms assume a certain amount of (expensive) setup. If you only need to run it once, you might be better off with the brute-force method.

For example: binary search and hash table lookup are both much faster per lookup than a linear search, but they require you to sort the list or build the hash table, respectively.

The sort will cost you N log(N) and the hash table will cost at least N. Now if you are going to be doing hundreds or thousands of lookups, that is still an amortized savings. But if you only need to do one or two lookups, it might make sense to just do the linear search and save the startup cost.
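
A rough Ruby sketch of that break-even (data size and values are arbitrary):

require 'benchmark'

data   = Array.new(1_000_000) { rand(10_000_000) }
needle = data.sample

one_linear = Benchmark.realtime { data.include?(needle) }
with_setup = Benchmark.realtime do
  sorted = data.sort                              # pay the N log N setup once
  sorted.bsearch { |x| x >= needle } == needle    # then each lookup is O(log N)
end

puts format("single linear lookup:  %.4fs", one_linear)
puts format("sort + binary search:  %.4fs", with_setup)

For one or two lookups the plain scan usually wins; repeat the binary search a few thousand times and the setup cost amortizes away.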

1

Decryption is often O(1). For example, the key space for DES is 2^56, so decryption of any message (by exhaustive key search) is a constant-time operation. It's just that you have a factor of 2^56 in there, so it's a really big constant.

Zachary K
  • 10,413
1

Different implementations of sets spring to my mind. One of the most naive is to implement it over a vector, which means remove, contains, and therefore also add all take O(N).
An alternative is to implement it over some general-purpose hash, which maps input hashes to input values. Such a set implementation performs with O(1) for add, contains and remove.

If we assume N is about 10 or so, then the first implementation is probably faster. All it has to do to find an element is compare 10 values to the one it is looking for.
The other implementation will have to start all sorts of clever transformations, which can be a lot more expensive than making 10 comparisons. With all the overhead, you might even have cache misses, and then it really doesn't matter how fast your solution is in theory.

This doesn't mean that the worst implementation you can think of will outperform a decent one if N is small enough. It simply means that for sufficiently small N, a naive implementation with low footprint and overhead can actually require fewer instructions and cause fewer cache misses than an implementation that puts scalability first, and will therefore be faster.

You can't really know how fast anything is in a real world scenario, until you put it into one and simply measure it. Often results are surprising (at least to me).
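
A measurement in that spirit, as a Ruby sketch (N = 10; the element values and repeat count are arbitrary, and which side wins will vary by machine and runtime):

require 'benchmark'
require 'set'

items    = (1..10).to_a
as_array = items            # O(N) membership test: scan at most 10 slots
as_set   = items.to_set     # O(1) membership test: hash, then probe

repeats = 1_000_000
Benchmark.bm(12) do |x|
  x.report("array:") { repeats.times { as_array.include?(7) } }
  x.report("set:")   { repeats.times { as_set.include?(7) } }
end

Whatever the outcome, the gap at N = 10 is nothing like what the asymptotic classes suggest.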

back2dos
  • 30,140
1

Yes, for suitably small N. There will always be an N, above which you will always have the ordering O(1) < O(lg N) < O(N) < O(N log N) < O(N^c) < O(c^N) (where O(1) < O(lg N) means that an O(1) algorithm will take fewer operations when N is suitably large and c is some fixed constant that is greater than 1).

Say a particular O(1) algorithm takes exactly f(N) = 10^100 (a googol) operations and an O(N) algorithm takes exactly g(N) = 2 N + 5 operations. The O(N) algorithm will give greater performance until N is roughly a googol (actually when N > (10^100 - 5)/2), so if you only expected N to be in the range of 1000 to a billion you would suffer a major penalty using the O(1) algorithm.
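
A quick sanity check of that crossover in Ruby, using the hypothetical costs f(N) = 10^100 and g(N) = 2N + 5 from the example:

f = ->(_n) { 10**100 }        # the "O(1)" algorithm's fixed cost
g = ->(n)  { 2 * n + 5 }      # the "O(N)" algorithm's cost

crossover = (10**100 - 5) / 2
puts(g.call(crossover)     <= f.call(crossover))       # true: O(N) is still no worse here
puts(g.call(crossover + 1) >  f.call(crossover + 1))   # true: beyond this, O(1) finally wins

[1_000, 10**9].each { |n| puts(g.call(n) < f.call(n)) }  # true for the whole expected range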

Or for a realistic comparison, say you are multiplying n-digit numbers together. The Karatsuba algorithm is at most 3 n^(lg 3) operations (that is roughly O(n^1.585)) while the Schönhage–Strassen algorithm is O(N log N log log N), which is a faster order, but to quote Wikipedia:

In practice the Schönhage–Strassen algorithm starts to outperform older methods such as Karatsuba and Toom–Cook multiplication for numbers beyond 2^2^15 to 2^2^17 (10,000 to 40,000 decimal digits).[4][5][6]

So if you are multiplying 500 digit numbers together, it doesn't make sense to use the algorithm that's "faster" by big O arguments.

EDIT: You can determine how f(N) compares to g(N) by taking the limit as N->infinity of f(N)/g(N). If the limit is 0 then f(N) < g(N), if the limit is infinity then f(N) > g(N), and if the limit is some other constant then f(N) ~ g(N) in terms of big O notation.

dr jimbob
  • 2,071
1

The simplex method for linear programming can be exponential in the worst case, while relatively new interior point algorithms can be polynomial.

However, in practice, the exponential worst case for the simplex method doesn't come up -- the simplex method is fast and reliable, while early interior point algorithms were far too slow to be competitive. (There are now more modern interior point algorithms which are competitive -- but the simplex method is, too...)

comingstorm
  • 2,737
0

Ukkonen's algorithm for building suffix tries is O(n log n). It has the advantage of being "on-line" - that is, you can incrementally append more text.

Recently, other more complex algorithms have claimed to be faster in practice, largely because their memory access has higher locality, thus improving processor cache utilization and avoiding CPU pipeline stalls. See, e.g., this survey, which claims that 70-80% of processing time is spent waiting for memory, and this paper describing the "wotd" algorithm.

Suffix tries are important in genetics (for matching gene sequences) and, somewhat less importantly, in the implementation of Scrabble dictionaries.

Ed Staub
  • 221
0

There's always a fastest and shortest algorithm for any well-defined problem. It is the (asymptotically) fastest algorithm only in a purely theoretical sense, though.

Given any description of a problem P and an instance for that problem I, it enumerates all possible algorithms A and proofs Pr, checking for each such pair whether Pr is a valid proof that A is the asymptotically fastest algorithm for P. If it finds such a proof, it then executes A on I.

Searching for this problem-proof pair has complexity O(1) (for a fixed problem P), so you always use the asymptotically fastest algorithm for the problem. However, since this constant is so unspeakably enormous in nearly all cases, this method is completely useless in practice.

0

Many languages/frameworks use naive pattern matching to match strings instead of KMP. We look for strings like Tom or New York rather than ababaabababababaababababababab.
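
For concreteness, a hedged Ruby sketch of the naive approach the answer has in mind (slide the pattern along and compare directly; no failure table as in KMP):

def naive_index(text, pattern)
  return 0 if pattern.empty?
  (0..text.length - pattern.length).each do |i|
    return i if text[i, pattern.length] == pattern   # direct slice comparison
  end
  nil
end

p naive_index("I met Tom in New York", "New York")   # => 13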

Lukasz Madon
  • 1,496