17

What is the fastest way to find the first (smallest) integer that doesn't exist in a given list of unsorted integers (and that is greater than the list's smallest value)?

My primitive approach is to sort them and step through the list; is there a better way?

9 Answers

32

Assuming that you mean "integer" when you say "number", you can use a bitvector with one bit for each value in the range (say your range includes integers between 1 and 256; then you can use a 256-bit, or 32-byte, bitvector). When you come across an integer in position n of your range, set the nth bit.

When you're done enumerating the collection of integers, iterate over the bits in your bitvector, looking for any bits that are still 0. Their positions n match the positions of your missing integer(s).

This is O(2*N), therefore O(N) and probably more memory efficient than sorting the entire list.
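
Here is a minimal sketch of that idea in Python, assuming non-negative integers and a known range; the function name and parameters are just illustrative:

def first_missing(nums, lo, hi):
    # One bit per value in [lo, hi]; a Python int works as a simple bitvector.
    bits = 0
    for x in nums:
        if lo <= x <= hi:
            bits |= 1 << (x - lo)        # set bit n for the value lo + n
    for n in range(hi - lo + 1):
        if not (bits >> n) & 1:          # first clear bit marks the missing value
            return lo + n
    return None                          # no gap inside the range

first_missing([3, 4, 6, 7], 3, 7)        # -> 5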

JasonTrue
  • 9,041
5

If you sort the entire list first, you commit to the full worst-case sorting cost even when the gap could be found early. Also, your choice of sort algorithm is critical.

Here's how I'd approach this problem:

  1. Use a heap sort, focusing on the smallest elements in the list.
  2. After each swap, see if you have a gap.
  3. If you find a gap, then return: You have found your answer.
  4. If you don't find a gap, continue swapping.
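
A rough sketch of that early-exit idea in Python, using the standard heapq module instead of a hand-rolled heap sort (the function name is mine):

import heapq

def first_gap(nums):
    # heapify is O(n); we then pop the smallest elements until a gap appears,
    # so the full sorting cost is only paid when there is no early gap.
    heap = list(nums)
    heapq.heapify(heap)
    prev = heapq.heappop(heap)
    while heap:
        cur = heapq.heappop(heap)
        if cur > prev + 1:
            return prev + 1              # hole found right after prev
        prev = cur
    return prev + 1                      # no hole inside the list

first_gap([15, 21, 10, 13, 11, 12])      # -> 14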

Here's a visualization of a heap sort.

Jim G.
  • 8,035
5

Just to be esoteric and "clever", in the special case of the array having only one "hole", you can try an XOR-based solution:

  • Determine the range of your array; this is done by setting a "max" and "min" variable to the first element of the array, and for each element after that, if that element is less than the min or greater than the max, set the min or max to the new value.
  • If the size of the range (max - min + 1) is exactly one more than the cardinality of the set, there is only one "hole", so you can use XOR.
  • Initialize an integer variable X to zero.
  • For each integer from min to max inclusively, XOR that value with X and store the result in X.
  • Now XOR each integer in the array with X, storing each successive result to X as before.
  • When you're done, X will be the value of your "hole".

This will run in roughly 2N time, similar to the bitvector solution, but it needs only a single integer of extra storage, so it uses less memory than the bitvector whenever N exceeds the number of bits in an int. However, if the array has multiple "holes", X will be the XOR "sum" of all the holes, which will be difficult or impossible to separate into the actual hole values. In that case you fall back to some other method such as the "pivot" or "bitvector" approaches from other answers.
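
A small Python sketch of the single-hole XOR trick (assuming distinct integers and exactly one hole; the names are illustrative):

def find_single_hole(nums):
    lo, hi = min(nums), max(nums)        # find the range; min() and max() each scan the list once
    x = 0
    for v in range(lo, hi + 1):          # XOR in every value the full range should contain
        x ^= v
    for v in nums:                       # XOR out the values actually present
        x ^= v
    return x                             # everything cancels except the single missing value

find_single_hole([3, 4, 6, 7])           # -> 5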

You could recurse this as well using something similar to the pivot method to further reduce complexity. Rearrange the array based on a pivot point (which will be the max of the left side and the min of the right; it'll be trivial to find the max and min of the full array while pivoting). If the left side of the pivot has one or more holes, recurse into that side only; otherwise recurse into the other side. At any point where you can determine there's only one hole, use the XOR method to find it (which should be cheaper overall than continuing to pivot all the way down to a collection of two elements with a known hole, which is the base case for the pure pivot algorithm).

KeithS
  • 22,282
3

What is the range of numbers you will encounter? If that range is not very large, you could solve this with two scans (linear time O(n)) using an array with one entry per value in the range, trading space for time. You could find the range dynamically with one more scan. To reduce space, you could assign 1 bit to each value, giving you 8 values' worth of storage per byte.

Your other option, which may be better when the gap occurs early and which works in situ instead of copying memory, is to modify selection sort to quit early if the minimum found in a scanning pass is not 1 more than the previous minimum found.
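
A minimal in-place sketch of that modified selection sort in Python (the function name is mine; a real version would likely live in a lower-level language):

def first_gap_selection(a):
    # Selection sort, but quit as soon as the newly selected minimum
    # is not exactly one more than the previous minimum.
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)   # index of the smallest remaining value
        a[i], a[m] = a[m], a[i]
        if i > 0 and a[i] > a[i - 1] + 1:
            return a[i - 1] + 1          # gap found; stop sorting early
    return a[-1] + 1                     # no gap inside the list

first_gap_selection([15, 21, 10, 13, 11, 12])          # -> 14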

Peter Smith
  • 2,577
2

No, not really. Since any not-yet-scanned number could always be one that fills a given "hole", you cannot avoid scanning each number at least once and then comparing it to its possible neighbours. You could probably speed things up by building a binary tree or similar and then traversing it from left to right until a hole is found, but that is essentially the same time complexity as sorting, since it is sorting. And you probably won't come up with anything faster than Timsort.

1

Most ideas here are no more than just sorting. The bitvector version is plain bucket sort. Heap sort was also mentioned. It basically boils down to choosing the right sorting algorithm, which depends on time/space requirements and also on the range and number of elements.

In my view, using a heap structure is probably the most general solution (a heap basically gives you the smallest elements efficiently without a complete sort).

You could also analyze approaches which find the smallest numbers first and then scan for each integer larger than that. Or you could find the 5 smallest numbers, hoping they will contain a gap.

All of these algorithms have their strength depending on the input characteristics and program requirements.

Gere
  • 2,231
1

Amazingly, there is a better way, that is O(n), if all the elements in the list are distinct. (The list does not need to be sorted.) It comes from the Functional Programming community, and is due to Richard Bird. It is the first problem in his book "Pearls of Functional Algorithm Design" (which is an amazing read). The code in Haskell is:

import Data.List (partition)

minfree xs = minfrom 0 xs
minfrom a xs | null xs            = a
             | length us == b - a = minfrom b vs
             | otherwise          = minfrom a us
  where (us, vs) = partition (< b) xs
        b        = a + 1 + (length xs) `div` 2

(One can precompute the lengths of the lists to save a little time, but this does not change the asymptotic complexity.)

You can see that like other solutions in this thread, we do a partition. But the ingenious idea here is that if the left side of the partition is all full, we can recurse to the right, as there can be no free number on the left. We check that the left side is all full by checking that length us == b - a. b is the value we are partitioning by. us is the left side of the partition. Every element of xs is assumed to be greater than or equal to a.

For the complexity: for evaluating minfrom 0 xs, where length xs = n, we get the recurrence T(n) = T(n div 2) + Theta(n). This has solution T(n) = Theta(n). The idea is that the partition step cuts the size of the list in half, even though it takes a linear amount of time to do the partition. As Bird says, "A recursive process that performs Theta(n) processing steps in order to reduce the problem to another instance of at most half the size is also good enough [to get a linear-time algorithm]."

0

I believe I have come up with something that should work generally and efficiently if you are guaranteed not to have duplicates* (however, it should be extensible to any number of holes and any range of integers).

The idea behind this method is like quicksort, in that we find a pivot and partition around it, then recurse on the side(s) with a hole. To see which sides have the hole, we find the lowest and highest numbers, and compare them with the pivot and the number of values on that side. Say the pivot is 17 and the minimum number is 11. If there are no holes, there should be 6 numbers to the left of the pivot (11, 12, 13, 14, 15, 16). If there are only 5, we know there is a hole on that side and we can recurse on just that side to find it. I'm having trouble explaining it more clearly than that, so let's take an example.

15 21 10 13 18 16 22 23 24 20 17 11 25 12 14

Pivot:

10 13 11 12 14 |15| 21 18 16 22 23 24 20 17 25

15 is the pivot, indicated by pipes (||). There are 5 numbers on the left side of the pivot, as there should be (15 - 10), and 9 on the right, where there should be 10 (25 - 15). So we recurse on the right side; we'll note that the previous bound was 15 in case the hole is adjacent to it (16).

[15] 18 16 17 20 |21| 22 23 24 25

Now there are 4 numbers on the left side but there should be 5 (21 - 16). So we recurse there, and again we'll note the previous bound (in brackets).

[15] 16 17 |18| 20 [21]

The left side has the correct 2 numbers (18 - 16), but the right has 1 instead of 2 (20 - 18). Depending on our ending conditions, we could compare the 1 number to the two sides (18, 20) and see that 19 is missing or recurse one more time:

[18] |20| [21]

The left side has a size of zero, with a gap between the pivot (20) and previous bound (18), so 19 is the hole.

*: If there are duplicates, you could probably use a hash set to remove them in O(N) time, keeping the overall method O(N), but that might take more time than using some other method.
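
Here is roughly how I read this procedure as a Python sketch (assuming distinct integers; the helper names and the handling of the "no hole" case are mine):

def find_hole(nums):
    # lo/hi play the role of the previous bound and the upper bound above;
    # both are known to be present, and we look for the first hole between them.
    def recurse(xs, lo, hi):
        if not xs:
            return lo + 1                  # empty side that is known to miss a value
        pivot = xs[0]
        left  = [x for x in xs[1:] if x < pivot]
        right = [x for x in xs[1:] if x > pivot]
        if len(left) < pivot - lo - 1:     # too few values below the pivot: hole on the left
            return recurse(left, lo, pivot)
        if len(right) < hi - pivot - 1:    # too few values above the pivot: hole on the right
            return recurse(right, pivot, hi)
        return None                        # no hole strictly between lo and hi
    lo, hi = min(nums), max(nums)
    return recurse([x for x in nums if x not in (lo, hi)], lo, hi)

find_hole([15, 21, 10, 13, 18, 16, 22, 23, 24, 20, 17, 11, 25, 12, 14])   # -> 19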

Kevin
  • 152
0

A solution that doesn't use additional storage or assume the width (32 bits) of integers.

  1. In one linear pass find the smallest number. Let's call this "min". O(n) time complexity.

  2. Pick a random pivot element and do a quicksort style partition.

  3. If the pivot ended up at index ("pivot" - "min"), then recurse on the right side of the partition, else recurse on the left side of the partition. The idea here is that if there were no holes before the pivot, the pivot would land at the ("pivot" - "min")th position, so the first hole should lie to the right of the partition, and vice versa.

  4. Base case is an array of 1 element and the hole lies between this element and the next one.

The expected total running time complexity is O(n) (8*n with the constants) and worst case is O(n^2). The time complexity analysis for a similar problem can be found here.
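
A compact Python sketch of these steps (assuming distinct integers; the function name is mine, and I have made step 4 slightly more defensive so it also covers an empty window and a single remaining element that sits past the hole):

import random

def first_hole(a):
    lo_val = min(a)                       # step 1: one linear pass for the minimum
    lo, hi = 0, len(a) - 1                # current window of the array
    while hi - lo >= 1:
        p = random.randint(lo, hi)        # step 2: random pivot
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):           # quicksort-style partition around the pivot
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        # step 3: with no holes below it, the pivot lands at index (pivot - min)
        if store == pivot - lo_val:
            lo = store + 1                # left part is complete; the hole is to the right
        else:
            hi = store - 1                # the hole is to the left of the pivot
    # step 4: at most one element left in the window
    if lo > hi:
        return lo_val + lo                # window is empty; this value is the hole
    return a[lo] + 1 if a[lo] == lo_val + lo else lo_val + lo

first_hole([15, 21, 10, 13, 18, 16, 22, 23, 24, 20, 17, 11, 25, 12, 14])   # -> 19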

aufather
  • 4,427