5

Is it possible to find the closest point (or k closest points) to any arbitrary point in a set of n points (in dimension N)?

When I write closest point, I mean smallest Euclidean distance. I'm looking for an algorithm for multiple searches, where the preprocessing time for the n points is negligible?

Is it possible to get the closest point(s) in a reasonable time (O(log(n)) for example with a method similar to binary search).

Any reference is welcome.

I already read the following questions:

But, none provide a satisfying answer.

Doc Brown
  • 218,378
Fifi
  • 159

2 Answers2

5

R-Tree, Oct-Tree, and similar spatial trees can quickly find close points for arbitrary sets of points.

  1. find the leaf containing your point (or the closest to it)
  2. Within the leaf find the closest contained point
  3. Search the tree again for any leaf intersecting the volume centred on the first point, containing the closest point just found.
  4. Excluding the already searched leaf, search the intersecting leaves for a closer point.
  5. You now have the closest point

If the data changes just use an R-Tree/Oct Tree/Spatial Tree. It will give you log(N) lookups.


But If the data does not change, or it is cost-effective to do so, you can pre-index the data.

One way is to simply create a hash table over all known data points, and a sub-sampling of "grid-points". The value in the hash points to the tree node/s its located in. The idea of grid-points is so that novel data points can be mapped to their closest grid point to start this search off. Depending on how the hash is calculated this can speed up searching the data significantly. Log(N) -> Amort O(1)

The next way is to flatten the trees out themselves into arrays of segments. Each segment being a bound on that dimension, eg: [0 <= _ < 3, 3 <= _ <= 8, 8 < _ <= 9, ...] (In fact you can optimise this further as each pair of cells shares the same number, and if <= 9 is the same as < 10 you can flatten even further by converting to the same operator and moving the number).

To create the segment arrays iterate through the points and add them into their own segments. For the moment list them if multiple share the same point.

Go through your segments and if there is more than one element pick another dimension and repeat for just those listed elements.

At worst you will have an N dimensional lookup. If you pick the same dimensions in the same order its easier to perform lookups. If you can pick a different dimension on each dereference you can reduce the overall depth at the cost of complexity.

Kain0_0
  • 16,561
1

Finding the closest point to a single preselected point is O(n)

You simply iterate once the list calculating the distance for each. The problem occurs if you want to find the two points in the list which have the shortest distance between them.

Or more commonly, the shortest path through the list going from node to node, where the combination of the O(n!) search time and square roots allow for the optimisation opportunities.

However its such a well known problem (the traveling salesman) there are lots of libraries out there which optimize for various scenarios.

If you are trying to solve it academically you can read up on the various approaches, but if you have a practical problem, supply more details and you may find a prewritten solution.

Ewan
  • 83,178