14

I am trying to start with a geo search project that will find all landmarks in the 10 km/miles (not important for this story) of a particular landmark.

So for example, lets say I have a database of a 1,000,000 landmarks. In order to find all landmarks in 10 miles range of a landmark with certain coordinates, I would have to calculate a distance between a landmark from my search and 1,000,000 landmarks.

Is there a better way to do that?

Alternative I was thinking is to categorize landmarks such as country, region, city, neighborhood, business, historical, etc. in such a way that business can be part of a neighborhood or city. City is a part of a region, a country, etc. This can narrow a list of calculations, but it still looks like a lot of work do to in order for search to be fast and accurate.

Could the Google Maps API help?

4 Answers4

29

Use a database with support for GIS (geographic information systems) queries. Most databases support this outright or have extensions, but the details will be database-specific (in their answer, Flater shows the syntax for SQL server).

If you need to implement such queries within your application, you can implement a data structure that allows spatial queries, e.g. a k-d Tree. This is like a binary search tree, except that each level of the tree partitions on a different coordinate dimension. This allows you to restrict the search to a smaller set of feasible candidates. Effectively, you translate your search “10km radius” into bounds for each coordinate dimension, and tighten the bounds as you recurse into the tree.

amon
  • 135,795
11

Yes, there's a better way. You need to use a spatial index. These indexes organize metadata about geometries to filter out far away geometries very rapidly, saving a lot of CPU cycles by avoiding the computations you describe. You shouldn't bother implementing one yourself as all major relational databases provide a spatial geometry type and indexes to go with them.

What you want to look into are "within distance" queries (queries for geometries within a certain distance of some other geometry). These are very standard and very much a solved problem and are possible in all of the above databases (and built into several):

  • PostGIS: ST_DWithin
  • SQL Server: STDistance (Not clear that index use on the 3D geography version of this function is supported)
  • Oracle: SDO_WITHIN_DISTANCE (This doesn't say explicitly that it will trigger index usage. I'd double check the query plan. You might need to apply an SDO_FILTER to get it to use the index.)
  • MySQL: Still figuring this out.

Workaround for triggering index usage

In the worst case where you have trouble getting the system to use the spatial index with these queries, you can add an additional filter. You'd create a square bounding box with sides of length 2*(search distance) centered at your search point and compare the table geometries' bounding boxes against that before checking the actual distance. That's what PostGIS' ST_DWithin above does internally anyway.


Distance in GIS

While spatial indexes are fantastic and absolutely the right solution to your problem, distance calculation can get logically complicated. In particular, you need to worry about what projection (basically all the parameters for the coordinate system) your data is stored in. Most 2D projections (things other than angular coordinate systems like the various lat/long projections) distort length significantly. For example, the Web Mercator projection (the one used by Google, Bing, and every other major base map provider) expands areas and distances increasingly as the location gets further from the equator. I might be wrong as I'm not formally educated in GIS, but the best I've seen for 2D projections is some specific ones that promise correct distances from a single, constant point in the entire world. (No, it's not practical to use a different projection for every query; that would render your indexes useless.)

The bottom line is that you need to make sure your math is accurate. The simplest way of doing so from a development perspective is to use angular projections (These are often referred to as "geographic.") and functions that support doing the math using a spheroid model, but these computations are slightly more expensive than the 2D counterparts and some DBs may not support indexing them. If you can get acceptable performance using them, though, that's probably the way to go. Another common option is regional projections (like UTM zones) that get both distances and areas pretty close to correct if your data is confined to a particular part of the world. What's best for your app will depend on your specific requirements, but be aware that you need to think this through and maybe learn a little bit about it.

This applies even if you don't use built in spatial indexes. Your data has some projection regardless of what technology or technique you are currently using or use in the future, and it's already currently affecting any queries and computations you're making.

jpmc26
  • 5,489
10

Since SQL Server 2008, there is a geography data type which stores locations (lat/lon pairs) and makes it easy for you to write location-related queries.

There is an existing StackOverflow answer that discusses this in-depth.

A basic query to find the nearest 7 items:

USE AdventureWorks2012  
GO  
DECLARE @g geography = 'POINT(-121.626 47.8315)';  
SELECT TOP(7) SpatialLocation.ToString(), City FROM Person.Address  
WHERE SpatialLocation.STDistance(@g) IS NOT NULL  
ORDER BY SpatialLocation.STDistance(@g);  

A basic query to find everything within 100m (second answer to the question)

-- Get the center point
DECLARE @g geography
SELECT @g = geo FROM yourTable WHERE PointId = something

-- Get the results, radius 100m
SELECT * FROM yourTable WHERE @g.STDistance(geo) <= 100
Flater
  • 58,824
3

I would agree that if possible using specific support in a database would be the most sensible way to do this.

However if I had to do this on a database without specific support I would start by querying for a square that encloses the circule e.g. (y > (y1 - rad)) AND (y < (y1 + rad)) AND (x > (x1 - rad)) AND (x < (x1 + rad)). Assuming your points have roughly even distribution querying for a square will get you your true matches plus about 30% extra false matches. You can then cull out the false matches.

Peter Green
  • 2,326