3

I have multiple tables with point geometries and a GiST index eating many gigabits of space.

From what I understand about GiST indexes on polygons or polylines, it will store the bounding box and compute a first calculation on the bounding box instead of the full geometry resulting in faster processing.

But what about point geometry? Will it do anything more than ordering the points spatially?

Paul White
  • 94,921
  • 30
  • 437
  • 687
Boodoo
  • 65
  • 8

3 Answers3

3

All PostGIS geometry values store the bounding box in their header; this does not need an index.


A GiST index on a geometry column allows for searching exact or nearby positions without having to do a full table scan, just like a 'normal' B-tree index allows searching for plain values without having to do a full table scan.

Many (but not all) PostGIS functions can use a GiST indexes. This is noted in the reference; for example, ST_DWithin:

This function call includes a bounding box comparison that makes use of any indexes that are available on the geometries.

There is no such note for ST_Distance, so you should never do a distance search with ST_Distance(geo1, geo2) < 42. (If you ever need to search for the N nearest objects, note that the <-> operator is almost the same as ST_Distance, but does support an index when used for ORDER BY.)


Geometries need a different index type because they are two-dimensional values. A simple sorting (as with ORDER BY) would not be very useful; the internal R-tree used by the GiST index also has two dimensions.

CL.
  • 5,325
  • 1
  • 22
  • 23
3

In case the original question and the bounty are due to some misconception about what the index does exactly and how, here is Paul Ramsey with some pictures. And here's PostgreSQL doc.


In what way would a GiST index on a Geometry(Point) (not talking about MultiPoint) column be useful?

In a dramatic way. It's not that it somehow makes points get processed faster, it's that it lets you skip the ones you don't need to process at all and get to the ones you need, more quickly.

Would it be useful in a spatial processing (example: ST_DWithin) with a (Multi)Polygon column indexed with a GiST index?

Yes, that too. Consider this demo:

create table test_points(id smallserial primary key, pt geometry(Point));
create table test_points2(id smallserial primary key, pt geometry(Point));
create table test_polygons(id smallserial primary key, poly geometry(Polygon));

--2k random points, each insert into test_points (pt) select st_makepoint(random()1e3,random()1e3) from generate_series(1,2e3,1);

insert into test_points2 (pt) select st_makepoint(random()1e3,random()1e3) from generate_series(1,2e3,1);

--1k random polygons insert into test_polygons (poly) select st_convexhull( st_collect( ARRAY[ st_makepoint(random()1e3,random()1e3), st_makepoint(random()1e3,random()1e3), st_makepoint(random()1e3,random()1e3) ] ) ) from generate_series(1,1e3,1);

Point-point ST_DWithin with a radius of 30 and point-polygon ST_Intersects(), both without an index on either side:

explain analyze verbose --point-point
select p1.id, count(p2.id)
from test_points p1 left join test_points2 p2 on st_dwithin(p1.pt,p2.pt,30)
group by 1;
--Execution Time: 10117.017 ms

explain analyze verbose --point-polygon select test_points.id, test_polygons.id from test_points right join test_polygons on st_intersects(pt,poly); --Execution Time: 4771.741 ms

With a GiST index on Geometry(Point) column, one's about 50x faster, the other about 2x:

create index test_points_pt_gix on test_points using gist(pt) 
   with (fillfactor=100);--fancy
cluster verbose test_points using test_points_pt_gix;--even fancier

explain analyze verbose --point-point select p1.id, count(p2.id) from test_points p1 right join test_points2 p2 on st_dwithin(p1.pt,p2.pt,30) group by 1; --Execution Time: 230.952 ms

explain analyze verbose --point-polygon select test_points.id, test_polygons.id from test_points right join test_polygons on st_intersects(pt,poly); --Execution Time: 2611.132 ms

Zegarek
  • 361
  • 1
  • 5
1

A GiST index on points can be very useful. It will speed up queries with st_contains, st_overlaps and similar functions in the WHERE condition, where one of the arguments is the indexed column.

Laurenz Albe
  • 61,070
  • 4
  • 55
  • 90