I have a PostgreSQL table defined with a unique constraint on 4 columns:

create table public.values (
  id bigint primary key generated always as identity,
  a smallint not null references .... (id),
  b smallint not null,
  c smallint not null,
  d int not null references .... (id),
  -- ... some other cols
  constraint unique_cols unique (a, b, c, d)
);

create index values_d on public.values using btree (d asc);

This table has a lot of rows (it's roughly 400GB and grows by about 8GB each day; I think it's around 2.5 billion rows at this point). I want to delete rows identified by column a, but unless I also specify either b or c in the where clause, the query plan indicates that a sequential scan on public.values will be done.

i.e. this query:

-- ... After running `analyze` on the tables that are used

explain delete from public.values where a = 2

shows the following query plan (the table has 60 distinct a values):

Delete on "values"  (cost=0.00..62375313.80 rows=0 width=0)
  ->  Seq Scan on "values"  (cost=0.00..62375313.80 rows=48160325 width=6)
        Filter: (a = 2)
JIT:
  Functions: 3
  Options: Inlining true, Optimization true, Expressions true, Deforming true

This makes no sense to me - I don't understand why the unique constraint index isn't used (I would expect this query to delete 50 million rows).

If I update the query to also specify b, the plan switches to an index scan:

explain delete from public.values where a = 2 and b = 1

shows this query plan:

Delete on "values"  (cost=0.58..4845719.33 rows=0 width=0)
  ->  Index Scan using values_unique_cols on "values"  (cost=0.58..4845719.33 rows=1940861 width=6)
        Index Cond: ((a = 2) AND (b = 1))
JIT:
  Functions: 3
  Options: Inlining true, Optimization true, Expressions true, Deforming true

Then, if I change the b condition in the where clause to a greater-than-or-equal comparison:

explain delete from public.values where a = 2 and b >= 1

... Back to the sequential scan.

But changing the threshold again:

explain delete from public.values where a = 2 and b >= 7

And the plan is better (a b threshold lower than 6 still produces a seq scan; b is a value between 1 and 20):

Delete on "values"  (cost=0.58..65344628.48 rows=0 width=0)
  ->  Index Scan using values_unique_cols on "values"  (cost=0.58..65344628.48 rows=36595426 width=6)
        Index Cond: ((a = 2) AND (b >= 7))
JIT:
  Functions: 3
  Options: Inlining true, Optimization true, Expressions true, Deforming true

How can I force PostgreSQL to use an index for a particular operation, or otherwise delete a large number of rows quickly?

I could run the query multiple times from the application, each time with a new b value, but I would prefer to make a single database call.
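For example, something along these lines would keep it to one call. This is just a sketch, assuming b really is always between 1 and 20, so that each per-b delete matches the leading columns of unique_cols:

-- Sketch of a single-call workaround, assuming b is always in 1..20:
-- loop over b so every delete qualifies for the unique index.
do $$
begin
  for b_val in 1..20 loop
    delete from public.values where a = 2 and b = b_val;
  end loop;
end $$;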

== EDIT

After posting this question, I created an index on a alone and re-analyzed the table, but the planner still indicates a sequential scan.
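For reference, this is roughly what I ran (the index name is my own choice):

create index values_a on public.values using btree (a);
analyze public.values;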


1 Answer

No, you cannot force PostgreSQL to use an index for a particular statement; PostgreSQL has no index hints. You have to make the index attractive enough for the planner to choose it on its own.

Based on your description of the table's activity, I suspect your column statistics are inaccurate and that is throwing off the plan choice. You can also influence the planner by changing the I/O cost parameters to shift the threshold between index and sequential scans, but there be dragons.
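A sketch of both knobs (the statistics target and cost value below are illustrative, not recommendations):

-- Sample column a more thoroughly so the row estimate for "a = 2" improves:
alter table public.values alter column a set statistics 1000;
analyze public.values;

-- Make index scans look cheaper relative to sequential scans, for this
-- session only; values near 1.1 are common for SSD storage:
set random_page_cost = 1.1;
explain delete from public.values where a = 2;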

As for a cascaded delete, it will depend on how the planner builds the subquery. I think it would translate into something like DELETE FROM public.values WHERE a = 2 AND d = 2, which hinges on whether the selectivity of d is better than that of a, and thus whether the values_d index is more useful than the unique index, if any index is used at all.
