3

Is it possible to query the reltuples column for a given table with additional conditions like table.name LIKE 'hello%'?

Currently on my bigger tables the SELECT count(*) query takes long and I wouldn't need an exact count. So I wanted to know if it's possible to add WHERE clauses to the reltuples as well?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Christian Schmitt
  • 443
  • 2
  • 5
  • 13

3 Answers3

9

Not out of the box. But you can achieve it with a ...

Partial index

CREATE INDEX tbl_name_hello_idx ON tbl(tbl_id) WHERE name LIKE 'hello%';

SELECT reltuples FROM pg_class WHERE oid = 'tbl_name_hello_idx'::regclass; -- or schema-qualify table name

The actual index column (tbl_id in the example) is irrelevant (unless you have additional use for the index). Best pick a small column that is never changed: a serial PK column would be a perfect candidate. Or you could use a constant (which defeats additional purposes of that index):

CREATE INDEX tbl_name_hello_idx ON tbl((1)) WHERE name LIKE 'hello%';

Every index has its own entry in pg_class and its own reltuples count. The documentation:

reltuples ...
Number of rows in the table. This is only an estimate used by the planner. It is updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.

Hence, a partial index can be (ab-)used to get count estimates for any set of predicates, updated by autovacuum automatically. Or maybe you already have the index because you need it anyway?

This might actually be a very clever idea. But you have to weigh cost and benefits: the partial index is small for a rare condition, but the cost gets bigger for a common condition. And while keeping the statistics up to date is comparatively cheap, it's not free of cost.

Related:

TABLESAMPLE SYSTEM (n) in Postgres 9.5+

You will love the new feature in the upcoming Postgres 9.5, which only looks at a random sample of n % of blocks in the table to get a quick estimate. Example for 1 %:

SELECT 100 * count(*) AS estimate
FROM   tbl TABLESAMPLE SYSTEM (1)
WHERE  name LIKE 'hello%';

Details in the answer already linked above:

Alternative for given example

For the given example name LIKE 'hello%' you could get very fast exact results using the right index anyway:

CREATE INDEX tbl_name_text_pattern_idx ON tbl(name text_pattern_ops);

See:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
3

You've already been giving some excellent advice on how to create indexes to get the estimate more efficiently. But if you don't want to create another index to do that, another approach is to ask PostgreSQL to do the estimate for you.

If you want an estimate of a query like:

select count(*) from foo where bar like 'Hello%';

What you can do instead is execute:

explain select * from foo where bar like 'Hello%';

The first row returned by this query will look like:

Seq Scan on foo  (cost=0.00..52853.94 rows=1520803 width=256)

And you can parse out the number after 'rows='

This is the estimate that PostgreSQL uses for its own internal planning, and it incorporates all the information which PostgreSQL knows how to incorporate.

jjanes
  • 42,332
  • 3
  • 44
  • 54
1

Of course you can use it. See this page in the postgresql doc as example.

But take into account that there are several ways to count / estimate the rows in a table. See this stackoverflow answer for a couple of different ways.

Using your LIKE clause as example, and ignoring "system tables" you can build this kind of query:

SELECT
  nspname AS schemaname, relname as tablename, reltuples
FROM
  pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
  nspname NOT IN ('pg_catalog', 'information_schema') AND
  relkind='r' AND
  relname LIKE 'hello%'
ORDER BY reltuples DESC;