
I am using Postgres with the following query:

select count(*) from image;

The primary key on this table is non-incrementing; it's a unique serial number for the images stored in the table. Our app often attempts to ingest images that have already been recorded in the database, so the primary key/serial number ensures they are only recorded once.

Now we are wondering if we should have gone with an incrementing primary key instead. We have 1,259,369 images in the database and it takes about 7 minutes for the count query to run.

Our app will never delete images from this table - so an incrementing primary key would allow us to check the value of the last ID which would equal the number of rows in the table.
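With a hypothetical incrementing integer column id (not the current serial-number key), that check would be a cheap index lookup rather than a full scan:

SELECT max(id) FROM image;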

andrewniesen

2 Answers


Generally, if you don't need an exact count, there is a much faster way:

SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'image'::regclass;

Strictly speaking, in a DB with concurrent write access every count is an estimate because the number may be outdated the instant you get it.

But there is something off in your DB. Counting a million rows should not take more than a few seconds in the worst case. (Much less with modern Postgres on modern hardware.)

That your "app will never delete images from this table" makes this even more suspicious, because there shouldn't be many dead rows then. (Or are you updating a lot?) A huge number of dead tuples could slow you down and call for VACUUM. Normally, autovacuum takes care of this; it's on by default in modern Postgres. Did you disable it?
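To rule that out, you can verify the setting and run a manual vacuum on the table (both standard Postgres commands):

SHOW autovacuum;
VACUUM (VERBOSE, ANALYZE) image;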

Check for dead tuples:
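For example, via the statistics view pg_stat_user_tables:

SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = 'image';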

All the usual advice for performance optimization applies.

Erwin Brandstetter

Taken from my blog:

Using pg_stat_user_tables, you can find the (estimated) row count of a PostgreSQL table:

SELECT 
    schemaname
    ,relname
    ,n_live_tup AS EstimatedCount 
FROM pg_stat_user_tables 
ORDER BY n_live_tup DESC;

Paul White
Anvesh