Why is count(*) slow, when explain knows the answer?

Question

This query: select count(*) from planner_event takes a very long time to run - so long, I gave up and killed it before it finished. However, when I run explain select count(*) from planner_event, I can see a column in the output with the number of rows (14m).

How come explain can get the number of rows instantly, but count(*) takes a long time to run?

score 31 · Accepted Answer · edited Jul 10 '20 at 10:22

Explain is using previously gathered statistics (used by the query optimizer). Doing a select count(*) reads EVERY data block.

Here's a cheap way to get an estimated row count:

SELECT table_rows
FROM information_schema.tables
WHERE table_name='planner_event';

Even if you did select count(id), it might still take a very long time, unless you have a secondary index on id (also assuming id is a PRIMARY KEY). Because all data (including Row Data) is stored in B-Tree indexes, performing a select count(PK_COLUMN) is still a considerable amount of IO (needs to reads all data pages). If you have a secondary index on the PK field, it will be able to perform less IO to perform a count.

score 5 · Answer 2 · answered Aug 30 '17 at 18:29

Explain gets the number from some "statistics" that are used to estimate things for the Optimizer. That number can be far from correct -- I sometimes see it being more than a factor of 2 (higher or lower) than the exact value.

Performing the COUNT(*) on an InnoDB table must scan the table to avoid miscounting records that are busy being inserted/deleted by other connections but not yet "committed". Actually, it is good enough to do a full scan on some index, not necessarily the whole table (which contains the PRIMARY KEY).

How much RAM do you have? What is the value of innodb_buffer_pool_size? It might help if that were about 70% of RAM.

Why is count(*) slow, when explain knows the answer?

2 Answers2