21

Everyone knows that, in tables that use InnoDB as engine, queries like SELECT COUNT(*) FROM mytable are very inexact and very slow, especially when the table gets bigger and there are constant row insertions/deletions while that query executes.

As I understood it, InnoDB doesn't store the row count in an internal variable, which is the reason for this problem.

My question is: Why is this so ? Would it be so hard to store such information ? It's an important information to know in so many situations. The only difficulty I see if such an internal count would be implemented is when transactions are involved: if the transaction is uncommitted, do you count the rows inserted by it or not ?

PS: I'm not an expert on DBs, I'm just someone who has MySQL as a simple hobby. So if I just asked something stupid, don't be excessively critical :D .

Radu Murzea
  • 408
  • 3
  • 11

3 Answers3

16

For starter there is no such thing as the 'current count' to store in a variable. A query like SELECT COUNT(*) FROM ... is subject to the current isolation level and all concurrent pending transactions. Depending on the isolation level, the query can see or not see rows inserted or deleted by pending uncommitted transactions. The only way to answer is to count the rows that are visible to the current transaction.

Note that I did not even touch the even more thorny subject of concurrent transactions that start or end during the count. Not to mention rollbacks...

Derek Downey
  • 23,568
  • 11
  • 79
  • 104
Remus Rusanu
  • 52,054
  • 4
  • 96
  • 172
9

I agree with @RemusRusanu (+1 for his answer)

SELECT COUNT(*) FROM mydb.mytable in InnoDB behaves like a transactional storage engine should. Compare it to MyISAM.

MyISAM

If mydb.mytable is a MyISAM table, launching SELECT COUNT(*) FROM mydb.mytable; is just like running SELECT table_rows FROM information_schema.table WHERE table_schema = 'mydb' AND table_name = 'mytable';. This triggers a quick lookup of the row count in the header of the MyISAM table.

InnoDB

If mydb.mytable is a InnoDB table, you get hodge-podge of things going on. You have MVCC going on, governing the following:

  • ib_logfile0/ib_logfile1 (Redo Logs)
  • ibdata1
    • Undo Logs
    • Rollbacks
    • Data Dictionary Changes
  • Buffer Pool Management
  • Transaction Isolation (4 types)
    • Repeatable Reads
    • Read Committed
    • Read Uncommitted
    • Serializable

Asking InnoDB for a table count requires navigation through these ominous things. In fact, one never really knows if SELECT COUNT(*) from mydb.mytable counts repeatable reads only or includes reads that have been committed and those that are uncommitted.

You could try to stabilize things a little by enabling innodb_stats_on_metadata.

According to the MySQL Documentation on innodb_stats_on_meta_data

When this variable is enabled (which is the default, as before the variable was created), InnoDB updates statistics during metadata statements such as SHOW TABLE STATUS or SHOW INDEX, or when accessing the INFORMATION_SCHEMA tables TABLES or STATISTICS. (These updates are similar to what happens for ANALYZE TABLE.) When disabled, InnoDB does not update statistics during these operations. Disabling this variable can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

Disabling it may or may not give you a more stable count in terms of setting up EXPLAIN plans. It may affect performance of SELECT COUNT(*) from mydb.mytable in either a good way, bad way, or not at all. Give it a Try and See !!!

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
0

While it would theoretically be possible to keep an accurate count of the number of rows for a given table with InnoDB, it would be at the cost of a lot of locking, which would negatively affect performance. It would also differ based on the isolation level.

MyISAM already does table level locking, so no extra cost there.

I seldom require a row count for a table, though I do use COUNT(*) quite a bit. I generally have a WHERE clause attached. Using an efficient index on a small result set, I find that they're fast enough.

I disagree that the counts are inaccurate. The counts represent a snapshot of the data, and I've always found them to be exact.

In short, MySQL leaves it up to you to implement this for InnoDB. You could store a count and increment/decrement it after each query. Though, the easier solution is probably to switch to MyISAM.

Marcus Adams
  • 154
  • 3