5

I want to estimate the disk space for 1 million rows.

enter image description here

Currently, I have 8527959 rows, Data Length of 744 MB and Index Length 989.4 MB.

Please let me know if my calculations are correct.

Total Data Length for 1 million rows

Data of 744MB consists of 46500 pages of 16KB blocks (InnoDB page size)

If 46500 pages consist of 8527959 rows, 1 page consists an average of 183 rows.

So, 1 million rows need (1,000,000/183) pages= 5465 pages of 16KB

So, 1 million rows of data need 87.4MB.

Total Index Length for 1 million rows

Indexes of of 989.4MB consists of 61837 pages of 16KB blocks (InnoDB page size)

If 61837 pages consist of 8527959 rows, 1 page consists an average of 138 rows.

So, 1 million rows need (1,000,000/138) pages= 7247 pages of 16KB

So, 1 million rows of data need 115.9MB.

Total disk space (approximate) for 1 million rows

Total data length+Total Index Length = 87.4MB + 115.9 MB = 203.3 MB

Is this correct?

PyRookie
  • 51
  • 1
  • 1
  • 2

1 Answers1

5

Your basic assumptions are correct, it is easy to see that the allocation per row is just under 100 bytes per row, and a bit more for the indexes, giving you a 200 * 1M = 200M of total size.

However, several important things have to be taken into account:

  • The tables must have identical structure and equivalent content- e.g. the same exact table with similar sized variable content.
  • With SHOW TABLE STATUS you are getting an estimation of size (rows, total size) based on sampling- it may not be accurate, specially if large changes happened to the table and automatic sampling didn't kick-off yet. You can force statistics gathering with ANALYZE TABLE command, but it can never guarantee you 100% accuracy
  • Everything is related to the logical size, not to physical size, or size on disk. If you were to delete 7/8ths of that table, the disk used on disk wouldn't shrink in the case of InnoDB, because disk defragmentation wouldn't happen- space is freed for other rows to be inserted, but it wouldn't be released to the filesystem. To reduce the on-disk size, you would need to recreate the table with OPTIMIZE TABLE, a more time consuming and blocking operations.
  • If disk space is a concern, consider using compression
jynus
  • 15,057
  • 2
  • 37
  • 46