
I am trying to find the best way to estimate the size of a table. I have studied a lot of blogs and forums, but have been unable to find an accurate answer.

For example, we have a table City using the InnoDB engine. Let's say that within the next year it will have 1 million records. What would the estimated data size and index size of that table be at that point?

mysql> desc City;
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   | MUL |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.03 sec)

UPDATE

What will be the estimated upper bound (maximum size of the table) with 1 million records, and how can we estimate it?

Abdul Manaf

5 Answers


Given the table description, I see

  • 66 bytes per row of data
  • 4 bytes per row for the primary key
  • 7 bytes per row for country code index
    • 3 bytes for the country
    • 4 bytes for Clustered Key attached to the country code
  • Total of 77 bytes of data and keys
  • This does not factor in housekeeping overhead for BTREEs or tablespace fragmentation

For a million rows, that would be 77,000,000 bytes (about 73.43 MB).
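Once the table actually has rows in it, you can sanity-check that per-row estimate against MySQL's own per-row average (a rough check only; for InnoDB both figures are estimates, and avg_row_length includes page and row overhead, so expect it to come out higher than 77):

SELECT table_rows, avg_row_length
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';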

As for measuring the table, for a given table mydb.mytable, you can run this query

SELECT 
    CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',SUBSTR(units,pw1*2+1,2)) DATSIZE,
    CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',SUBSTR(units,pw2*2+1,2)) NDXSIZE,
    CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',SUBSTR(units,pw3*2+1,2)) TBLSIZE
FROM
(
    SELECT DAT,NDX,TBL,IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
    FROM 
    (
        SELECT data_length DAT,index_length NDX,data_length+index_length TBL,
        FLOOR(LOG(IF(data_length=0,1,data_length))/LOG(1024)) px,
        FLOOR(LOG(IF(index_length=0,1,index_length))/LOG(1024)) py,
        FLOOR(LOG(data_length+index_length)/LOG(1024)) pz
        FROM information_schema.tables
        WHERE table_schema='mydb'
        AND table_name='mytable'
    ) AA
) A,(SELECT 'B KBMBGBTB' units) B;

To measure all tables grouped by Database and Storage Engine

SELECT
    IF(ISNULL(DB)+ISNULL(ENGINE)=2,'Database Total',
    CONCAT(DB,' ',IFNULL(ENGINE,'Total'))) "Reported Statistic",
    LPAD(CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',
    SUBSTR(units,pw1*2+1,2)),17,' ') "Data Size",
    LPAD(CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',
    SUBSTR(units,pw2*2+1,2)),17,' ') "Index Size",
    LPAD(CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',
    SUBSTR(units,pw3*2+1,2)),17,' ') "Total Size"
FROM
(
    SELECT DB,ENGINE,DAT,NDX,TBL,
    IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
    FROM 
    (SELECT *,
        FLOOR(LOG(IF(DAT=0,1,DAT))/LOG(1024)) px,
        FLOOR(LOG(IF(NDX=0,1,NDX))/LOG(1024)) py,
        FLOOR(LOG(IF(TBL=0,1,TBL))/LOG(1024)) pz
    FROM
    (SELECT
        DB,ENGINE,
        SUM(data_length) DAT,
        SUM(index_length) NDX,
        SUM(data_length+index_length) TBL
    FROM
    (
       SELECT table_schema DB,ENGINE,data_length,index_length FROM
       information_schema.tables WHERE table_schema NOT IN
       ('information_schema','performance_schema','mysql')
       AND ENGINE IS NOT NULL
    ) AAA GROUP BY DB,ENGINE WITH ROLLUP
) AAA) AA) A,(SELECT ' BKBMBGBTB' units) B;

Run these queries and you can track changes in database/engine disk usage.

Give it a Try !!!

RolandoMySQLDBA

If you are using InnoDB tables, you can get the size of the data and of individual indexes from mysql.innodb_index_stats. The 'size' stat contains the answer in pages, so you have to multiply it by the page size, which is 16 KB by default.

select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats where stat_name='size';

The index PRIMARY is the data itself.
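To get the breakdown for just the City table, converted to MB, you can filter the same view (a small variation on the query above; the schema name here is only a placeholder):

SELECT index_name,
       ROUND(stat_value*@@innodb_page_size/1024/1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
  AND database_name = 'mydb'
  AND table_name = 'City';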

Vajk Hermecz

If you don't have data yet, here are some tips. The following applies to InnoDB. (MyISAM is much simpler, and smaller.)

Don't use CHAR for variable-length columns. What CHARACTER SET are you using? Ascii needs one byte per character; utf8mb4 needs between 1 and 4.

4 bytes per INT
35 for CHAR(35), if ascii or latin1; varchar is probably less
3 for the country code; it is fixed length
etc

Total = about 80 bytes.

Multiply the 80 by between 2 and 3 to account for various overheads. Most likely the 1M row table will be between 160MB and 240MB.

To estimate the size of a single secondary index, say CountryCode (3 bytes):

3 bytes data
4 bytes for the PK (implicitly included with any secondary key)
25 bytes basic overhead
32 total
times 1.5 -- overhead for BTree that was randomly inserted into
48MB -- total for 1M rows.

Notes:

  • Only the leaf nodes (of BTrees) need to be computed; the overhead for the non-leaf nodes is typically 1%.

  • The PRIMARY KEY is "clustered" with the data, so there is no need to compute it.

  • If you do not have an explicit PK, then you need to add 6 bytes to the row size to allow for the fabricated PK.

  • ROW_FORMAT = COMPRESSED gives you about a 2:1 shrinkage. (This is not as good as typical zip (etc) compression rate of 3:1.)

  • SHOW TABLE STATUS LIKE "tablename"; is the quick way to check the 'actual' size (see the sketch after these notes). Look at Data_length for the data plus PK, Index_length for secondary indexes, and Data_free for allocated but unused space.

  • It is rare for Index_length to exceed Data_length. However it is not "wrong" for that to happen.
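For the City table, that quick check might look like this (a sketch; the \G terminator just gives vertical output in the mysql client):

SHOW TABLE STATUS LIKE 'City'\G
-- Data_length  : the data plus the clustered PRIMARY KEY
-- Index_length : all secondary indexes
-- Data_free    : space allocated but not currently used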

Rick James
SELECT  table_name    "tablename",
        data_length   "table data_length in Bytes",
        index_length  "table index_length in Bytes",
        data_free     "Free Space in Bytes"
FROM    information_schema.tables
WHERE   table_schema = 'databasename';

By executing this query you can get the size currently used for data and indexes of each table in the database. You can check these sizes against the current number of rows and extrapolate to 1 million rows.
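For example, once the table has some rows, you can scale the current per-row average up to 1 million rows (a sketch; for InnoDB, table_rows is only an estimate, and the schema name is a placeholder):

SELECT table_rows,
       (data_length + index_length) / table_rows           AS bytes_per_row,
       (data_length + index_length) / table_rows * 1000000 AS projected_bytes_at_1m_rows
FROM information_schema.tables
WHERE table_schema = 'databasename'
  AND table_name = 'City';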


It's tedious. But the details are in the docs.

To be as accurate as possible, which is rarely necessary, you'll need to read about the table structure and index structure, too.

If I were in your shoes, I'd build the table, populate it with a million rows of test data, and measure the change in size. Depending on your application, you might need to take the size of transaction log files into account, too.
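A minimal sketch of that approach, assuming MySQL 8.0 or later (for the recursive CTE) and assuming City has no foreign key on CountryCode; adjust the dummy values if it does:

-- The default recursion limit is 1000, so raise it for 1M generated rows.
SET SESSION cte_max_recursion_depth = 1000000;

INSERT INTO City (Name, CountryCode, District, Population)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT CONCAT('City ', n), 'ZZZ', CONCAT('District ', MOD(n, 100)), n
FROM seq;

-- Refresh statistics, then compare with the sizes recorded before the load.
ANALYZE TABLE City;
SELECT data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = 'databasename' AND table_name = 'City';

Run the same SELECT before the INSERT so you have a baseline to compare against.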