I'm quite new to database administration. The database architecture at my current company is master-slave replication. We are using MySQL version 5.0.86.

A few weeks ago, we upgraded the company's main app. The upgrade added a column to every table of its database. Before the upgrade, the size of our ibdata file was ~3.1 GiB. After the upgrade, it is ~5.2 GiB.

Today, I did a full backup-and-restore test with mysqldump. After the restore, the ibdata file size is ~3.6 GiB. Standard SELECTs show the same data in the restored database as in the original one.

I've checked the data size with a SELECT against information_schema (CUSTODIA is the name of our main app's database):

SELECT SUM(DATA_LENGTH+INDEX_LENGTH) FROM information_schema.TABLES WHERE TABLE_SCHEMA='CUSTODIA';

This is the result on the master database:

+-------------------------------+
| SUM(DATA_LENGTH+INDEX_LENGTH) |
+-------------------------------+
|                    5683345068 |
+-------------------------------+

This is the result on the restored database:

+-------------------------------+
| SUM(DATA_LENGTH+INDEX_LENGTH) |
+-------------------------------+
|                    3735748608 |
+-------------------------------+

My questions:

  1. Why is there this difference between original and restored database size?
  2. Is it safe to assume that the restored database is OK, despite this difference in size?
  3. How does MySQL calculate data_length? Is it an estimate?
  4. Can I safely reduce production's ibdata file size to 3.6 GiB with no downtime?

Any thoughts on this would be greatly appreciated. Thank you.

juanman80

2 Answers


This makes all the sense in the world to me.

InnoDB creates data pages and index pages that are 16K each. If rows of data are being inserted, updated, deleted, committed, and rolled back, you are going to have FRAGMENTATION !!!

There are two cases where you can have internal fragmentation:

  • A single row could be written in multiple data pages because certain column values make a row too big to fit in the data page.
  • Having a TEXT column with 32K of data in it.

In those two cases, a single row spanning multiple data pages would have to be chained like a linked list. The internally generated list of data pages would always have to be navigated when the row is read.

Giving credit where credit is due, PostgreSQL implemented a very brilliant mechanism called TOAST (The Oversized-Attribute Storage Technique) to keep oversized data outside of tables to stem the tide of this kind of internal fragmentation.

Having used mysqldump to make a file of CREATE TABLE statements followed by lots of INSERTs, you get fresh tables with no unused space, along with contiguous data and index pages, when you load that dump into a new server.

For my explanations, let's assume you have an InnoDB table in the CUSTODIA database called userinfo.

If you would like to compress (defragment) a table, you have three (3) options:

Option 1

OPTIMIZE TABLE CUSTODIA.userinfo;

Option 2

ALTER TABLE CUSTODIA.userinfo ENGINE=InnoDB;

Option 3

CREATE TABLE CUSTODIA.userinfo2 LIKE CUSTODIA.userinfo;
INSERT INTO CUSTODIA.userinfo2 SELECT * FROM CUSTODIA.userinfo;
DROP TABLE CUSTODIA.userinfo;
ALTER TABLE CUSTODIA.userinfo2 RENAME CUSTODIA.userinfo;

CAVEAT: Option 3 is no good on a table with constraints (foreign keys in particular). Option 3 is perfect for MyISAM.
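
If you are not sure which tables carry foreign keys before picking Option 3, a quick check against information_schema is one way to find out. This is only a sketch; I believe TABLE_CONSTRAINTS is available in MySQL 5.0, but verify it on your version:

SELECT TABLE_NAME, CONSTRAINT_NAME
FROM information_schema.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = 'CUSTODIA'
AND CONSTRAINT_TYPE = 'FOREIGN KEY';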

Now for your questions:

Question 1. Why is there this difference between original and restored database size?

As explained above

Question 2. Is it safe to assume that the restored database is OK, despite this difference in size?

If you want to make absolutely sure that the data on both servers are identical, simply run this command on both DB servers:

CHECKSUM TABLE CUSTODIA.userinfo;

Hopefully, the checksum value is identical for the same table on both servers. If you have dozens, even hundreds, of tables, you may have to script it.
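
One way to script it, as a sketch that assumes the CUSTODIA schema and the plain mysql client, is to let information_schema generate the CHECKSUM TABLE statements for you, then run the generated statements on both servers and diff the output:

SELECT CONCAT('CHECKSUM TABLE ', TABLE_SCHEMA, '.', TABLE_NAME, ';') AS stmt
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'CUSTODIA'
ORDER BY TABLE_NAME;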

Question 3. How does MySQL calculate data_length? Is it an estimate?

You are using the correct method in summing up the data_length and index_length. Based on my explanation of fragmentation, it is an estimate.
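
If you want to see where the estimates differ, a per-table breakdown run on both servers makes the comparison concrete. This is just a sketch using the same information_schema columns as the query in the question:

SELECT TABLE_NAME,
       ROUND(DATA_LENGTH/1024/1024)  AS data_mb,
       ROUND(INDEX_LENGTH/1024/1024) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'CUSTODIA'
ORDER BY DATA_LENGTH + INDEX_LENGTH DESC;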

Question 4. Can I safely reduce production's ibdata file size to 3.6 GiB with no downtime?

GOOD NEWS !!! You absolutely can compress it !!! In fact, how would you like to compress it to a fraction of that number ??? Follow these two links, because I addressed this issue on StackOverflow and ServerFault.

https://stackoverflow.com/questions/3927690/howto-clean-a-mysql-innodb-storage-engine/4056261#4056261

https://serverfault.com/questions/230551/mysql-innodb-innodb-file-per-table-cons/231400#231400

BAD NEWS !!! Sorry, but you will have a 3-5 minute window of downtime for rebuilding ib_logfile0 and ib_logfile1, as well as shrinking ibdata1 once and for all. It's well worth it since it will be a one-time operation.

RolandoMySQLDBA

Q3 -- Are you asking about disk footprint? Or data size? The sizes are exact for how much disk space is used. There are overestimates of how big the data and index are -- because

  • 16KB blocks are rarely full

  • fields bigger than 767B are shunted off to other areas that are allocated in 1MB units (or something like that)

  • Once a table is bigger than a trivial size, extents of 8MB are added, with some of that being left as 'free'.

  • INSERTing in PRIMARY KEY order will leave the data blocks reasonably densely packed. Otherwise, data and index blocks are at the whim of random inserts and block splits during any load process. Random inserting leads to blocks being about 69% full, simply because of the random nature of block splits.

  • InnoDB tries to delay index updates, thereby lessening the mess I just described. But if the table is much bigger than innodb_buffer_pool_size, there is a limit to how much of this optimization will happen.

Because of all the factors above, the reload may lead to a bigger disk footprint, or a smaller one.

Ever notice that a 1-row table has an avg_row_length of 16KB?
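
To see that for yourself, here is a minimal sketch; the table one_row_demo is made up purely for illustration, and the numbers from information_schema are the same estimates discussed above:

-- Throwaway one-row table, just to show the 16KB page minimum
CREATE TABLE CUSTODIA.one_row_demo (
    id   INT PRIMARY KEY,
    note VARCHAR(20)
) ENGINE=InnoDB;

INSERT INTO CUSTODIA.one_row_demo VALUES (1, 'hello');

-- DATA_LENGTH comes back as 16384 (one page), so AVG_ROW_LENGTH is ~16KB
SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'CUSTODIA' AND TABLE_NAME = 'one_row_demo';

DROP TABLE CUSTODIA.one_row_demo;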

Rick James