MySQL Slave database size >> Master Database

Question

I have a Percona MySQL Master-Master setup (5.6) with second master as passive master(read-only).

Slave:

mysql> SELECT table_schema AS "Database", ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS "Size (MB)" FROM information_schema.TABLES GROUP BY table_schema;
+--------------------+-----------+
| Database           | Size (MB) |
+--------------------+-----------+
| common_schema      |      1.89 |
| information_schema |      0.01 |
| mysql              |      1.70 |
| dataplay           |   1684.56 |
| dataplay1          |      0.14 |
| percona            |      0.20 |
| performance_schema |      0.00 |
+--------------------+-----------+

Master:

mysql> SELECT table_schema AS "Database", ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS "Size (MB)" FROM information_schema.TABLES GROUP BY table_schema;
+--------------------+-----------+
| Database           | Size (MB) |
+--------------------+-----------+
| common_schema      |      1.89 |
| information_schema |      0.01 |
| mysql              |      1.69 |
| dataplay           |   1614.80 |
| dataplay1          |      0.14 |
| percona            |      0.20 |
| performance_schema |      0.00 |
+--------------------+-----------+
7 rows in set (0.07 sec)

The size difference is massive. Is it something wrong with the setup which is running? What is wrong?

score 3 · Answer 1 · answered Jan 20 '16 at 18:11

From the details, it looks like the 'massive' difference is ~70MB in the dataplay schema. You can adjust your query to find out exactly which tables are different.

Two potential reasons for this difference is:

Schema difference on the master vs slave. Does the slave have utf8 instead of latin1 on a column or three? Does the slave have additional indexes? You can compare the schema by running a mysqldump of just the schema (--no-data option) and then diff'ing the results.
The slave has actual different data. Do you have session-only data that the application stores and won't matter? If no, you can use pt-table-checksum to find rows that are actually different. Then you need to dig into why they are different and fix that issue before reseeding the passive master or using pt-table-sync to resolve the differences.

score 2 · Answer 2 · edited Apr 13 '17 at 12:42

When you setup a Slave by loading a mysqldump from a Master, there is a strong likelihood that many of the the new Slave's tables will be bigger than the Master because of the order BTREE indexes are being loaded as the corresponding tables are being loaded. Usually that order is dictated by PRIMARY KEYs. If you talking about InnoDB, data and PRIMARY KEY nodes coexists in the same pages and segments (This is known as the gen_clust_index or Clustered Index).

This bloating of indexes due to ordered keys being loaded is also evident with MyISAM tables.

See my earlier posts

May 01, 2014 : Why would the size of MySQL MyISAM table indexes (aka MYI file) not match after mysqldump import?
Oct 26, 2012 : How badly does innodb fragment in the face of somewhat out-of-order insertions?

score 2 · Answer 3 · answered Jan 22 '16 at 22:59

Which one was built first? How did you copy the data to the other one? Have you ever done OPTIMIZE TABLE or ALTER TABLE on one machine but not the other? ROW or STATEMENT based replication? Same indexes?

I ask about the order of the dual-master because I would sort of expect the Master to be smaller. RBR vs SBR impacts how the inserts/updates are performed on the Slave. Each secondary index is a BTree, separate from the BTree that contains both the Data and PRIMARY KEY.

Anyway, if you are using InnoDB, I would not worry until the difference is more than 40%, not a mere 4%. A BTree that has been inserted into a lot will have its blocks only about 69% full -- this is due to block splits, etc.

MySQL Slave database size >> Master Database

3 Answers3