3

My mysql_slave got stuck on an error and built up to being 200,000+ Seconds_Behind_Master. Should I rebuild it and start from scratch, or let it catch up by itself?

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
vinhboy
  • 319
  • 2
  • 4
  • 11

4 Answers4

6

IMHO

Let it catch up.

The Seconds_Behind_Master reading is not what I rely on to see if catching up is possible or worth it. Don't be afraid at the number reported.

You have to look at the server's setting for expire_logs_days. If your slave is so far behind that the binary log file has been deleted, then you need to re-image.

randomx
  • 3,944
  • 4
  • 31
  • 44
5

Seconds_Behind_Master is really a double-edged sword

When Seconds_Behind_Master significantly increases, there are two scenarios to look into with regards to the output of SHOW SLAVE STATUS\G:

SCENARIO #1

If the Relay_Log_Space is under 1G, this is a tell-tale sign that the Slave has issues reading its Master's binlog entries through the IO Thread. The timestamps recorded is the relay logs would appear to be skewed because the Master may have recorded its binlog quickly, but gap between the Slave's current time and the timestamp in the its relay logs increase. Look for things like these:

  • Server load on the Master
  • Long running queries from the Master
  • A series of DML statements that ran fine in parallel on Master but then serialize on Slave
  • Network Latency over the Slave's IO Thread

SCENARIO #2

If the Relay_Log_Space starts exceeding 2G, immediately look at Slave_IO_Running and Slave_SQL_Running. Chances are, replication is simply broken. In most cases, if Replication breaks, the SQL Thread is dead and the IO Thread keeping running (Slave_IO_Running is Yes, Slave_SQL_Running is No), Because the IO Thread is still up, it can catch new SQL commands that the Master shipped over to the Slave's relay logs. That can grow. If left unchecked, it could grow to the point of filling up a disk. If your alerting scheme simply checks Seconds_Behind_Master being NULL, this should prompt you to quickly address the SQL error. If Seconds_Behind_Master is just numerically increasing, this will occur due to a series of DML statements that ran fine in parallel on Master but then serialize on Slave (just like in SCENARIO #1). In essence, SQL statements from the IO Thread are being collected faster than the SQL thread can process them.

EPILOGUE

  • SCENARIO #1 requires more troubleshooting and root cause analysis.
  • SCENARIO #2 is just an indication of a heavy-write Master sending its transactions to a Slave to process SQL one at time. In that case, you have one of three(3) choices
    • Let replication catchup
    • Shutdown mysql on the Slave, zap all data (except mysql folder) and do a fresh reload of MySQL
    • Shutdown mysql on the Slave, RESET MASTER; on the Master, rsync /var/lib/mysql from Master to Slave, start mysql on the Slave, setup replication
    • Use XtraBackup to perform live copy of the Master and restores to Slave

Here are my posts that may help with these steps

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
1

You should watch seconds behind master periodically, it should decrease it's value.

Also Make sure that Slave I/O state should be Waiting for Master to send event and Slave I/O running and Slave SQL running shows status as YES.

Mahesh Patil
  • 3,078
  • 2
  • 17
  • 23
0

Seconds_Behind_Master(SBM) will represent the difference between the timestamp of the latest transaction processed by the SQL Thread and the timestamp of the same transaction when it was processed on the master. If you see a 200,000+ SBM on the slave - it means that the transaction the slave executed now, was executed 200,000 seconds back on the master. However, it does not mean that the slave will need the same amount of time to catch up. It could be more or less depending the execution speed of the slave and actual number of transactions the master is ahead of. Check out this post:https://scalegrid.io/blog/mysql-tutorial-understanding-the-seconds-behind-master-value/ that explains how to understand various values of Seconds_Behind_Master