2

I'm trying to figure out why the reporting of slave lag is different in SHOW SLAVE STATUS and SHOW PROCESSLIST, in MySQL 5.5.13. The only difference to other slaves of this master is that it is replicating over a relatively slow connection, about 10Mb/sec (cross site).

SHOW SLAVE STATUS reports a slave lag of 0, or occasionally, the real slave lag, once every 5-10 requests. (I'm looking at Seconds_Behind_Master)

SHOW PROCESSLIST, shows the slave lag under the system_user's time column (the one that belongs to the replication sql thread), like so:

mysql> show processlist \G
*************************** 1. row ***************************
     Id: 1
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 63953
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 2
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 61077
  State: Slave has read all relay log; waiting for the slave I/O thread to update it
   Info: NULL

Why would these differ? Or rather, why would SHOW SLAVE STATUS lie? The monitoring system is looking at this command, and goes crazy due to "spikes" once in a while. I know for a fact the right number is what SHOW PROCESSLIST reports, since the slave took a good few days to be initialized, and is catching up slowly.

shlomoid
  • 315
  • 4
  • 11

3 Answers3

2

SHOW PROCESSLIST, shows the slave lag under the system_user's time column (the one that belongs to the replication sql thread)

No. It's the time that the Slave sql thread has been in its current state.

quanta
  • 1,006
  • 9
  • 22
  • 36
1

The SQL thread has a time of 61077 seconds. This is actaully the length of time the SQL thread has been alive. The I/O Thread has been alive for 63953 seconds.

I recently wrote an explanation as to how replication calculates Seconds_Behind_Master.

The reason they could differ? Here is one possiblility:

If you encountered a SQL error in the SQL Thread, the I/O thread would still continue collecting SQL statements from the Master. Once you skip the error, the SQL thread's life starts from zero and increases from there. Something like this would have happened 2876 (63953 - 61077) Seconds after starting replication.

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
1

The "Time" in the SQL thread is (I think) identical to Seconds_behind_master. It is "How long ago did this query start on the Master ".

All other Times are indicate when the query started on the Slave.

Some fluctuation is caused by what it is measuring (the Master's start time).

Sometimes (rarely), I see the value (both places) bouncing between 0 and some large value. I have yet to track this down. I have seen it on 4.0, 4.1, and 5.1. It eventually goes away, and becomes civilized.

There may be cases where no traffic leads to strange values. But I don't have any Master-Slave setups with little enough traffic for me to comment.

Suppose you do ALTER on the Master, and it took 1 hour (3600 seconds). Also, suppose not much else is going on. The ALTER replicates and starts running. Immediately, the Seconds_behind_master will be about 3600. After the ALTER finishes on the Slave (say, 3600 more seconds later), subsequent replication items will execute with (probably) smaller Times. Eventually replication catches up.

Rick James
  • 80,479
  • 5
  • 52
  • 119