Why does MySQL replication break on missing row, then continue after "START SLAVE"?

Question

The title is a bit confusing, but I can't think of a better one.

What I have is a simple vanilla MySQL replication, with the slave occasionally failing, with this error: Error 'Can't find record in 'my_tbl'' on query. Default database: 'my_db'. Query: 'UPDATE my_tbl SET ... WHERE ...' (columns omitted for clarity).

What I'm assuming this error means, is that the slave sql thread executed this update, and received 0 rows affected. This was not what it expected when comparing the result of 1 rows affected from the relay log, thus generating an error.

When running this same update transaction manually, it works. Same thing when running START SLAVE - it just starts working, and goes back to normal.

This doesn't make sense to me at all - if all it takes is a "retry" to fix this, how could this happen in the first place? Everything is executed in a serialized fashion, and nothing else is writing to the slave mysql server.

Can someone provide an explanation?

Some technicalities - this is a mixed replication setup from 5.5.7-rc to 5.5.12.

score 2 · Answer 1 · answered May 31 '11 at 10:35

There is a filed MySQL bug #60091 regarding the replication of InnoDB tables that may meet your conditions - take a look at it, check if your version is affected and update it eventually to check if it helps matters.

Another explanation for this would be out-of-order execution - when the UPDATE my_tbl SET ... WHERE ... is run, the WHERE condition can not yet be met by any row since it has still to happen. I can't think of a reason for that though - this would be something to ask about on MySQL mailing lists.

score 2 · Answer 2 · answered Jun 29 '11 at 13:33

2

I've discovered the reason behind this problem - an event which was running on the master and on the slave as well. The solution is simple - alter event event_name disable on slave; Something to keep in mind when creating a slave with mysqldump.

answered Jun 29 '11 at 13:33

shlomoid

289

Why does MySQL replication break on missing row, then continue after "START SLAVE"?

2 Answers2