"Replication Oplog Window has gone below 1 hours"

Question

I'm working with MongoDB Atlas and I have a 3 node M30 cluster with 100GB of storage.

My current use case is the following:

A user dispatches one search in my platform
The platform dispatches this search to other providers (12)
I get like 2k documents per provider for this search

I want to make sure I understand the "Replication Oplog Window has gone below 1 hours" alert I'm seeing. If I understand correctly, the OpLog Window is how long the primary node can continue receiving data before the secondary nodes get out of sync.

How can I tune the setting for the size of the OpLog Window?

score 15 · Accepted Answer · edited Jan 24 '25 at 16:37

From the MongoDB Atlas Docs, the OpLog is defined as:

A capped collection that stores an ordered history of logical writes to a MongoDB database. The oplog is the basic mechanism enabling replication in MongoDB.

The OpLog window is defined as:

oplog entries are time-stamped. The oplog window is the time difference between the newest and the oldest timestamps in the oplog. If a secondary node loses connection with the primary, it can only use replication to sync up again if the connection is restored within the oplog window.

If the replication OpLog window goes under one hour, it means the difference between the first and last timestamps in the OpLog is less than one hour.
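As a rough illustration of that definition, here is a minimal Python sketch that computes the window from two timestamps and compares it to the one-hour alert threshold. The function names and sample timestamps are hypothetical. In a real replica set the two values would come from the oldest and newest entries in `local.oplog.rs`, and `rs.printReplicationInfo()` in the shell reports the same span.

```python
from datetime import datetime, timedelta, timezone


def oplog_window(oldest_ts: datetime, newest_ts: datetime) -> timedelta:
    """Return the oplog window: newest entry time minus oldest entry time."""
    if newest_ts < oldest_ts:
        raise ValueError("newest_ts must not be earlier than oldest_ts")
    return newest_ts - oldest_ts


def below_threshold(window: timedelta,
                    threshold: timedelta = timedelta(hours=1)) -> bool:
    """True when the window is shorter than the alert threshold."""
    return window < threshold


# Hypothetical timestamps of the oldest and newest oplog entries.
oldest = datetime(2025, 1, 24, 15, 40, tzinfo=timezone.utc)
newest = datetime(2025, 1, 24, 16, 35, tzinfo=timezone.utc)

window = oplog_window(oldest, newest)
print(window)                   # 0:55:00
print(below_threshold(window))  # True
```

With these sample values the window is 55 minutes, so an alert configured at one hour would fire.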

This can happen when the database receives many writes over a short period of time and the OpLog is too small to hold all of them.

Note that this is not fatal. It just means that your secondary nodes cannot fall further behind than the current OpLog window. For example, if the OpLog window is 55 minutes, a secondary node that has been stopped for over 55 minutes cannot catch up to the primary by reading the OpLog and applying the changes it contains. For that secondary to catch up, it would need a "full sync" snapshot of the primary to be restored to it. If that full sync/snapshot process takes more than 55 minutes to complete, the secondary will still not be able to catch up, since the writes made immediately after the sync/snapshot started will no longer be in the OpLog.

You should likely increase the size of the OpLog to ensure it can contain the writes necessary to resync a secondary. If you double the size of the OpLog, the OpLog window typically doubles as well, assuming the write rate stays the same. The length of the OpLog window in units of time depends on how much data is written to the primary over any given period. For example, if the primary is writing 10GB of OpLog data per hour and the OpLog maximum size is 10GB, you can only recover from a maximum of 1 hour of writes.
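The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it assumes a steady write rate measured in OpLog data (not on-disk data size), and the function names are hypothetical.

```python
def estimated_window_hours(oplog_size_gb: float,
                           write_rate_gb_per_hour: float) -> float:
    """Approximate oplog window: capacity divided by steady write rate."""
    if write_rate_gb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_gb / write_rate_gb_per_hour


def required_oplog_size_gb(target_window_hours: float,
                           write_rate_gb_per_hour: float) -> float:
    """Approximate oplog size needed to retain a target window."""
    if write_rate_gb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return target_window_hours * write_rate_gb_per_hour


# The example from the answer: 10GB oplog at 10GB/hour gives about 1 hour.
print(estimated_window_hours(10.0, 10.0))  # 1.0

# Doubling the oplog size doubles the window at the same write rate.
print(estimated_window_hours(20.0, 10.0))  # 2.0

# Size needed to keep roughly 3 hours of writes at 10GB/hour.
print(required_oplog_size_gb(3.0, 10.0))   # 30.0
```

In practice write rates are bursty, so it is safer to size for the peak rate you expect during a search fan-out rather than the average.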

Change the size of the OpLog using the MongoDB CLI, per these directions.
