MongoDB secondary crashes on initial sync because of too many journal files on RAID 10

Question

My secondary DB server went down, so I'm booting up a replacement secondary and trying to perform the initial sync. I've been following the tutorials and advice out there to use RAID 10 on Amazon EBS.

I used four 4 GB EBS volumes in RAID 10 with the following LVM layout (suggested by MongoDB at the time):

sudo lvcreate -l 90%vg -n data vg0
sudo lvcreate -l 5%vg -n log vg0
sudo lvcreate -l 5%vg -n journal vg0
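
As a rough check on the sizes this layout produces, here is a back-of-the-envelope sketch. It assumes RAID 10 exposes half of the raw capacity and ignores LVM and filesystem overhead; the function name lv_size_mib is made up for illustration.

```shell
#!/bin/sh
# Approximate size (in MiB) of a logical volume created with `lvcreate -l N%VG`
# on top of a RAID 10 array. Assumption: usable capacity is half the raw
# capacity; LVM metadata and filesystem overhead are ignored.
lv_size_mib() {
    disks=$1      # number of EBS volumes in the array
    disk_gib=$2   # size of each volume in GiB
    percent=$3    # percentage given to lvcreate
    usable_mib=$(( disks * disk_gib * 1024 / 2 ))
    echo $(( usable_mib * percent / 100 ))
}

lv_size_mib 4 4 5   # journal volume: prints 409
```

If these assumptions hold, the journal volume is only about 409 MiB, so four 100 MB journal files alone would come close to filling it.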

Since my primary is on an aging version (v3.2), I'm also trying to upgrade to 3.4 at the same time, so the new secondary runs 3.4 (in case this is relevant to the problem).

The problem is that during the initial sync, MongoDB creates too many journal files in /journal. A total of four 100 MB journal files are allocated:

ec2-user@secondary$ ll /journal/
total 369105
drwx------ 2 root   root       12288 Apr  3 14:47 lost+found
-rw-r--r-- 1 mongod mongod 104644096 Apr  3 19:00 WiredTigerLog.0000000001
-rw-r--r-- 1 mongod mongod 104685568 Apr  3 19:00 WiredTigerLog.0000000002
-rw-r--r-- 1 mongod mongod 104857600 Apr  3 19:00 WiredTigerLog.0000000003
-rw-r--r-- 1 mongod mongod 104857600 Apr  3 19:00 WiredTigerLog.0000000004
-rw-r--r-- 1 mongod mongod         0 Apr  3 19:00 WiredTigerTmplog.0000000005

This exceeds the disk space allocated for journaling and causes a hard crash during the initial sync:

2018-04-03T19:00:18.821+0000 E STORAGE  [thread2] WiredTiger error (28) [1522782018:821142][6176:0x7efc0cd3d700], log-server: /data/journal/WiredTigerTmplog.0000000005: handle-write: pwrite: failed to write 128 bytes at offset 0: No space left on device
2018-04-03T19:00:18.821+0000 E STORAGE  [thread2] WiredTiger error (28) [1522782018:821213][6176:0x7efc0cd3d700], log-server: journal/WiredTigerTmplog.0000000005: fatal log failure: No space left on device
2018-04-03T19:00:18.821+0000 E STORAGE  [thread2] WiredTiger error (-31804) [1522782018:821228][6176:0x7efc0cd3d700], log-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-04-03T19:00:18.821+0000 I -        [InitialSyncInserters-my_job_glasses_production.ahoy_events0] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
2018-04-03T19:00:18.821+0000 I -        [InitialSyncInserters-my_job_glasses_production.ahoy_events0]

***aborting after fassert() failure


2018-04-03T19:00:18.821+0000 I -        [thread2] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 365
2018-04-03T19:00:18.821+0000 I -        [thread2]

***aborting after fassert() failure
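
To see how much the journal files add up to while the sync is running, something like the following could be used. This is only a sketch: the helper name journal_bytes is hypothetical, and it sums apparent file sizes, which can differ from the blocks actually allocated on disk for preallocated files.

```shell
#!/bin/sh
# Sum the apparent sizes (in bytes) of the WiredTiger journal files in a
# directory. Matches WiredTigerLog.*, WiredTigerPreplog.* and
# WiredTigerTmplog.*; any other file is ignored.
journal_bytes() {
    dir=$1
    total=0
    for f in "$dir"/WiredTiger*[Ll]og.*; do
        [ -f "$f" ] || continue      # skip the unexpanded pattern if nothing matches
        size=$(wc -c < "$f")
        total=$(( total + size ))
    done
    echo "$total"
}

journal_bytes /journal
```

Comparing that number with the output of `df -h /journal` should show how close the volume is to running out of space.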

I'm not sure why this happens: on my primary, I only have two journal files of 100 MB each, so I assumed everything would be fine.

ec2-user@primary$ ll /data/journal/ -h
total 205M
-rw-r--r-- 1 mongod mongod 4.1M Apr  3 18:49 WiredTigerLog.0000000059
-rw-r--r-- 1 mongod mongod 100M Apr  3 16:43 WiredTigerPreplog.0000000001
-rw-r--r-- 1 mongod mongod 100M Apr  3 16:43 WiredTigerPreplog.0000000002

Did I miss something, or is something wrong? Here is my mongod.conf:

systemLog:
  destination: file
  logAppend: true
  path: /log/mongod.log
  logRotate: reopen

storage:
  dbPath: /data
  journal:
    enabled: true

processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid

net:
  port: 27017
  #bindIp added accordingly

security:
  authorization: enabled
  keyFile: /xxx.key

replication:
  replSetName: XXX

EDIT: It seems that during the initial sync, MongoDB creates up to a dozen 100 MB files before going back to four 100 MB files. Where is this documented? Is there a way to put a limit on this?
