There are quite a number of services running on my server: httpd, varnish, mysql, memcache, java, and so on.

Each of them uses part of the virtual memory, and varnish is configured to be allocated 3GB of memory.

Under a high traffic load of 100K, our server ran out of memory and the oom-killer was invoked. We had to reboot the server.

We have 8GB of virtual memory and, for various reasons, we cannot add more.

My question is: is there an automated script that monitors how much virtual memory is left and, based on certain criteria (say, only 500MB left), restarts the server automatically?

I know this is not a proper solution, but we have to do it. Otherwise we don't know when the server will hit OOM, and by the time we notice and restart it, we have lost our visitors.

2 Answers

If I understand you correctly, you want something like the following:

  1. Check how much memory is left on the VPS.
  2. If 500 MB or less is left, reboot the VPS.

This could be done as follows:

  1. Write a script that checks how much memory is left and reboots the VPS when it is low.
  2. Add this script to crontab to automate the task.

For example:

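#!/bin/bash

# Free memory in MB. Column 4 of "free -m" excludes buffers/cache,
# so the script may trigger earlier than strictly necessary.
mem=$(free -m | awk '/^Mem:/{print $4}')

# An empty value would evaluate as 0 below and trigger a reboot, so bail out instead.
[[ $mem =~ ^[0-9]+$ ]] || exit 1

(( mem <= 500 )) && reboot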

Make the script executable:

chmod +x scriptname   # note: don't add an extension

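Add the script to root's crontab (reboot needs root privileges), so that it runs every minute:

crontab -e

* * * * * /path/to/the/script

A user field is only used in /etc/crontab and in files under /etc/cron.d, where the same entry would read:

* * * * * root /path/to/the/script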

Hope you get the idea.
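
If you want something a bit more defensive than the script above, here is a rough sketch of the same idea. It is only an illustration, not a tested production script. It assumes a kernel that reports MemAvailable in /proc/meminfo (Linux 3.14 or later), and the names THRESHOLD_MB, DRY_RUN and the --run flag are made up for this example. By default it only prints what it would do.

```shell
#!/bin/bash
# Sketch: reboot when available memory drops to or below a threshold.
# Assumption: /proc/meminfo contains a MemAvailable line (Linux 3.14+).

# Hypothetical knobs, invented for this sketch.
THRESHOLD_MB=${THRESHOLD_MB:-500}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print, anything else = really reboot

# Print available memory in MB from a meminfo-style file.
# Fails (non-zero exit) when no MemAvailable line is present.
available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024); found = 1 }
         END { exit !found }' "${1:-/proc/meminfo}"
}

# Succeed when $1 is a number less than or equal to $2.
is_low() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -le "$2" ]
}

# Decide whether to reboot, based on the given meminfo file.
check_memory() {
    avail=$(available_mb "${1:-/proc/meminfo}") || {
        echo "MemAvailable not found" >&2
        return 2
    }
    is_low "$avail" "$THRESHOLD_MB" || return 0
    if [ "$DRY_RUN" = 1 ]; then
        echo "would reboot: ${avail} MB available"
    else
        reboot
    fi
}

# Only act when called with --run, e.g. from cron.
if [ "${1:-}" = --run ]; then
    check_memory
fi
```

From root's crontab this would be called as `/path/to/the/script --run`, with DRY_RUN=0 set once you are happy with the dry-run output. Refusing to act on an empty or non-numeric reading matters here, because a failed read should never be treated as "0 MB left".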

I had a similar problem, and while I don't want to question your question, which is nice and specific, I have to say that you need a long-term fix.

The OOM killer kicks in because your server is running out of memory. Turning off the OOM killer won't help with that: you'll still be out of memory and your server will eventually crash. Sure, the OOM killer doesn't always help, but turning it off won't either.

Rebooting your server will temporarily fix the problem, but it will happen again.

I had a similar issue with a server. Installing monit and configuring it to warn me when memory was running out allowed me to access the server when something was starting to happen, so I could properly diagnose it and solve it. I also added swap via a swapfile to increase the time I had to access the server while the problem was occurring.
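
For the swapfile part, the usual sequence looks roughly like the sketch below. This is not necessarily exactly what I ran; the path /swapfile, the 1024 MB size and the --print flag are placeholders chosen for the example. The function only prints the commands, so you can review them before running anything as root.

```shell
#!/bin/bash
# Sketch: print the commands that create and enable a swap file.
# Nothing is executed here; review the output, then run it as root.

# $1 = path of the swap file (placeholder), $2 = size in MB (placeholder).
swapfile_commands() {
    cat <<EOF
dd if=/dev/zero of=$1 bs=1M count=$2
chmod 600 $1
mkswap $1
swapon $1
EOF
}

# Print the commands only when asked to, e.g. ./script --print /swapfile 1024
if [ "${1:-}" = --print ]; then
    swapfile_commands "${2:-/swapfile}" "${3:-1024}"
fi
```

To keep the swap file across reboots, it also needs an entry in /etc/fstab such as `/swapfile none swap sw 0 0`. Swap only buys time to log in and look around; it does not fix whatever is eating the memory.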

In my case, my webserver was configured to start way too many spare servers for the load the server could handle. Once I figured out the root cause of the problem, I tackled it, and the server has not crashed since.
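
To give an idea of how you might sanity-check that kind of setting, here is a rough sketch that estimates how many web server workers fit into a memory budget. The helper names avg_mb and max_workers are invented for this example, and the numbers are illustrative. Note that RSS counts shared pages for every process, so the per-worker figure tends to be an overestimate.

```shell
#!/bin/bash
# Sketch: estimate how many web server workers fit in a memory budget.

# Read RSS values in kB (one per line) on stdin, print the average in MB.
avg_mb() {
    awk '{ sum += $1; n++ } END { print (n ? int(sum / n / 1024) : 0) }'
}

# $1 = memory budget in MB, $2 = memory per worker in MB.
# Prints how many workers fit; fails if the per-worker size is not positive.
max_workers() {
    [ "$2" -gt 0 ] || { echo 0; return 1; }
    echo $(( $1 / $2 ))
}
```

For example, `ps -C httpd -o rss= | avg_mb` gives the average size of the running httpd processes, and `max_workers 2048 40` then tells you that about 51 workers of 40 MB fit into a 2 GB budget. If the web server is Apache with the prefork MPM (an assumption, adjust for your own setup), that number is a rough upper bound for a setting like MaxClients.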

dunxd