
incrontab is set up to monitor approximately 10 directories. All it does is start a Bash script when a new file arrives in one of these directories. Roughly one file arrives every 5 minutes in each directory. However, incrond occasionally stops. There is no pattern to when this happens; it varies from a few times per week to a few times per month. The error logged is:

incrond[35203]: *** unhandled exception occurred ***
incrond[35203]:   polling failed
incrond[35203]:   error: (11) Resource temporarily unavailable
incrond[35203]: stopping service

I am aware that I have not posted much information. However, the system is closed, so I have shared what I could. I am not looking for a direct answer (the question may be too broad), only for ideas I can research. What could cause this behavior? What should I check? Which resources should I look at?

2 Answers


incrond uses the kernel-level inotify subsystem, encapsulating the C-based inotify interface in a C++ container. Looking at the incrond source files, it seems that the error you are facing comes from a failed poll on the file descriptor encapsulated in incrond's C++ class:

int res = poll(ed.GetPollData(), ed.GetSize(), -1);

if (res > 0) {
  ed.ProcessEvents();
} else if (res < 0) {
  switch (errno) {
    case EINTR:  // syscall interrupted - continue polling
      break;
    case EAGAIN: // not enough resources - wait a moment and try again
      syslog(LOG_WARNING, "polling failed due to resource shortage, retrying later...");
      sleep(POLL_EAGAIN_WAIT);
      break;
    default:
      throw InotifyException("polling failed", errno, NULL);
  }
}

It is difficult to identify the exact cause of the failed poll. The most common causes are:

  • an overloaded system
  • a crash/segfault in one of incrond's functions

Anyway, how many files exist under your monitored directories?
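
One way to answer that question, and to look at the kernel's inotify limits at the same time, is sketched below. The helper names `count_files` and `show_inotify_limits` are invented for illustration, and the directory you pass in is whatever your incrontab monitors. The three entries under `/proc/sys/fs/inotify` are the standard Linux inotify tunables; whether any of them is actually involved in this failure is an assumption that would need checking.

```shell
#!/bin/bash
# Count regular files below a directory (recursively).
count_files() {
    find "$1" -type f | wc -l
}

# Print the kernel's inotify limits, if this system exposes them.
show_inotify_limits() {
    local f
    for f in max_user_watches max_user_instances max_queued_events; do
        if [ -r "/proc/sys/fs/inotify/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "/proc/sys/fs/inotify/$f")"
        fi
    done
}
```

After sourcing the snippet, `count_files /path/to/monitored/dir` prints the file count for one directory and `show_inotify_limits` prints the current limits, which can then be compared against how much is being watched.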

shodanshok

Run strace on incrond, logging to a file, and rotate that log file according to how quickly you usually notice that the failure has occurred.

E.g., if it takes you a week to find out that it has failed, your log rotation has to keep 7 days (or more). If you generally notice within an hour, then 6 to 10 hours of hourly rotated logs should be sufficient.

More about it and examples: http://www.thegeekstuff.com/2011/11/strace-examples
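
A minimal sketch of the hourly rotation described above, assuming strace is attached to an already running incrond and that roughly ten hours of history is enough. The directory `/var/log/incrond-strace`, the file naming, and the function names are all hypothetical; the script uses `timeout` to restart strace every hour rather than a dedicated log rotation tool.

```shell
#!/bin/bash
# Hypothetical location for the trace files; adjust to taste.
TRACE_DIR="${TRACE_DIR:-/var/log/incrond-strace}"

# Build an hourly file name, e.g. incrond.20240101-13.log.
trace_log_name() {
    printf '%s/incrond.%s.log\n' "$TRACE_DIR" "$(date +%Y%m%d-%H)"
}

# Delete trace files last modified more than the given number of minutes ago.
prune_old_traces() {
    find "$TRACE_DIR" -name 'incrond.*.log' -mmin "+$1" -delete
}

# Attach strace to the running incrond, start a new file every hour,
# and keep roughly ten hours of history.
trace_incrond() {
    local pid rc
    mkdir -p "$TRACE_DIR"
    while pid=$(pidof -s incrond); do
        # timeout sends SIGTERM after an hour; strace then detaches
        # and incrond keeps running.
        timeout 3600 strace -f -tt -o "$(trace_log_name)" -p "$pid"
        rc=$?
        # Exit status 124 means the hour elapsed. Anything else means
        # strace ended on its own (incrond exited or the attach failed),
        # so stop and leave the last trace file in place for inspection.
        [ "$rc" -eq 124 ] || break
        prune_old_traces 600
    done
}
```

Calling `trace_incrond` as root starts the loop. When incrond dies, the most recent file in the trace directory should contain the system calls leading up to the failed `poll`. Note that tracing adds overhead to the traced process.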

James
TG2