I'm asking this question because I couldn't find the answer in:
Why is my crontab not working, and how can I troubleshoot it?

Context

We have several servers running debian/wheezy.

One backup task requires that we deactivate a specific user's crontab while the backup runs, so we have a script, run daily, which roughly does the following:

# the user is legec

# save the crontab to a file
crontab -ulegec -l > /home/legec/.backup/crontab
# empty the crontab
echo "" | crontab -ulegec

backup ...

# reload crontab
cat /home/legec/.backup/crontab | crontab -ulegec

This works as expected the vast majority of the time.

The task runs on ~80 servers; depending on the server, the backup takes from 1 minute up to 2 hours.

Bug

Once in a while, cron does not detect the final reload and does not execute any of the jobs listed in the crontab.

The file /var/spool/cron/crontabs/legec has the expected content and modification date:

$ ls -lh /var/spool/cron/crontabs/legec
-rw------- 1 legec crontab 6.7K Sep 22 04:03 /var/spool/cron/crontabs/legec

but the cron logs indicate that cron did not detect the last change:

$ cat /var/log/cron.log | grep -E "LIST|RELOAD|REPLACE"
...
# yesterday's backup : all went fine
Sep 21 04:00:06 lgserver crontab[6670]: (root) LIST (legec)
Sep 21 04:00:06 lgserver crontab[6671]: (root) LIST (legec)
Sep 21 04:00:06 lgserver crontab[6673]: (root) REPLACE (legec)
Sep 21 04:01:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)
Sep 21 04:03:01 lgserver crontab[7071]: (root) REPLACE (legec)
Sep 21 04:03:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)

# today's backup : no final RELOAD event
Sep 22 04:00:07 lgserver crontab[24163]: (root) LIST (legec)
Sep 22 04:00:07 lgserver crontab[24164]: (root) LIST (legec)
Sep 22 04:00:07 lgserver crontab[24166]: (root) REPLACE (legec)
Sep 22 04:01:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)
Sep 22 04:03:01 lgserver crontab[24458]: (root) REPLACE (legec)
          # no RELOAD line here

"Once in a while" means there is no regularity: we see this bug maybe once a month, on one random server out of the ~80 running the task.

Question

Does anyone have a lead on where to look?

LeGEC

1 Answer

First of all, just to be on the safe side, I'd advise using crontab's own options for removing and installing a crontab rather than piping into it. Namely

crontab -u user -r

to delete the user's crontab, and

crontab -u user backed_up_crontab_file

to restore.
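
Put together, the save / remove / restore sequence could look roughly like the sketch below. The function names and the CRONTAB_CMD variable are my own (hypothetical) additions, there only so the command can be swapped out for testing; the crontab options used are just -u, -l, -r and the file argument shown above.

```shell
#!/bin/sh
# Sketch only: suspend and restore a user's crontab around a backup.
# suspend_crontab, restore_crontab and CRONTAB_CMD are illustrative names,
# not part of the original script.

CRONTAB_CMD="${CRONTAB_CMD:-crontab}"

suspend_crontab() {
    # $1 = user, $2 = file in which to save the current crontab
    # Save first; if listing fails, do not remove anything.
    "$CRONTAB_CMD" -u "$1" -l > "$2" || return 1
    # Remove the crontab instead of installing an empty one.
    "$CRONTAB_CMD" -u "$1" -r
}

restore_crontab() {
    # $1 = user, $2 = file previously written by suspend_crontab
    "$CRONTAB_CMD" -u "$1" "$2"
}
```

In the script from the question this would be called as `suspend_crontab legec /home/legec/.backup/crontab`, then the backup, then `restore_crontab legec /home/legec/.backup/crontab`.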

Secondly, your timing may matter. If the user's jobs run rarely, one of them may miss a single run after the restore, because it was due to fire a minute before the crontab was actually restored.
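
To tell a missed run apart from a missed reload, the script could check the cron log a minute or two after restoring. This is a rough sketch that assumes the log line format shown in the question ("(root) REPLACE (legec)" and "(legec) RELOAD (crontabs/legec)"); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed if the last REPLACE for a user was followed by a RELOAD.
# Assumes the log format quoted in the question; reload_seen_after_replace
# is an illustrative name.

reload_seen_after_replace() {
    # $1 = user, $2 = cron log file
    awk -v user="$1" '
        BEGIN { pending = 0 }
        # A REPLACE for this user means cron still has to pick up the change.
        index($0, "REPLACE (" user ")") { pending = 1 }
        # A later RELOAD for this user means cron picked it up.
        index($0, "(" user ") RELOAD")  { pending = 0 }
        # Exit status 1 if a REPLACE is still waiting for its RELOAD.
        END { exit pending }
    ' "$2"
}
```

For example, `reload_seen_after_replace legec /var/log/cron.log` returning non-zero a couple of minutes after the restore would match the "no RELOAD line here" case in your log. What to do then (alert, or install the saved file again) is up to you; I have not verified that re-installing reliably triggers the reload.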

chicks
Gnudiff