0

I have a script which starts a daemon process and then sleeps for 20 seconds. If I run the script on SLES11 SP1 or RHEL6 then after the script exits the process is still running.

If I run the script on SLES11 SP3 or RHEL6.3 then after the script exits the process is no longer running. The process continues to run for the entire 20 second sleep and is killed when the script exits.

The script is run via expect, so the script's entire shell exits with the script. Obviously, if it weren't starting a daemon I wouldn't be surprised. Also, I suspect the problem isn't the OS version so much as a difference in the way we've set up the newer servers (I have no idea what those differences are, though; the older servers were set up years ago).

During the 20 seconds the process runs if I do a ps I get the following:

root      4699     1  0 15:14 pts/2    00:00:00 sudo -u openmq /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -D
openmq    4701  4699  0 15:14 pts/2    00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssl
openmq    9095  9063 54 16:21 pts/2    00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope

The fact that the parent process of 4699 is 1 seems to suggest to me that the process has been correctly daemonized. However, after the expect script exits both 4699 and 4701 are killed. What could be causing this?

UPDATE

I've printed the same output on the servers that work. During the 20 second sleep I get:

openmq   18652     1  0 15:44 pts/1    00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssljms.tls.port=7680
openmq   18686 18652  8 15:44 pts/1    00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope

After the 20 second sleep I get:

openmq   18652     1  0 15:44 ?        00:00:00 /bin/sh /opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -Dimq.ssljms.tls.port=7680
openmq   18686 18652  5 15:44 ?        00:00:02 /usr/java/latest/bin/java -cp /opt/PacketPortal/openmq/default/bin/../lib/imqbroker.jar:/opt/PacketPortal/openmq/default/bin/../lib/imqutil.jar:/opt/PacketPortal/ope

After the script exits it disconnects the controlling terminal. I wonder why it doesn't do that on the newer servers.

UPDATE

Here is the section of the script that actually launches OpenMQ. The -bgnd flag is what is supposed to daemonize it.

sudo -u openmq $IMQ_HOME/bin/$EXECUTABLE -bgnd $BROKER_OPTIONS $ARGS > /dev/null 2>&1 &

UPDATE

Some truly bizarre behavior I discovered by accident. If I change the command to:

sudo -u openmq sldkhglksj; $IMQ_HOME/bin/$EXECUTABLE -bgnd $BROKER_OPTIONS $ARGS > /dev/null 2>&1 &

Then I get sldkhglksj: command not found of course but...the openmq process is not killed. If I take that one change out, it is killed.

UPDATE

In retrospect, it appears that the magical command stops sudo from running on the actual openmq startup, which leads me to believe that sudo is somehow involved.

Pace
  • 235

4 Answers

2

You might be running into the issue documented here: https://access.redhat.com/knowledge/solutions/180243.

It states that sudo's behavior for actions similar to the one you have described has changed in the version that ships with RHEL/CentOS 6.3 (sudo-1.7.4p5-11.el6.x86_64). I am pointing this out because you see different behavior between RHEL 6 and 6.3 and because sudo is involved.

Some options to try (I don't have a 100% answer, just throwing out ideas):

  • If you have root-level access, which it looks like you do, try running the script without sudo, using something like su -c '/opt/PacketPortal/openmq/default/bin/imqbrokerd -bgnd -autorestart -silent -port 7676 -Dimq.service.activelist=admin,ssljms -D' - openmq. See http://www.linfo.org/su.html for more info
  • Install an older version of sudo to work around this (hacky, I know, but you could build/install it in a temporary location to test it out)
  • Look into the huponexit shopt in the answer that Massimo references; that sounds promising if this isn't the sudo issue I mentioned above
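
To check the huponexit idea, something like the following could be run in the shell that launches the broker. This is only a sketch and assumes that shell is bash; the variable name huponexit_state is made up for illustration. When huponexit is on, an interactive login bash sends SIGHUP to its jobs as it exits.

```shell
#!/bin/bash
# Sketch: report whether bash's huponexit option is set in this shell.
# huponexit_state is an illustrative name, not part of any tool.
if shopt -q huponexit; then
    huponexit_state=on
else
    huponexit_state=off
fi
echo "huponexit is $huponexit_state"
```

If it reports "on" only on the newer servers, running shopt -u huponexit before starting the broker would be one thing to try.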
1

You can prefix the command you wish to daemonize in the script with nohup:

  nohup command-that-you-want-to-daemonize &

Then, when the outer script completes, the program will continue to run.
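
A minimal sketch of how that might look inside a script. Here sleep 30 is only a placeholder for the real daemon command, and daemon_pid is an illustrative variable name. nohup starts the command with SIGHUP ignored, which is why a hangup on the terminal should not terminate it.

```shell
#!/bin/sh
# Sketch: `sleep 30` stands in for the real daemon command.
# nohup runs it with SIGHUP ignored; output is discarded so that
# nohup does not create a nohup.out file.
nohup sleep 30 > /dev/null 2>&1 &
daemon_pid=$!
echo "started background process $daemon_pid"
```

Note that this only protects against SIGHUP; if something sends a different signal on exit, nohup will not help.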

mdpc
  • 11,914
1

Try adding a disown on a line of its own after you background the process. This should prevent your shell from sending signals to any child processes as it exits.
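
Roughly, that could look like the sketch below. It assumes the script runs under bash (disown is a bash builtin), and sleep 30 is again just a placeholder for the real startup command.

```shell
#!/bin/bash
# Sketch: `sleep 30` stands in for the real daemon command.
sleep 30 > /dev/null 2>&1 &
daemon_pid=$!
# Remove the job from this shell's job table so the shell
# does not signal it when the shell exits.
disown "$daemon_pid"
echo "disowned background process $daemon_pid"
```

The process keeps running after disown; it is simply no longer tracked as one of the shell's jobs.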

chutz
  • 8,300
0

Try adding </dev/null to the startup command as well.

Not sure how exactly the -bgnd flag is supposed to background your process, but processes can die if their standard input gets lost, which is exactly what happens when you lose the ssh connection. You are already throwing away all output to the bitbucket, you may want to make sure there is no input either.
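
As a sketch of what I mean, with sleep 30 as a placeholder for the real startup command and daemon_pid as an illustrative name:

```shell
#!/bin/sh
# Sketch: `sleep 30` stands in for the real daemon command.
# Standard input comes from /dev/null, and both output streams
# are discarded, so the process holds no terminal file descriptors.
sleep 30 < /dev/null > /dev/null 2>&1 &
daemon_pid=$!
echo "started background process $daemon_pid"
```

On Linux you can confirm the redirection by looking at where /proc/<pid>/fd/0 points.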

I cannot explain the change in behavior, but my suggestion is to just live with it.

chutz
  • 8,300