29

I am running Bacula on a Red Hat box. From time to time, the storage daemon bacula-sd stops working and becomes <defunct>.

[root@backup ~]# ps -ef | grep defunct | more
root      4801 29261  0 09:25 pts/5    00:00:00 grep defunct
root      5825     1  0 Oct18 ?        00:00:00 [bacula-sd] <defunct>

My question is, how can I kill this process? Its parent is 1, which is init, as far as I know, and I wouldn't want to kill the init process, would I?

Killing the process in the usual way does not work:

[root@backup ~]# kill -0 5825
[root@backup ~]# kill -9 5825

Help is greatly appreciated!

Edit: running

[root@backup ~]# lsof -p 5825

produces the following output:

COMMAND    PID USER   FD   TYPE  DEVICE     SIZE    NODE NAME
bacula-sd 5825 root  cwd    DIR   253,0     4096 3801089 /root
bacula-sd 5825 root  rtd    DIR   253,0     4096       2 /
bacula-sd 5825 root  txt    REG   253,0  2110599  368004 /usr/local/sbin/bacula-sd
bacula-sd 5825 root  mem    REG   253,0    75284  389867 /usr/lib/libz.so.1.2.3
bacula-sd 5825 root  mem    REG   253,0    46680 3604521 /lib/libnss_files-2.5.so
bacula-sd 5825 root  mem    REG   253,0   936908  369115 /usr/lib/libstdc++.so.6.0.8
bacula-sd 5825 root  mem    REG   253,0   125736 3606807 /lib/ld-2.5.so
bacula-sd 5825 root  mem    REG   253,0  1602128 3606885 /lib/libc-2.5.so
bacula-sd 5825 root  mem    REG   253,0   208352 3606892 /lib/libm-2.5.so
bacula-sd 5825 root  mem    REG   253,0   125744 3606887 /lib/libpthread-2.5.so
bacula-sd 5825 root  mem    REG   253,0    25940 3604573 /lib/libacl.so.1.1.0
bacula-sd 5825 root  mem    REG   253,0    15972 3604535 /lib/libattr.so.1.1.0
bacula-sd 5825 root  mem    REG   253,0    46548 3606908 /lib/libgcc_s-4.1.2-20080102.so.1
bacula-sd 5825 root  mem    REG   253,0 56422480  366368 /usr/lib/locale/locale-archive
bacula-sd 5825 root    0r   CHR     1,3             1545 /dev/null
bacula-sd 5825 root    1r   CHR     1,3             1545 /dev/null
bacula-sd 5825 root    2r   CHR     1,3             1545 /dev/null
bacula-sd 5825 root    3u   CHR   9,128             6469 /dev/nst0
bacula-sd 5825 root    4u  IPv4 1023380              TCP backup:bacula-sd (LISTEN)
bacula-sd 5825 root    5u  IPv4 2693268              TCP backup:bacula-sd->backup:53957 (CLOSE_WAIT)
bacula-sd 5825 root    7u  IPv4 3248683              TCP backup:bacula-sd->backup:57629 (CLOSE_WAIT)
bacula-sd 5825 root    8u  IPv4 3250966              TCP backup:bacula-sd->backup:37650 (CLOSE_WAIT)
bacula-sd 5825 root    9u  IPv4 3253908              TCP backup:bacula-sd->backup:37671 (CLOSE_WAIT)
andreas-h

7 Answers

25

The only way to remove a zombie/defunct process is to kill its parent. Since the parent here is init (PID 1), that would also take down your system.

This pretty much leaves you with two options.

  • Manually modify the process table, e.g., create a dummy process, link the defunct process as a child of the dummy, then kill them both off. This is quite dangerous, and you may have to manually clean up other process resources such as semaphores and file handles.
  • Reboot the system.

I'd go with the second.

Roy
6

Check whether there was a kernel panic:

# dmesg | tail

Check whether the process is in the "D" state (unkillable sleep), meaning it is in kernel mode inside a syscall that has not returned yet, either because of a kernel oops or for some other reason: http://www.nabble.com/What-causes-an-unkillable-process--td20645581.html
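As a rough sketch of that check on Linux, the state letter can be read from the "State:" line of /proc/<pid>/status. The function names below are made up for illustration, and the descriptions cover only the two states discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any standard tool.

# Print the one-letter state found in a /proc/<pid>/status file.
# The line looks like "State:<TAB>Z (zombie)", so field 2 is the letter.
state_from_status() {
    awk '/^State:/ { print $2; exit }' "$1"
}

# Translate the state letters relevant here into plain words.
describe_state() {
    case "$1" in
        D) echo "uninterruptible sleep (stuck in the kernel)" ;;
        Z) echo "zombie (exited, not yet reaped)" ;;
        *) echo "other ($1)" ;;
    esac
}
```

For the process in the question, this would be called as: describe_state "$(state_from_status /proc/5825/status)"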

4

You could try restarting init:

 # telinit u

Otherwise, I wouldn't worry too much. It's not running and it's not consuming any resources; the entry is just there so the kernel can remember it.

David Pashley
4

If a zombie has init as its parent, then init has stopped working properly. One of the roles of init is to clean up zombies. If it doesn't do it, no one else will. So the only solution is to reboot. If init is broken, then a reboot may fail, so I'd shut down important services, sync the filesystem, then hit the power button instead.
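To see which zombies exist and whether their parent really is PID 1, something like the following sketch can scan /proc on Linux. The function name and the optional root-directory argument are assumptions added for illustration; the argument exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: print "PID PPID NAME" for every zombie process.
# $1 is the proc root and defaults to /proc.
list_zombies() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/status; do
        [ -r "$f" ] || continue
        # A process may exit between the glob and the read, so errors are ignored.
        # Only the first word of the process name is printed.
        awk '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            /^Pid:/   { pid = $2 }
            /^PPid:/  { ppid = $2 }
            END { if (state == "Z") print pid, ppid, name }
        ' "$f" 2>/dev/null
    done
}
```

Run without arguments as list_zombies; a line whose second column is 1 is a zombie that init has not reaped.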

MarkR
2

Let's keep the panic down, shall we? A "defunct" or "zombie" process is not a process. It is simply an entry in the process table, with a saved exit code. Thus, a zombie holds no resources, takes no CPU cycles, and uses no memory, since it is not a process. Don't get all weird and itchy trying to "kill" zombie processes. Just like their namesakes, they can't be killed, since they're already dead. But unlike the brain-eating kind, they harm absolutely no one, and won't bite other processes.

Don't let zombie processes eat your brain. Just ignore them.

Teddy
0

I just had this issue while running Kindle under Wine: the Kindle window wouldn't close even after I killed all the Wine processes. When I ran ps, there was a [Kindle.exe] <defunct> process whose parent was 1 (ps.tree is a self-made script that shows the process tree):

$ps.tree 21323
 1 0 02:44:47 /lib/systemd/systemd --system --deserialize 119
   21323 1 01:50:44 [Kindle.exe] <defunct>

I finally killed the [Kindle.exe] process and the ghost window by killing all threads of this process, by running this command:

cd /proc/21323/task
kill *
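A possible explanation, offered as an assumption rather than a diagnosis: on Linux, a process whose main thread has exited while other threads keep running is also shown as <defunct>, and it disappears once those threads end. A small sketch for inspecting the per-thread states before sending any signal could look like this (the function name and the optional proc-root argument are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print "TID STATE" for each thread of a process.
# $1 is the PID; $2 is the proc root and defaults to /proc.
thread_states() {
    taskdir=${2:-/proc}/$1/task
    for t in "$taskdir"/[0-9]*; do
        [ -r "$t/status" ] || continue
        # ${t##*/} strips the directory part, leaving the thread ID.
        printf '%s %s\n' "${t##*/}" \
            "$(awk '/^State:/ { print $2; exit }' "$t/status")"
    done
}
```

For the case above this would be thread_states 21323. If the line for the main thread shows Z while other lines show S or R, the remaining threads are what keep the entry around.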
0

Seems like you've got an orphaned process. As far as I know, the only way to kill these would be to reboot the box. I've had this happen on my ESX servers (which are Linux under the hood) from time to time, and a host reboot is the fix (from VMware support).

I'm a Windows guy, so take that for what it's worth.

mrdenny