Client machines were able to connect to our NFS server earlier this afternoon, and everything was running fine. The setup has been working fine for several years. No configuration changes were made on the server.
The NFS server then hung with a "too many open files" error. Unable to SSH into it, we shut it down via ACPI. Since the NFS server was restarted, all attempts by clients to connect to it hang forever.
Steps taken so far:
Verify the NFS daemon is running
service nfs-kernel-server status
nfsd running
Restart the NFS daemon. This is where I ran into something bizarre.
When I run:
service nfs-kernel-server stop
It says:
* Stopping NFS kernel daemon [ OK ]
* Unexporting directories for NFS kernel daemon... [ OK ]
Then I run:
service nfs-kernel-server status
and it says:
nfsd running
So I have no idea whether it is actually stopping the service: it claims to stop, but then says it's still running anyway. Also, running stop multiple times does not produce an error; it just prints "Stopping NFS kernel daemon" each time I run the stop command.
When it is supposedly stopped, ps aux | grep nfsd shows:
root 761 0.0 0.0 0 0 ? S< Apr04 0:00 [nfsd4]
root 762 0.0 0.0 0 0 ? S< Apr04 0:00 [nfsd4_callbacks]
root 763 0.0 0.0 0 0 ? D Apr04 0:00 [nfsd]
root 764 0.0 0.0 0 0 ? D Apr04 0:00 [nfsd]
root 765 0.0 0.0 0 0 ? D Apr04 0:00 [nfsd]
root 766 0.0 0.0 0 0 ? D Apr04 0:00 [nfsd]
root 767 0.0 0.0 0 0 ? D Apr04 0:00 [nfsd]
root 768 0.0 0.0 0 0 ? D Apr04 0:00 [nfsd]
root 769 0.0 0.0 0 0 ? D Apr04 0:00 [nfsd]
So it appears that the stop command isn't actually stopping the process.
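To put a number on the stuck threads in a listing like the one above, a small helper can count the [nfsd] entries whose state is D (uninterruptible sleep) in ps aux output. This is only a minimal sketch: count_blocked_nfsd is a name made up for illustration, and it assumes the usual ps aux column layout (STAT in column 8, COMMAND in column 11).

```shell
#!/bin/sh
# Hypothetical helper (not part of any NFS package): counts [nfsd] kernel
# threads in state D from `ps aux`-style lines read on stdin.
# Assumes STAT is field 8 and COMMAND is field 11, as in default `ps aux`.
count_blocked_nfsd() {
    awk '$11 == "[nfsd]" && $8 ~ /^D/ { n++ } END { print n + 0 }'
}

# Usage on a live system:
#   ps aux | count_blocked_nfsd
```

Fed the listing above, it would report 7, since [nfsd4] and [nfsd4_callbacks] are in state S< and are not counted.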
Reboot NFS Server again
Failing that, we rebooted the NFS server using reboot. We get the same problem after every reboot: mount attempts still time out, and NFS appears to keep running even when we try to stop it.
Verify portmap is running
root@nfs:~# service portmap status
portmap start/running, process 540
Stop and restart portmap and NFS
I went through the motions of:
service nfs-kernel-server stop
service portmap stop
service portmap start
service nfs-kernel-server start
But since the nfs-kernel-server service doesn't actually stop when you tell it to (see above), it didn't do anything other than restart portmap.
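For reference, the sequence above could be wrapped so that it checks whether the stop really took effect before touching portmap. This is a rough sketch rather than a tested fix: nfsd_threads_present and restart_nfs_stack are hypothetical names, and the check assumes the same ps aux column layout as the listing earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if any [nfsd] kernel thread appears in
# `ps aux`-style input on stdin, and 1 otherwise.
nfsd_threads_present() {
    awk '$11 == "[nfsd]" { found = 1 } END { exit !found }'
}

# Hypothetical wrapper around the stop/start sequence from this post.
# It bails out if nfsd threads survive the stop, because in that case
# the rest of the sequence only restarts portmap.
restart_nfs_stack() {
    service nfs-kernel-server stop
    if ps aux | nfsd_threads_present; then
        echo "nfsd threads still present after stop; aborting" >&2
        return 1
    fi
    service portmap stop
    service portmap start
    service nfs-kernel-server start
}
```

On the server as it behaves now, restart_nfs_stack would be expected to stop at the check, which at least makes the failure explicit instead of silently restarting only portmap.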