
There is this popular job interview question:

Given a machine that hangs (let's say it runs RHEL), how do you troubleshoot the problem?

My answer would be:

1) I'd use (what is the name of that server BIOS feature which allows you to connect to its console?) or go down to the server room, connect a monitor and keyboard to the machine, and log in as root.

2) Then I'd run "top" to see if some process has very high CPU usage.

3) Then I'd check memory (with "top" again?), the total number of processes ("ps uawx"), and the system limit (how? Would "limit" give me the correct number?).

And then I don't know. Maybe run "vm"? But what would it tell me?

Please give me some good advice and a few impressive sentences for the recruiter.

1 Answer


You can

  • check /var/log/messages for hints,
  • analyze sar -A output,
  • take a look at vmstat,
  • iotop,
  • dstat
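Tools like vmstat and top read most of their numbers from /proc, so the first triage pass (load, memory, process count against the limit) can be sketched in a few lines. This is only an illustration, assuming a Linux /proc layout; the function names (parse_loadavg, parse_meminfo, snapshot) are made up for this example, and MemAvailable may be missing on older kernels, which is why it is read with a fallback.

```python
"""Minimal triage snapshot built from /proc (Linux only, illustrative)."""
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),    # currently runnable scheduling entities
        "total_tasks": int(total),    # all processes and threads
    }


def parse_meminfo(text):
    """Parse /proc/meminfo into {field: number}; most fields are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip anything that is not 'Name: value [unit]'
        info[name.strip()] = int(rest.split()[0])
    return info


def _read(path):
    with open(path) as handle:
        return handle.read()


def snapshot(proc="/proc"):
    """Collect load, memory and task-count figures from a /proc tree."""
    load = parse_loadavg(_read(os.path.join(proc, "loadavg")))
    mem = parse_meminfo(_read(os.path.join(proc, "meminfo")))
    pid_max = int(_read(os.path.join(proc, "sys/kernel/pid_max")))
    return {
        "load": (load["load1"], load["load5"], load["load15"]),
        "tasks": load["total_tasks"],
        "pid_max": pid_max,  # upper bound on PIDs, one of several limits
        "mem_total_kb": mem.get("MemTotal"),
        "mem_available_kb": mem.get("MemAvailable"),  # None on old kernels
        "swap_free_kb": mem.get("SwapFree"),
    }
```

Calling print(snapshot()) as any user on the affected box gives a one-line summary to compare against sar history. A load average far above the CPU count with little free memory and shrinking swap points in a different direction than a task count close to pid_max.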

For really bad lockups, you also have the Magic SysRq key to squeeze some info from the system.
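If you still have a root shell, the same SysRq functions can be triggered by writing a single command letter to /proc/sysrq-trigger; the output lands in the kernel log (dmesg or the console). The sketch below is illustrative only: the helper name trigger_sysrq and the friendly command names are invented for this example, while the letters w, t, m and l are the ones documented by the kernel.

```python
"""Illustrative helper for triggering Magic SysRq functions from a root shell."""

# Friendly names (made up for this sketch) mapped to kernel SysRq letters.
SYSRQ_COMMANDS = {
    "blocked_tasks": "w",   # tasks in uninterruptible (blocked) state
    "all_tasks": "t",       # list of current tasks and their state
    "memory": "m",          # current memory information
    "cpu_backtraces": "l",  # stack backtrace for all active CPUs
}


def trigger_sysrq(name, trigger_path="/proc/sysrq-trigger"):
    """Write the SysRq letter for `name` to the trigger file and return it.

    Needs root on a real system; trigger_path is a parameter so the helper
    can be pointed at an ordinary file when trying it out.
    """
    key = SYSRQ_COMMANDS[name]
    with open(trigger_path, "w") as trigger:
        trigger.write(key)
    return key
```

For a hang, trigger_sysrq("blocked_tasks") followed by reading dmesg is usually the first thing to try, since it shows which tasks are stuck in uninterruptible sleep and where.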

Another place to look is the CMDB: see whether any previous problems have been logged for the server, and whether there is an accepted workaround and/or a planned fix. You can even ask coworkers. There is more to a job than just technical prowess.

Sgaduuw