I'm currently struggling with an extremely slow server providing home directories through NFS, and I cannot figure out the origin of the problem. It's all running on Nixos, but I don't think that's important.
Here is the setup : I have a Dell server with a H330 raid card, and 3 hard-drives : one for the system (/sda), and two in RAID 1 for users accounts (/sdc). It is only used for serving home directories. The partition /sdc1 of the RAID disk is mounted on /home/lan, on /sda1 and is exported through NFS. There is also a swap partition on /sda2.
The /etc/fstab looks like:
/dev/sda1 / ext4 x-initrd.mount 0 1
/dev/sdc1 /home/lan ext4 defaults 0 2
And the nfs export is
/home/lan 192.168.1.0/24(insecure,rw,async,no_subtree_check)
I also have up to 50 clients connecting to this server for student machines. The clients mount /home/lan and login is done using LDAP.
It seems that the NFS part is fine, for instance, running nfsstat on the server :
Server rpc stats:
calls badcalls badfmt badauth badclnt
19836893 0 0 0 0
I think that the problem comes from the disk configuration. In particular, the server is spending most is time writing on /sda and not /sdc (here is the iostat output)
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
sda 10,10 0,79 73,48 0,00 2004798 185360634 0
sdc 0,50 0,08 8,56 0,00 203605 21587308 0
And btop confirms that /sda is used much more than /sdc. But, I repeat, there is almost with running on this server, it is only devoted to serving /home/lan so basically, most of the read/write should regard /sdc and not /sda.
Does anyone have an idea of what goes wrong and how I can fix that?
PS: iostat -x 3 3 yields
Linux 6.1.31 (scribe) 26/09/2023 _x86_64_ (6 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0,05 0,00 0,05 0,78 0,00 99,12
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 0,08 0,83 0,03 23,78 4,61 10,08 10,15 74,73 5,57 35,45 6,32 7,36 0,00 0,00 0,00 0,00 0,00 0,00 1,77 20,69 0,10 4,58
sdb 0,01 0,10 0,00 0,00 0,28 15,34 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,11 0,00 4,00 42,29 38,16 0,55 9,27 0,82 60,03 1,62 16,93 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,03
avg-cpu: %user %nice %system %iowait %steal %idle
0,06 0,00 0,17 2,33 0,00 97,45
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 0,00 0,00 0,00 0,00 0,00 0,00 17,33 158,67 18,67 51,85 8,06 9,15 0,00 0,00 0,00 0,00 0,00 0,00 6,67 18,30 0,26 14,17
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
avg-cpu: %user %nice %system %iowait %steal %idle
0,11 0,00 0,17 4,28 0,00 95,44
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 0,00 0,00 0,00 0,00 0,00 0,00 26,67 256,00 30,67 53,49 9,75 9,60 0,00 0,00 0,00 0,00 0,00 0,00 10,00 23,07 0,49 26,40
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
Following the comments, I have tried several tools such as iotop (very useful!) and reading the output of journalctl. I have found a first problem, excessive logging from OpenLDAP. I have removed that as this part works correctly. I'm now left with...
sept. 29 11:44:43 scribe rpc.mountd[729895]: v4.2 client detached: 0x96b3573a65167ddb from "192.168.1.197:1005"
sept. 29 11:44:43 scribe rpc.mountd[725638]: v4.2 client detached: 0x96b3573a65167ddb from "192.168.1.197:1005"
sept. 29 11:44:43 scribe rpc.mountd[729900]: v4.2 client detached: 0x96b3573a65167ddb from "192.168.1.197:1005"
sept. 29 11:44:44 scribe rpc.mountd[729895]: v4.2 client attached: 0x96b3573c65167ddb from "192.168.1.197:1005"
sept. 29 11:44:44 scribe rpc.mountd[725638]: v4.2 client attached: 0x96b3573c65167ddb from "192.168.1.197:1005"
sept. 29 11:44:44 scribe rpc.mountd[729900]: v4.2 client attached: 0x96b3573c65167ddb from "192.168.1.197:1005"
sept. 29 11:44:45 scribe rpc.mountd[729895]: v4.2 client detached: 0x96b3573c65167ddb from "192.168.1.197:1005"
sept. 29 11:44:45 scribe rpc.mountd[729900]: v4.2 client detached: 0x96b3573c65167ddb from "192.168.1.197:1005"
sept. 29 11:44:45 scribe rpc.mountd[725638]: v4.2 client detached: 0x96b3573c65167ddb from "192.168.1.197:1005"
sept. 29 11:45:00 scribe rpc.mountd[729895]: v4.2 client attached: 0x96b3575765167ddb from "192.168.1.197:1005"
sept. 29 11:45:00 scribe rpc.mountd[729900]: v4.2 client attached: 0x96b3575765167ddb from "192.168.1.197:1005"
sept. 29 11:45:00 scribe rpc.mountd[725638]: v4.2 client attached: 0x96b3575765167ddb from "192.168.1.197:1005"
sept. 29 11:45:00 scribe rpc.mountd[729895]: v4.2 client detached: 0x96b3575765167ddb from "192.168.1.197:1005"
sept. 29 11:45:00 scribe rpc.mountd[729900]: v4.2 client detached: 0x96b3575765167ddb from "192.168.1.197:1005"
sept. 29 11:45:00 scribe rpc.mountd[725638]: v4.2 client detached: 0x96b3575765167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[729895]: v4.2 client attached: 0x96b3577265167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[725638]: v4.2 client attached: 0x96b3577265167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[729900]: v4.2 client attached: 0x96b3577265167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[729895]: v4.2 client attached: 0x96b3577465167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[729900]: v4.2 client attached: 0x96b3577465167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[729900]: v4.2 client detached: 0x96b3577265167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[729895]: v4.2 client detached: 0x96b3577265167ddb from "192.168.1.197:1005"
sept. 29 11:45:11 scribe rpc.mountd[725638]: v4.2 client detached: 0x96b3577265167ddb from "192.168.1.197:1005"
This is the entries of journalctl for a single client. And I have between 20 and 50 of them! But I don't know how to find why there are so many attaching and detaching, and even if it is normal.