
Wondering what I could adjust to get my RAID 6 software RAID to resync quicker. Currently it proceeds at a maximum of 64MB/s and averages somewhere around 25MB/s. I'm hoping to get it to 200MB/s:

    [root@localhost mnt]# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md127 : active raid6 sdh[5] sdg[4] sdf[3] sde[2] sdd[1] sdc[0]
      8000935168 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [====>................]  resync = 21.9% (439396416/2000233792) finish=1148.8min speed=22641K/sec
      bitmap: 12/15 pages [48KB], 65536KB chunk

I've checked with iostat to see what could be the bottleneck and I see %rrqm and %wrqm hanging near 100 but never crossing it:

    10/08/23 20:39:35
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.01    0.00    0.36    1.03    0.00   98.60

    Device   r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
    dm-0     0.00 0.00 0.00 0.00 0.00 0.00 0.40 26.80 0.00 0.00 1.75 67.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.08
    dm-1     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    dm-2     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    dm-3     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    dm-4     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    md0      0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    md127    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    nvme0n1  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    nvme1n1  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    nvme2n1  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdb      0.00 0.00 0.00 0.00 0.00 0.00 0.40 26.80 0.00 0.00 2.25 67.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.08
    sdc      25.40 17932.00 3895.60 99.35 411.14 705.98 49.00 5638.45 1334.20 96.46 16.54 115.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11.25 34.47
    sdd      23.80 17932.00 3897.20 99.39 905.93 753.45 50.00 5824.05 1381.10 96.51 79.93 116.48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 25.56 44.14
    sde      26.20 17932.00 3894.60 99.33 417.60 684.43 53.80 5632.05 1330.90 96.11 31.82 104.68 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12.65 42.79
    sdf      24.90 24414.80 3901.40 99.37 2737.51 980.51 61.10 7334.45 1382.70 95.77 2166.86 120.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 200.56 84.72
    sdg      413.70 15671.20 3504.10 89.44 1.37 37.88 50.70 5580.85 1332.00 96.33 1.61 110.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.65 10.96
    sdh      215.20 15671.20 3702.60 94.51 2.91 72.82 49.50 5772.85 1381.20 96.54 1.37 116.62 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.69 11.74

(Captured with iostat -xtc -t 10.)

What I've set:

/sys/block/md127/md/stripe_cache_size to 32768

dev.raid.speed_limit_min = 500000 (also tried 10000); dev.raid.speed_limit_max = 5000000 (also tried 200000)

Set /sys/block/sd{c,d,e,f,g,h}/queue/max_sectors_kb to 1024

Queue depth for each device (/sys/block/$diskdrive/device/queue_depth) is set to 10 and can't be changed.
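For reference, the above was applied roughly like this (a sketch; the device list matches this box, adjust as needed):

    # md resync throttles (the dev.raid.speed_limit_* values above)
    sysctl -w dev.raid.speed_limit_min=500000
    sysctl -w dev.raid.speed_limit_max=5000000

    # larger stripe cache for the RAID 6 array
    echo 32768 > /sys/block/md127/md/stripe_cache_size

    # bigger maximum request size per member disk
    for d in sdc sdd sde sdf sdg sdh; do
        echo 1024 > /sys/block/$d/queue/max_sectors_kb
    done

Current vm.dirty settings for completeness: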

[root@localhost mnt]# sysctl -a |grep -Ei vm.dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
[root@localhost mnt]#

The definition for rrqm and wrqm is well documented:

rrqm/s
The number of read requests merged per second that were queued to the device.
wrqm/s
The number of write requests merged per second that were queued to the device.

%rrqm, %wrqm: The percentage of read/write requests that were merged by the I/O scheduler before being sent to the device.

However, the definition doesn't quite match the behaviour I'm seeing in iostat, where the column is marked red and reaches, but never crosses, 100%. To me, being close to or at 100% typically indicates saturation of some kind.
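The arithmetic does line up with that definition, assuming %rrqm is simply merged requests over merged plus issued requests (my reading of the sysstat docs); checking against the sdc row above:

    # sdc: rrqm/s = 3895.60, r/s = 25.40
    awk 'BEGIN { printf "%.2f\n", 3895.60 / (3895.60 + 25.40) * 100 }'
    # prints 99.35, matching the %rrqm column

So by that definition the column measures how aggressively requests are being merged rather than device utilisation, which is what I'm trying to square with my reading of it.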

The drives in this system are 6 x 2TB SSDs (Patriot, model ATA Patriot P210 204). They are connected to the P440AR in this HP box.

So I'm curious what else I could try to increase the resync speed.

Oct 9th - Update 1

I'll tackle each item in the answer below in turn, though it does sound quite like a ChatGPT answer. (For what it's worth, I did try ChatGPT; after a two-hour chat, most of the parameters it suggested didn't really make a difference.) ;) Current average speed is 9MB/s.

  1. CPU: 2x Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz. Lots of free and idle cores. The most I see is 3 cores near 100%, though most of the time it's iowait on 1-2 cores hanging below 75%.

  2. Tried. No discernible effect when the drives were set to bfq vs mq-deadline, so I reverted it.

  3. Kernel 6.5.2 parameters: GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/almalinux-swap rd.lvm.lv=almalinux/root rd.lvm.lv=almalinux/swap elevator=mq-deadline nvme.poll_queue=8"

  4. No filesystem yet. Just messing with the raw drives right now. It's purely a test box, which gives me lots of options.

  5. 256GB

  6. AlmaLinux 9.2

  7. Aligned on a 64K chunk size between the mdadm array and the controller logical-drive strip size (see the sketch after this list).

  8. See above.

  9. sync_force_parallel is already enabled: cat /sys/block/md127/md/sync_force_parallel returns 1 (per https://unix.stackexchange.com/questions/734715/multiple-mdadm-raid-rebuild-in-parallel).

  10. This is a net new array build.
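
A quick way to double-check the alignment from item 7 (a sketch; slot and LD numbers as on this box):

    # mdadm side: chunk size of the array
    mdadm --detail /dev/md127 | grep -i chunk

    # controller side: strip size of one of the wrapped logical drives
    ssacli ctrl slot=3 ld 1 show | grep -i strip

Both report 64K here.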

Scheduler settings:

[root@localhost mnt]# cat /sys/block/*/queue/scheduler
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
[root@localhost mnt]# ls -altri /sys/block/*/queue/scheduler
62995 -rw-r--r--. 1 root root 4096 Oct  8 14:28 /sys/block/sdb/queue/scheduler
61092 -rw-r--r--. 1 root root 4096 Oct  8 14:28 /sys/block/nvme0n1/queue/scheduler
61775 -rw-r--r--. 1 root root 4096 Oct  8 14:28 /sys/block/nvme1n1/queue/scheduler
60866 -rw-r--r--. 1 root root 4096 Oct  8 14:28 /sys/block/nvme2n1/queue/scheduler
78483 -rw-r--r--. 1 root root 4096 Oct  8 22:11 /sys/block/sdd/queue/scheduler
78799 -rw-r--r--. 1 root root 4096 Oct  8 22:12 /sys/block/sde/queue/scheduler
79130 -rw-r--r--. 1 root root 4096 Oct  8 22:12 /sys/block/sdf/queue/scheduler
79461 -rw-r--r--. 1 root root 4096 Oct  8 22:12 /sys/block/sdg/queue/scheduler
79792 -rw-r--r--. 1 root root 4096 Oct  8 22:12 /sys/block/sdh/queue/scheduler
80123 -rw-r--r--. 1 root root 4096 Oct  8 22:12 /sys/block/sdi/queue/scheduler
[root@localhost mnt]# echo bfq > /sys/block/sdd/queue/scheduler
[root@localhost mnt]# echo bfq > /sys/block/sde/queue/scheduler
[root@localhost mnt]# echo bfq > /sys/block/sdf/queue/scheduler
[root@localhost mnt]# echo bfq > /sys/block/sdg/queue/scheduler
[root@localhost mnt]# echo bfq > /sys/block/sdh/queue/scheduler
[root@localhost mnt]# echo bfq > /sys/block/sdi/queue/scheduler
[root@localhost mnt]# cat /sys/block/*/queue/scheduler
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none [mq-deadline] kyber bfq
none mq-deadline kyber [bfq]
none mq-deadline kyber [bfq]
none mq-deadline kyber [bfq]
none mq-deadline kyber [bfq]
none mq-deadline kyber [bfq]
none mq-deadline kyber [bfq]
[root@localhost mnt]#
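
(If a scheduler change does turn out to help, it could be made persistent with a udev rule along these lines; the rule file name is arbitrary and the match is just an example for these member disks:)

    # /etc/udev/rules.d/60-md-member-scheduler.rules
    ACTION=="add|change", KERNEL=="sd[c-i]", ATTR{queue/scheduler}="mq-deadline"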

Tried P440AR HW RAID 6 as suggested in the comments. It was 25-30% slower. Just finished testing it, blew it away and am recreating the software RAID. HW RAID 6 was heavily blocked on IO: after filling the controller cache through some performance testing (copying a large file over and over to the array with different file names), hdparm -tT /dev/sdX would sit completely blocked for 30+ minutes after I cancelled the copy. One error logged while it was blocked:

Oct  7 22:17:47 localhost kernel: INFO: task hdparm:9331 blocked for more than 1228 seconds.
Oct  7 22:17:47 localhost kernel:      Not tainted 6.5.2 #3
Oct  7 22:17:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  7 22:17:47 localhost kernel: task:hdparm          state:D stack:0     pid:9331  ppid:6254   flags:0x00004006
Oct  7 22:17:47 localhost kernel: Call Trace:
Oct  7 22:17:47 localhost kernel: <TASK>
Oct  7 22:17:47 localhost kernel: __schedule+0x211/0x660
Oct  7 22:17:47 localhost kernel: schedule+0x5a/0xd0
Oct  7 22:17:47 localhost kernel: wb_wait_for_completion+0x56/0x80
Oct  7 22:17:47 localhost kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Oct  7 22:17:47 localhost kernel: sync_inodes_sb+0xc0/0x100
Oct  7 22:17:47 localhost kernel: ? __pfx_sync_inodes_one_sb+0x10/0x10
Oct  7 22:17:47 localhost kernel: iterate_supers+0x88/0xf0
Oct  7 22:17:47 localhost kernel: ksys_sync+0x40/0xa0
Oct  7 22:17:47 localhost kernel: __do_sys_sync+0xa/0x20
Oct  7 22:17:47 localhost kernel: do_syscall_64+0x5c/0x90
Oct  7 22:17:47 localhost kernel: ? syscall_exit_work+0x103/0x130
Oct  7 22:17:47 localhost kernel: ? syscall_exit_to_user_mode+0x22/0x40
Oct  7 22:17:47 localhost kernel: ? do_syscall_64+0x69/0x90
Oct  7 22:17:47 localhost kernel: ? do_user_addr_fault+0x22b/0x660
Oct  7 22:17:47 localhost kernel: ? exc_page_fault+0x65/0x150
Oct  7 22:17:47 localhost kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Oct  7 22:17:47 localhost kernel: RIP: 0033:0x7f081163ed5b
Oct  7 22:17:47 localhost kernel: RSP: 002b:00007ffe797ccc28 EFLAGS: 00000217 ORIG_RAX: 00000000000000a2
Oct  7 22:17:47 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f081163ed5b
Oct  7 22:17:47 localhost kernel: RDX: 00007f0811600000 RSI: 0000000000200000 RDI: 0000000000000003
Oct  7 22:17:47 localhost kernel: RBP: 0000000000000003 R08: 00000000ffffffff R09: 0000000000000000
Oct  7 22:17:47 localhost kernel: R10: 0000000000000022 R11: 0000000000000217 R12: 00007f0811400000
Oct  7 22:17:47 localhost kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct  7 22:17:47 localhost kernel: </TASK>
Oct  7 22:17:47 localhost kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

Also tried this as suggested in the comments:

"echo 8 > /sys/block/md127/md/group_thread_cnt"

but with little to no discernible effect that I could see. The resync speeds fluctuate rather wildly.

Some ssacli numbers. I've wrapped a single-drive RAID 0 around each drive to make use of some of the P440AR capabilities, instead of running HW RAID 6 via the P440AR.
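For anyone reproducing this, the per-drive RAID 0 wrapping was done with something along these lines (the drive bay addresses below are placeholders, not the actual ones from this box):

    # one single-drive RAID 0 logical drive per physical drive (example bay IDs)
    ssacli ctrl slot=3 create type=ld drives=1I:1:3 raid=0
    ssacli ctrl slot=3 create type=ld drives=1I:1:4 raid=0
    # ...and so on for the remaining drives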

=> ctrl slot=3 show

Smart Array P440 in Slot 3
   Bus Interface: PCI
   Slot: 3
   Serial Number: ABC12345678
   Cache Serial Number: ABC12345678
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 4.02-0
   Firmware Supports Online Firmware Activation: False
   Rebuild Priority: High
   Expand Priority: Medium
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: Yes
   Current Parallel Surface Scan Count: 1
   Max Parallel Surface Scan Count: 16
   Queue Depth: Automatic
   Monitor and Performance Delay: 60 min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 75% Read / 25% Write
   Drive Write Cache: Enabled
   Total Cache Size: 4.0
   Total Cache Memory Available: 3.8
   No-Battery Write Cache: Disabled
   SSD Caching RAID5 WriteBack Enabled: True
   SSD Caching Version: 2
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 58
   Cache Module Temperature (C): 51
   Number of Ports: 1 Internal only
   Encryption: Not Set
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.20
   Driver Supports SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:08:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID
   Pending Controller Mode: RAID
   Port Max Phy Rate Limiting Supported: False
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Survival Mode: Enabled
   Host Serial Number: ABC12345678
   Sanitize Erase Supported: True
   Primary Boot Volume: None
   Secondary Boot Volume: None

=>

One of the drives (example):

=> ctrl slot=3 ld 1 show

Smart Array P440 in Slot 3

Array A

  Logical Drive: 1
     Size: 1.86 TB
     Fault Tolerance: 0
     Heads: 255
     Sectors Per Track: 32
     Cylinders: 65535
     Strip Size: 64 KB
     Full Stripe Size: 64 KB
     Status: OK
     Caching:  Enabled
     Unique Identifier: UNIQUEIDENTIFIER
     Disk Name: /dev/sdd
     Mount Points: None
     Logical Drive Label: DRIVELABEL
     Drive Type: Data
     LD Acceleration Method: Controller Cache


=>

During a brief moment of increased performance, sometime early in the resync, it did reach the number below, which is more like what I would expect on such an array, but then it dropped drastically:

[root@localhost mnt]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md127 : active raid6 sdi[5] sdh[4] sdg[3] sdf[2] sde[1] sdd[0] 
8000935168 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU] 
[=>...................] resync = 5.2% (105114044/2000233792) finish=121.2min speed=260430K/sec 
bitmap: 15/15 pages [60KB], 65536KB chunk 
....................
[root@localhost mnt]#

Not sure of the impact, but one of the IO boards or the controller is at 75°C. It looks like this is the Smart Array P440 controller, based on dmidecode output.

My top output. More cores are now active in an IO wait state. Not sure if that is due to group_thread_cnt, but if it is, it's a clue that something else is the bottleneck, since now more threads are simply waiting while the resync still hovers at low speeds:

top - 13:04:07 up 22:35,  5 users,  load average: 9.26, 9.27, 9.21
Tasks: 717 total,   1 running, 716 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.1 sy,  0.0 ni, 99.0 id,  0.7 wa,  0.0 hi,  0.2 si,  0.0 st
%Cpu1  :  0.0 us,  2.1 sy,  0.0 ni,  0.7 id, 97.0 wa,  0.1 hi,  0.1 si,  0.0 st
%Cpu2  :  0.0 us,  1.4 sy,  0.0 ni,  1.0 id, 97.4 wa,  0.1 hi,  0.1 si,  0.0 st
%Cpu3  :  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.7 sy,  0.0 ni,  0.9 id, 98.4 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.3 sy,  0.0 ni, 99.6 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.1 sy,  0.0 ni, 98.6 id,  1.2 wa,  0.0 hi,  0.1 si,  0.0 st
%Cpu9  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.6 sy,  0.0 ni,  3.3 id, 96.0 wa,  0.0 hi,  0.1 si,  0.0 st
%Cpu11 :  0.0 us,  0.2 sy,  0.0 ni,  0.7 id, 99.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.3 sy,  0.0 ni, 99.6 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.7 sy,  0.0 ni, 97.9 id,  1.2 wa,  0.1 hi,  0.1 si,  0.0 st
%Cpu15 :  0.0 us,  0.1 sy,  0.0 ni, 99.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,  0.1 sy,  0.0 ni, 99.7 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
%Cpu17 :  0.0 us,  0.2 sy,  0.0 ni,  0.6 id, 99.1 wa,  0.0 hi,  0.1 si,  0.0 st
%Cpu18 :  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu33 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu34 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu35 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu36 :  0.0 us,  1.4 sy,  0.0 ni,  0.7 id, 97.9 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu37 :  0.0 us,  0.4 sy,  0.0 ni, 99.5 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu38 :  0.0 us,  0.2 sy,  0.0 ni,  0.7 id, 99.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu39 :  0.0 us,  0.4 sy,  0.0 ni, 99.4 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu40 :  0.0 us,  0.4 sy,  0.0 ni, 99.5 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu41 :  0.0 us,  0.2 sy,  0.0 ni, 99.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu42 :  0.0 us,  0.4 sy,  0.0 ni, 99.5 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu43 :  0.0 us,  0.3 sy,  0.0 ni, 99.6 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu44 :  0.0 us,  0.3 sy,  0.0 ni, 99.6 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu45 :  0.0 us,  0.1 sy,  0.0 ni, 99.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu46 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu47 :  0.0 us,  0.1 sy,  0.0 ni, 99.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu48 :  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu49 :  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu50 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu51 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu52 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu53 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu54 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu55 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu56 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu57 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu58 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu59 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu60 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu61 :  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu62 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu63 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu64 :  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu65 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu66 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu67 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu68 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu69 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu70 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu71 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257264.0 total, 253288.3 free,   4062.6 used,   1491.2 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 253201.3 avail Mem

My next exercise will be to avoid any RAID 6 or per-drive RAID 0 setup on the controller entirely and just let it present the drives as-is.
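
If the controller firmware allows it, that should just be a matter of flipping the P440AR into HBA mode, something like the following (untested on this particular firmware, so treat it as a sketch):

    # present the physical drives directly to the OS instead of as logical drives
    ssacli ctrl slot=3 modify hbamode=on
    # typically followed by a reboot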

Oct 9th - Update 2

My other array, made up of NVMe drives, also seems impacted. Both arrays should be clocking near 2GB/s once the resync is done, but (md127 = the SATA SSD array above; md0 = the NVMe mdadm software RAID 6):

    [root@localhost queue]# hdparm -tT /dev/md127
    /dev/md127:
     Timing cached reads:     2 MB in  4.54 seconds = 451.26 kB/sec
     Timing buffered disk reads:   2 MB in  3.50 seconds = 585.55 kB/sec

    [root@localhost queue]# hdparm -tT /dev/md127
    /dev/md127:
     Timing cached reads:     2 MB in  8.77 seconds = 233.61 kB/sec
     Timing buffered disk reads:   2 MB in  5.87 seconds = 349.05 kB/sec

    [root@localhost queue]# hdparm -tT /dev/md0
    /dev/md0:
     Timing cached reads:     17256 MB in  1.99 seconds = 8667.23 MB/sec
     Timing buffered disk reads:  48 MB in  3.17 seconds = 15.13 MB/sec
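
For a longer sequential sample than hdparm's 3-second window, I'll probably also compare with a direct read, something along these lines (read-only, purely illustrative):

    # sequential direct read from the array, bypassing the page cache
    dd if=/dev/md127 of=/dev/null bs=1M count=4096 iflag=direct status=progress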

Without md0 (NVMe) RAID 6 array:

https://tinyurl.com/bp737ffy

Cheers,

1 Answer

Since you're using software RAID, optimizing the resync speed may involve different considerations. Here are some additional steps to improve software RAID resync speed:

  1. CPU Performance: Software RAID heavily relies on CPU power. Ensure your CPU is not bottlenecking the process. Monitor CPU utilization during the resync and consider upgrading if necessary.

  2. I/O Scheduler: Adjusting the I/O scheduler can still be relevant for software RAID. Try different schedulers to see if any provide better performance for your specific workload.

  3. Kernel Parameters: You've already adjusted some kernel parameters, but you can experiment with others, such as raising dev.raid.speed_limit_max (or the per-array /sys/block/mdX/md/sync_speed_max) to allow a higher resync rate; see the sketch after this list.

  4. Filesystem and Block Size: Ensure that your filesystem and block sizes are optimized for your workload. Larger block sizes can improve performance for large files, while smaller ones may benefit small files.

  5. Monitor Memory Usage: Excessive memory usage can lead to performance issues. Ensure that your system has sufficient RAM, and monitor memory usage during the resync.

  6. Check for Software Updates: Make sure your Linux distribution and RAID software are up-to-date. Updates can include performance improvements and bug fixes.

  7. Optimize Disk Alignment: Ensure that your SSD partitions are correctly aligned. Misalignment can impact performance, especially on SSDs.

  8. RAID Chunk Size: Similar to hardware RAID, software RAID allows you to set the chunk size. Experiment with different chunk sizes to see if it affects resync speed.

  9. Parallel Resync: Depending on your software RAID implementation (e.g., mdadm), you may be able to configure parallel resync to speed up the process.

  10. Backup and Restore: In some cases, it may be faster to create a new RAID array and restore data from backups rather than waiting for a slow resync, especially if your data is backed up regularly.
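
As a rough sketch of items 3 and 9 for mdadm specifically (the numbers are illustrative, not recommendations):

    # per-array resync throttles, overriding the global dev.raid.speed_limit_* sysctls
    echo 100000  > /sys/block/md127/md/sync_speed_min
    echo 2000000 > /sys/block/md127/md/sync_speed_max

    # allow this array to resync in parallel with other arrays
    echo 1 > /sys/block/md127/md/sync_force_parallel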

Remember to make changes incrementally, test their impact, and monitor system stability to ensure that any optimizations do not introduce instability or data corruption.

Ace