
Start of the problem

I have a dedicated server at a hosting provider, and recently my node exporter detected high disk I/O saturation on my RAID 1 array /dev/md3. I checked smartctl for my hard drives, and both drives in the array were showing a high number of read errors:

[root@ovh-ds03 ~]# smartctl /dev/sda -a | grep Err
Error logging capability:        (0x01) Error logging supported.
     SCT Error Recovery Control supported.
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       65538
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

[root@ovh-ds03 ~]# smartctl /dev/sdb -a | grep Err
Error logging capability:        (0x01) Error logging supported.
     SCT Error Recovery Control supported.
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       65536
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
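For reference, a quicker summary than the raw attribute values is the drive's overall SMART self-assessment, and a short self-test can confirm whether the errors are current; a minimal check along these lines (same device names as above):

# Overall health verdict (prints PASSED or FAILED)
smartctl -H /dev/sda
smartctl -H /dev/sdb

# Optionally run a short self-test, then read its result a few minutes later
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda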

I asked through a support ticket to replace the two disks, but instead of replacing them, two more disks were added and the array was rebuilt onto those two new disks. Everything was fine, but now the array is in a degraded state and I got an alert for it named NodeRAIDDegraded. Checking on the server, yes, it's degraded:

[root@ovh-ds03 ~]# mdadm --detail /dev/md3
/dev/md3:
           Version : 1.2
     Creation Time : Sat Mar 30 18:18:26 2024
        Raid Level : raid1
        Array Size : 1951283200 (1860.89 GiB 1998.11 GB)
     Used Dev Size : 1951283200 (1860.89 GiB 1998.11 GB)
      Raid Devices : 4
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Sep 14 19:30:44 2024
             State : active, degraded
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

          Name : md3
          UUID : 939ad077:07c22e9e:ae62fbf9:4df58cf9
        Events : 55337

Number   Major   Minor   RaidDevice State
   -       0        0        0      removed
   -       0        0        1      removed
   2       8       35        2      active sync   /dev/sdc3
   3       8       51        3      active sync   /dev/sdd3
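
The same degraded state is also visible at a glance in /proc/mdstat (output omitted here since it depends on the kernel and array layout); a quick check:

# A degraded mirror shows up as missing 'U' slots, e.g. [UU__] for 2 of 4 members
cat /proc/mdstat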

How do I fix it?

I have tried testing various solutions for rebuilding the array from scratch and so on, for example:

mdadm --assemble --scan

1 Answer


To fix this and get rid of the "removed" slots, you need to lower the number of devices in the RAID array. In your case the array was scaled from 2 to 4 devices, so now you need to scale it back from 4 to 2. That can be done like this:

mdadm --grow --raid-devices=2 /dev/md3
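
This works directly here because the two empty slots are only marked as removed, so for RAID 1 shrinking the member count involves no data movement. If old disks were still listed as failed or detached members, they would have to be pulled from the array first; roughly like this (not needed in this case):

# Only if mdadm --detail still lists failed or disconnected members
mdadm /dev/md3 --remove failed
mdadm /dev/md3 --remove detached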

After that fix, the RAID array looks like this:

[root@ovh-ds03 ~]# mdadm --detail /dev/md3
/dev/md3:
           Version : 1.2
     Creation Time : Sat Mar 30 18:18:26 2024
        Raid Level : raid1
        Array Size : 1951283200 (1860.89 GiB 1998.11 GB)
     Used Dev Size : 1951283200 (1860.89 GiB 1998.11 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Sep 14 19:33:15 2024
             State : active
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

          Name : md3
          UUID : 939ad077:07c22e9e:ae62fbf9:4df58cf9
        Events : 55484

Number   Major   Minor   RaidDevice State
   2       8       35        0      active sync   /dev/sdc3
   3       8       51        1      active sync   /dev/sdd3
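
To double-check, /proc/mdstat should now show both mirrors in sync, and if your distribution keeps a static mdadm.conf it is worth confirming it still matches the live array; a rough sketch (the config path is distribution-dependent, e.g. /etc/mdadm.conf or /etc/mdadm/mdadm.conf):

# Expect something like: md3 : active raid1 ... [2/2] [UU]
cat /proc/mdstat

# Compare the live array definition with the stored one
mdadm --detail --scan
grep ARRAY /etc/mdadm.conf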