
I had a SuperMicro SC825 go down recently. I wasn't present for its failure, but I have been able to pull the last RAID status from a remote log. It's configured as a software RAID5 array with 6 drives.

> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
md1 : active raid5 sda5[0] sdc5[2] sdd5[3] sdf5[5] sde5[4]
      197516800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [U_UUUU]

md0 : active raid5 sda1[0] sdc1[2] sdd1[3] sdf1[5] sde1[4]
      4686266880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [U_UUUU]
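
For reference, the `[6/5] [U_UUUU]` fields can be read mechanically: 6 members expected, 5 active, and the `_` marks the missing slot. Below is a minimal Python sketch of that reading. The function name `degraded_arrays` and the `SAMPLE` constant are my own, made up for illustration, and the regular expressions only assume the layout shown in the log above.

```python
import re

# Sample text copied from the remote log above.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
md1 : active raid5 sda5[0] sdc5[2] sdd5[3] sdf5[5] sde5[4]
      197516800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [U_UUUU]

md0 : active raid5 sda1[0] sdc1[2] sdd1[3] sdf1[5] sde1[4]
      4686266880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [U_UUUU]
"""

def degraded_arrays(mdstat_text):
    """Map array name -> (expected, active, missing slot indexes) for degraded arrays."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # A line such as "md0 : active raid5 ..." starts a new array entry.
        head = re.match(r"^(md\d+)\s*:", line)
        if head:
            current = head.group(1)
        # "[6/5] [U_UUUU]" means 6 expected, 5 active, "_" marks a missing slot.
        status = re.search(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            missing = [i for i, flag in enumerate(status.group(3)) if flag == "_"]
            if missing:
                result[current] = (expected, active, missing)
    return result

if __name__ == "__main__":
    for name, (expected, active, missing) in degraded_arrays(SAMPLE).items():
        print(name, f"{active}/{expected} active, missing slots: {missing}")
```

Both arrays report slot 1 as missing, which matches the absent role `[1]` in the member lists. That is presumably the second disk, although the log does not name the device.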

So at least one drive failed, and the machine continued to run with the failed drive. Sometime after that log entry, however, the machine stopped booting. The boot fails with the following output:

 mdadm: /dev/md/0 assembled from 5 drives (out of 6), but not started.
 mdadm: failed to start array /dev/md/0: Input/output error

This is on Ubuntu 24.04 LTS.

Here are my questions:

  1. [?] Why isn't RAID5 running with the 1 failed drive?
  2. [?] If I replace the failed drive will the RAID5 array still repair?
  3. [yes] As long as I match the rpm, sata, and size (with the same form factor to plug into the server) am I fine to use a different model replacement drive?

Monitor display of mdadm output during failed boot

bias
  • 237

1 Answer


There are also errors on fd0? That's the floppy drive. I'm not sure whether that's simply because you don't have a floppy drive, or whether it's a symptom of a broken mainboard or bus.

As for replacing the disk: you can use a different model if you want (I do it all the time), including a bigger drive. The most important thing is not to put a 4K-sector drive into an array whose partitions are not 4K-aligned. That said, mdraid is normally properly aligned.
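
If you want to check the alignment rather than trust it, the arithmetic is simple: a partition is 4K-aligned when its starting byte offset is a multiple of 4096. Here is a minimal Python sketch; the helper name `is_4k_aligned` is made up for illustration, and it assumes the start is given in 512-byte sectors, which, as far as I know, is the unit Linux reports in `/sys/block/<disk>/<partition>/start`.

```python
def is_4k_aligned(start_sector, sector_bytes=512):
    """Return True if a partition starting at start_sector begins on a 4096-byte boundary.

    start_sector is assumed to be counted in units of sector_bytes
    (512 by default, an assumption about how the value was obtained).
    """
    return (start_sector * sector_bytes) % 4096 == 0

if __name__ == "__main__":
    # 2048 is the start sector modern partitioning tools commonly pick.
    print(is_4k_aligned(2048))
    # 63 is the old DOS-style start sector, which is not 4K-aligned.
    print(is_4k_aligned(63))
```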

You can also try booting a rescue system and assembling the array there. Under normal circumstances, it will detect the array properly.

Then hopefully, you have a degraded but functional array, and you can just mdadm --add a drive.
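
The sequence I have in mind from the rescue system can be sketched roughly as follows. It is a Python sketch that only builds and prints the `mdadm` commands rather than running them. The function name `recovery_plan` is made up, and `/dev/sdb1` is a hypothetical name for the partition on the replacement disk, so check the real device names (and partition the new disk to match the others) before running anything.

```python
import shlex

def recovery_plan(array="/dev/md0", new_member="/dev/sdb1"):
    """Return the commands, in order, as argument lists (nothing is executed here).

    new_member defaults to a hypothetical partition name for the replacement disk.
    """
    return [
        # Let mdadm find and assemble arrays from the on-disk superblocks.
        ["mdadm", "--assemble", "--scan"],
        # Start the array even though it was assembled with a member missing.
        ["mdadm", "--run", array],
        # Add the replacement partition; the rebuild onto it starts afterwards.
        ["mdadm", "--manage", array, "--add", new_member],
        # Watch the rebuild progress.
        ["cat", "/proc/mdstat"],
    ]

if __name__ == "__main__":
    for command in recovery_plan():
        print(shlex.join(command))
```

If the `--run` step fails with the same Input/output error as at boot, stop there and look at the kernel log before trying anything more forceful.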

To conclude: don't use RAID5 anymore. Unrecoverable read error rates have not kept pace with modern disk sizes, so with this many disks you run a serious risk of hitting a read error while rebuilding the array, and the larger the disks, the closer that gets to a certainty.
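
To put a rough number on that, here is a back-of-the-envelope Python sketch. It assumes an unrecoverable read error rate of 1 per 10^14 bits, a figure often quoted on consumer drive datasheets and not something measured on your disks, and it treats errors as independent. The function name `rebuild_ure_probability` is made up. A RAID5 rebuild has to read every surviving member in full, which for md0 comes to the array's usable size, the 4686266880 1 KiB blocks shown in your mdstat output.

```python
import math

def rebuild_ure_probability(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while reading bytes_read.

    ure_per_bit is an assumed datasheet-style rate, and errors are treated as
    independent. Computes 1 - (1 - rate)^bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# md0 size from /proc/mdstat, in 1 KiB blocks. For a 6-disk RAID5 this equals
# the total amount read from the 5 surviving members during a rebuild.
MD0_BLOCKS_KIB = 4686266880

if __name__ == "__main__":
    bytes_to_read = MD0_BLOCKS_KIB * 1024
    for rate in (1e-14, 1e-15):
        p = rebuild_ure_probability(bytes_to_read, rate)
        print(f"assumed rate {rate:g} per bit: P(read error during rebuild) = {p:.2f}")
```

Under the 10^-14 assumption this comes out at roughly one in three for an array of this size, and it climbs quickly as the member disks get larger.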

Halfgaar
  • 8,534