My raid6 array disappeared on reboot after I grew it. I believe the issue was growing it by two drives using the full disks rather than the partitions. It has also been suggested that another possible reason the drives were not correctly recognized is that I didn't zero the superblocks before re-adding them to a new array (I sketch that missing step right after the command list below). Could it be a combination of both? Here are the commands issued (pulled from history, formatted to have consistent drive letters):

mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sd[b-c]1

#Full backup of ROC raid10 onto these drives, after having copied most files to other drives; checked to make sure it survived a reboot.

mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sd[d-g]1

#Waited for the drives to sync, then rsynced the data from md0; reboots fine.

mdadm -S /dev/md0
mdadm /dev/md0 -r /dev/sd[b-c]

#NOTICE THE MISSING PARTITION NUMBER BELOW.

mdadm /dev/md1 --add /dev/sdb
mdadm /dev/md1 --add /dev/sdc
mdadm -list
mdadm --detail /dev/md1
mdadm --grow --raid-devices=6 --backup-file=/media/FastRaid/md1_grow.bak /dev/md1
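
For reference, the superblock zeroing I mentioned above and skipped would, I believe, have looked something like this before the --add commands (just a sketch, not something I actually ran):

mdadm --zero-superblock /dev/sd[b-c]1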

After a reboot, the raid6 disappeared and was replaced by 2 raid0 arrays, one active (sdb/sdc) and one inactive (sdd-sdg). The following is what I get from examining the superblocks:

/dev/sdb1:
        Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
    Array UUID : 501c08da:5069a3d8:b2982a5d:ab56c37c
        Name : tim-server:0  (local to host tim-server)
Creation Time : Tue Dec 13 22:01:10 2022
    Raid Level : raid0
Raid Devices : 2

 Avail Dev Size : 7813770895 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : e8db27d6:0dbd1ac5:4456c304:0b43f09c

    Update Time : Tue Dec 13 22:01:10 2022
  Bad Block Log : 512 entries available at offset 8 sectors
       Checksum : dfd187c0 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 501c08da:5069a3d8:b2982a5d:ab56c37c
           Name : tim-server:0  (local to host tim-server)
  Creation Time : Tue Dec 13 22:01:10 2022
     Raid Level : raid0
   Raid Devices : 2

 Avail Dev Size : 7813770895 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : 3ce84b05:607f8565:456e7f83:88b83052

    Update Time : Tue Dec 13 22:01:10 2022
  Bad Block Log : 512 entries available at offset 8 sectors
       Checksum : e35ce3e5 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 929a14c9:adaf502a:53658e03:90a19fce
           Name : tim-server:0  (local to host tim-server)
  Creation Time : Wed Dec 14 11:18:57 2022
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813770895 (3725.90 GiB 4000.65 GB)
     Array Size : 15627540480 (14903.58 GiB 16002.60 GB)
  Used Dev Size : 7813770240 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=655 sectors
          State : clean
    Device UUID : eaf10189:940aeaf8:947efe82:5d0e4aea

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Dec 18 06:31:11 2022
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : e38a1bd9 - correct
         Events : 26630

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 929a14c9:adaf502a:53658e03:90a19fce
           Name : tim-server:0  (local to host tim-server)
  Creation Time : Wed Dec 14 11:18:57 2022
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813770895 (3725.90 GiB 4000.65 GB)
     Array Size : 15627540480 (14903.58 GiB 16002.60 GB)
  Used Dev Size : 7813770240 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=655 sectors
          State : clean
    Device UUID : 5c34a9c7:bcc3f190:d1719a9c:8aa2b722

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Dec 18 06:31:11 2022
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : c429edf - correct
         Events : 26630

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 929a14c9:adaf502a:53658e03:90a19fce
           Name : tim-server:0  (local to host tim-server)
  Creation Time : Wed Dec 14 11:18:57 2022
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813770895 (3725.90 GiB 4000.65 GB)
     Array Size : 15627540480 (14903.58 GiB 16002.60 GB)
  Used Dev Size : 7813770240 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=655 sectors
          State : clean
    Device UUID : 12d1e3a8:b8749f59:654bcca4:4f4750df

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Dec 18 06:31:11 2022
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 7af56ae7 - correct
         Events : 26630

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 929a14c9:adaf502a:53658e03:90a19fce
           Name : tim-server:0  (local to host tim-server)
  Creation Time : Wed Dec 14 11:18:57 2022
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813770895 (3725.90 GiB 4000.65 GB)
     Array Size : 15627540480 (14903.58 GiB 16002.60 GB)
  Used Dev Size : 7813770240 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=655 sectors
          State : clean
    Device UUID : 72085967:835efe92:cb268a64:4d192b52

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Dec 18 06:31:11 2022
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : a5623977 - correct
         Events : 26630

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

I had deactivated md0 at some point, so I reassembled it with mdadm -A -o /dev/md0 /dev/sdb1 /dev/sdc1. This is what /proc/mdstat shows now:

cat /proc/mdstat

Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active (read-only) raid0 sdb1[0] sdc1[1]
      7813770240 blocks super 1.2 512k chunks

md1 : inactive sdf1[0] sde1[3] sdd1[1] sdg1[2]
      15627541790 blocks super 1.2

unused devices: <none>

If I try mount /dev/md0 /media/tmp_md_raid I get: mount: /media/tmp_md_raid: wrong fs type, bad option, bad superblock on /dev/md126, missing codepage or helper program, or other error. If I try mdadm -A -o /dev/md1 /dev/sdf1 /dev/sde1 /dev/sdd1 /dev/sdg1 I get:

mdadm: /dev/sdf1 is busy - skipping
mdadm: /dev/sde1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping
mdadm: /dev/sdg1 is busy - skipping

smartctl reports that all the drives are fine. I am not sure whether I should try mdadm --assemble --force first or mdadm --create --assume-clean first. Should I try the second with -o set, to see if I can recreate the array and view the data without possibly destroying any chance of recovery? Thanks for any advice.

1 Answer

It seems you have a 6-device array (AAAAAA), but only 4 component devices are available (/dev/sd[defg]1). The capacity calculation confirms this: you need six 4 TB disks to create a 16 TB RAID6 array, since RAID6 spends two disks' worth of space on parity and (6 - 2) x 4 TB = 16 TB.

Since this is RAID6 and all 4 available devices appear to be in sync, the array can be run, but only in so-called fully degraded mode. In this mode, reading any block requires reading a stripe from all drives (which is I/O-intensive) and performing a reconstruction (using both parity syndromes, which involves CPU-intensive Galois field calculations), and writing a block requires reading the whole stripe, calculating new parity syndromes and writing to at least three devices (which is even more I/O-intensive overall).
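
As a side note, once the array is running you can confirm it is degraded with something along these lines (the grep pattern is only an illustration):

mdadm --detail /dev/md1 | grep -E 'State|Devices'

It should report a state like "clean, degraded" and show 4 working devices out of 6.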

Linux has no choice but to fall back to this if the array was running and some device failed mid-use; that's the whole point of having a RAID array. As you may have guessed, performance in this state is very bad and the risk of losing data is very high, which is why you shouldn't run an array in this state for long stretches of time. Ideally you supply a hot spare in addition to the working devices, so reconstruction can start as soon as the failure of any component is detected.
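
For example, once the array is back to full strength, a spare could be attached with something like this (the /dev/sdh1 name is purely an illustration):

mdadm /dev/md1 --add /dev/sdh1

A device added while the array is not degraded is kept as a hot spare and is only pulled in when a member fails.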

But during boot it doesn't know whether some devices are permanently missing or simply not available yet due to staggered spin-up or other initialization delays. Activating the array early would kick late-appearing devices out of sync and force a lengthy resync, during which the array would show the worst performance characteristics described above. This motivates waiting for late-appearing devices, so by default Linux will not automatically activate a partially available array, even if there are enough devices to run it in some degraded mode.

But you, the administrator, can force it to do so. For this, reassemble the array with --force:

mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sd[defg]1
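
If that succeeds, something like the following lets you verify the state and peek at the data read-only before doing anything else (the mount point is just an example, reusing the one from your question):

cat /proc/mdstat
mdadm --detail /dev/md1
mount -o ro /dev/md1 /media/tmp_md_raid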

More precisely, it won't automatically assemble the array if fewer devices are available than are recorded in the superblocks of the present devices (in your case the superblocks record that all devices were available last time). When you remove a device properly with the mdadm -f / mdadm -r sequence, or when you force-assemble the array, that fact is recorded, and the array will then be auto-assembled in the same degraded state automatically.
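
For illustration, the proper removal sequence that gets recorded in the superblocks looks roughly like this (the device name is only an example):

mdadm /dev/md1 --fail /dev/sdf1
mdadm /dev/md1 --remove /dev/sdf1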

If this array doesn't contain valuable data, it is better to re-create it from scratch. A fresh initialization is faster than re-adding devices and sitting through a reconstruction.
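
If you do go that route, the re-creation could look roughly like the following, assuming everything currently on the disks is expendable (this destroys the existing contents, and the exact device list is only a guess based on your drive letters):

mdadm --stop /dev/md1
#clear the leftover whole-disk superblocks from the earlier --add of sdb/sdc, then the partition ones
mdadm --zero-superblock /dev/sdb /dev/sdc
mdadm --zero-superblock /dev/sd[b-g]1
mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[b-g]1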