
I have a RAID 6 array with 16 drives. A few days ago three disks failed and the array was marked as degraded. I cannot access the data and I cannot boot into the operating system. I need access to the data but I cannot do anything. Any advice? How can I recover or access the data? Could I use a live CD to boot an OS? I'm using SAS disks. Thanks in advance.

8 Answers


As said before, if more than two disks in a RAID-6 array die, the array is unrecoverable.

However, three simultaneous disk failures are quite an unlikely event: it might very well be a case of a faulty enclosure, backplane and/or controller.

You should try removing and re-inserting the disks, replacing the controller and/or the enclosure, and even putting the disks in a different server with the same controller (if you have one available).

Massimo

You don't give any details on the server type, RAID controller type or anything specific.

Try turning everything off for 10 minutes... Remove power from the server. Let the drives spin down.

Power the server back on and see if the RAID controller re-recognizes the drives and is able to boot.

ewwhite

As stated in the comment, RAID6 can sustain up to two disk failures; if a third disk fails, your array is toast.

The most obvious thing is to restore from backup. If this is not possible and at least one of the failed disks is still readable (albeit with read errors), you can try to do a block-level copy of each failed disk onto another, healthy disk (e.g. via ddrescue <failed_disk> <new_disk>) and to restart the array using these copies (plus the other good disks).
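A minimal sketch of that cloning step, assuming the failed disk shows up as /dev/sdX and the spare as /dev/sdY (hypothetical names); the mapfile lets ddrescue resume later and keep track of the unreadable areas:

    # first pass: copy everything that reads cleanly, skip bad areas quickly
    ddrescue -f -n /dev/sdX /dev/sdY /root/sdX.map

    # second pass: go back and retry the bad areas a few times
    ddrescue -f -r3 /dev/sdX /dev/sdY /root/sdX.map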

You will end up with a punctured array where some of the original data may be lost/corrupted; however, with any luck, most of the data should be accessible.

If you have no backup and none of the failed disks are readable, you need to contact a data recovery service.

shodanshok

  1. You probably don't have a software RAID, no matter what the tag says. You cannot boot an OS from a software RAID 6.

  2. 3 disks out of 16 failing together is quite a rare occurrence, except when you drop the server on the floor. It is either 3 disks failing one by one over a long timespan with no one noticing, or a failed controller, failed cable, failed power supply, failed backplane or a firmware bug kicking in. It is important to determine which case you have, because the recovery strategy is different. There may be BIOS or RAID controller logs accessible, and a per-disk health check can help too (see the sketch after this list).

  3. In either case, you start by backing up every single disk to other media, using a different, known-good controller. In the process, you will see how many of the disks are actually broken, and how badly.

  4. Most (probably all) hardware RAID controllers are crap. I learned this the hard way. A "disk failed" condition may actually be a single bad sector, and most (or even all) of the data could be recoverable.

  5. A "degraded" array is an array that still has all the data accessible. What you describe is a "failed" or "offline" array, rather than "degraded". If you are not experienced in these matters, call someone who IS.

  6. Starting from a recovery/live CD may or may not be part of the process. If you don't know how to mount a filesystem in read-only mode, call someone who does (the sketch after this list includes a read-only mount). It is possible to destroy perfectly recoverable data by such a mistake.
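A minimal sketch of the per-disk health check and the read-only mount mentioned above, assuming the 16 SAS disks show up as /dev/sda through /dev/sdp and the recovered volume as /dev/sdq1 (all names hypothetical; smartmontools required):

    # query the health status and error counters of every disk individually
    for d in /dev/sd[a-p]; do
        echo "=== $d ==="
        smartctl -a "$d"    # add "-d scsi" if needed for SAS disks behind a controller
    done

    # mount the recovered volume strictly read-only before touching any data
    mount -o ro /dev/sdq1 /mnt/rescue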


After a lot of sleepless nights I design my servers in such a way that everything stops working when the FIRST disk fails. THIS is the only error message that no one ignores.

fraxinus

Recover from backup. You won’t see your data on this RAID LUN again.

RiGiD5

RAID 6 can only survive two failed hard drives. If you do not have any backups and need the data, I would recommend hiring a hard drive recovery company. I would not try to recover the data on your own, because the more you work the hard drives, the higher the chance that the data will not be recoverable.

Joe

As a last resort (after trying everything others have already posted as answers here), you could attempt to force one drive back online / not degraded.

I just had the case where 3 of 6 very old drives in a hardware RAID 6 failed. I was lucky and able to recover some of the data:

  1. removed 2 failed drives
  2. in the options of my hardware RAID controller, forced the third failed drive online (not degraded)
  3. put in 2 new drives
  4. rebuilt the array
  5. only then removed the last failed drive

I was lucky and lost no relevant data. Of course there is a risk of data corruption/loss with this approach, but the data on the RAID is lost otherwise anyway, so it might be worth a shot if the RAID controller gives you that option.
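For what it's worth, on an LSI/MegaRAID controller the same force-online step can also be done from the command line; this is only a rough illustration (the MegaCli invocation and the [enclosure:slot] address are assumptions, and the exact syntax varies by vendor and firmware):

    # list physical drives with their enclosure:slot addresses and firmware state
    MegaCli -PDList -aALL

    # last resort: force a single drive back online on adapter 0
    MegaCli -PDOnline -PhysDrv [32:2] -a0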


I know this thread is old, but it shows up first on Google and I wanted to post what happened to me.

I had one drive in a six-drive RAID 6 flag a SMART error, and I did buy the new drive in advance, but I got busy and couldn't replace it in time. The reason I had a kerfuffle is that when the one bad drive died, it freaked out the RAID controller in such a way that it reported 3 drives dead. I knew this couldn't be the case because there were no SMART errors for the others.

What was annoying about my case was that the bad drive freaked out the RAID controller so hard that it would no longer read SMART data from any of the "bad" (not really bad, see above) drives. So I couldn't figure out which drive was bad, as I hadn't saved a screenshot from earlier saying which one it was. Yes, I know, I'm an idiot and you're probably facepalming right now, but I'm just here to post my experience.

What I had to do was disconnect everything, hook up the drives one at a time via their power connectors, and go through them until I found the bad drive. I knew it was bad because I could hear it clicking and spinning (in the bad way). In fact, mdadm said that the bad drive was one of the GOOD ones. Also annoying: it doesn't sound bad until it has been on for 20 minutes, so this process took about 2 hours. When all was said and done, though, I got my data back.

After that I had to do an mdadm --assemble --force /dev/mdX /dev/sda /dev/sdb ... etc. Please note that the force did NOT work until I removed the bad drive, and that pretty much NOTHING worked or showed up as available or existing until I removed the bad drive.
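A rough sketch of that reassembly, assuming the array is /dev/md0, the surviving members are /dev/sda through /dev/sde, and the clicking drive is already unplugged (all names hypothetical):

    # check the RAID superblocks and event counters on the surviving members
    mdadm --examine /dev/sd[a-e]

    # force-assemble the array from those members only
    mdadm --assemble --force /dev/md0 /dev/sd[a-e]

    # verify it came up, then mount read-only before copying any data off
    cat /proc/mdstat
    mount -o ro /dev/md0 /mnt/rescue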

Anyway, the reason I'm posting this is that for a few hours I had come to the conclusion that mdadm was completely correct and that I was screwed. Yes, the RAID controller is probably cheap, but again, I'm just posting this to help anyone else out.