
When setting up a server, I was told it needs to have really good fault tolerance. Which RAID level would give me the best fault tolerance?

2 Answers


In the classical hardware RAID world, RAID1 and RAID6 are the most reliable RAID levels.

In the more advanced software RAID world (MDRAID and ZFS), you can use 3-way mirroring, or even a triple-parity scheme (RAID-Z3, ZFS only).

From a reliability standpoint, correctly configured ZFS pools are probably the state of the art.

shodanshok

In my opinion, if fault tolerance is the most desired property, RAID-6 is the best choice (at least, the best affordable one), with RAID-5 second best.

It might seem that RAID-1 (or RAID-10, if more speed is desired and money isn't an issue) would be the go-to solution, but I wouldn't recommend it.

RAID levels 1 and 5 have in common that exactly one disk may fail without anything bad happening. Apart from complete disk failure, the array is also resilient to single sectors becoming unreadable (as long as N-1 of the N sectors with the corresponding number remain readable). With RAID-1, in theory as many as 50% of the disks could fail without harm, as long as it is strictly the "correct" disks that fail, that is, never two disks with the same index.
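
To make the single-parity idea concrete, here is a minimal Python sketch of RAID-5-style XOR parity for one stripe. It assumes equally sized blocks and ignores how real arrays rotate parity across disks; `make_stripe` and `rebuild` are hypothetical helper names, not part of any RAID tool.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks followed by their XOR parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: list[bytes | None]) -> list[bytes]:
    """Reconstruct at most one missing block (None) from the surviving ones."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost block")
    stripe = list(stripe)
    if missing:
        # XOR of all blocks (data + parity) is zero, so the lost block
        # equals the XOR of everything that survived.
        stripe[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return stripe


stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])
damaged = list(stripe)
damaged[1] = None  # simulate one failed disk (or one unreadable sector)
assert rebuild(damaged) == stripe
```

Lose a second block in the same stripe and `rebuild` has nothing left to work with, which is exactly the RAID-5 weak spot discussed below.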

In principle, you could also do RAID-1 with two, three, or ten mirror copies if you like, but monetary constraints usually forbid that. After all, throwing two dozen extra disks at the problem hardly makes the approach "inexpensive" (although the word "inexpensive" in RAID refers to the individual disks within the array, nobody would want to use a cheap disk in a RAID anyway).
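
Since the trade-off here is mostly money, a quick back-of-the-envelope calculation helps. The following is a simplified Python sketch (the function name and the 4 TB disk size are made up for illustration, and the model ignores metadata, hot spares and filesystem overhead) of how much usable space each layout leaves you with:

```python
def usable_tb(level: str, disks: int, disk_tb: float, mirror_width: int = 2) -> float:
    """Idealized usable capacity of an array of identical disks."""
    if level == "raid1" and disks >= 2:
        data_disks = 1  # every disk is a full copy, however many copies there are
    elif level == "raid5" and disks >= 3:
        data_disks = disks - 1  # one disk's worth of parity
    elif level == "raid6" and disks >= 4:
        data_disks = disks - 2  # two disks' worth of parity
    elif level == "raid10" and disks >= 2 * mirror_width and disks % mirror_width == 0:
        data_disks = disks // mirror_width  # RAID-0 across mirrors of `mirror_width` copies
    else:
        raise ValueError(f"unsupported layout: {level} with {disks} disks")
    return data_disks * disk_tb


for level, disks in [("raid1", 3), ("raid5", 6), ("raid6", 6), ("raid10", 6)]:
    print(f"{level} with {disks} x 4 TB: {usable_tb(level, disks, 4.0):g} TB usable")
```

With six hypothetical 4 TB disks this gives 20 TB for RAID-5, 16 TB for RAID-6 and 12 TB for RAID-10, while a 3-way RAID-1 mirror spends three disks on just 4 TB.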

RAID-10 is somewhat inferior insofar as it is basically RAID-0 stacked on top of two (or maybe three) RAID-1 instances. Although each of them is fault-tolerant within its limits, if any one of them fails, the whole array fails.
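
One way to see the difference is to enumerate every possible two-disk failure on a small array. This Python sketch assumes a hypothetical 4-disk layout, treats every combination of failed disks as equally likely, and uses made-up helper names; it only models "does the array survive", not performance or rebuild behaviour.

```python
from itertools import combinations
from typing import Callable


def parity_survives(failed: set[int], parity_disks: int) -> bool:
    # RAID-5 (1 parity disk) or RAID-6 (2): any failures up to that count are fine.
    return len(failed) <= parity_disks


def raid10_survives(failed: set[int], mirrors: list[set[int]]) -> bool:
    # RAID-0 over mirrors: the array dies once any mirror has lost every member.
    return not any(mirror <= failed for mirror in mirrors)


def fatal_fraction(disks: int, failures: int,
                   survives: Callable[[set[int]], bool]) -> float:
    """Share of all equally likely `failures`-disk combinations that kill the array."""
    combos = [set(c) for c in combinations(range(disks), failures)]
    return sum(not survives(c) for c in combos) / len(combos)


mirrors = [{0, 1}, {2, 3}]  # hypothetical 4-disk RAID-10: two striped 2-way mirrors
layouts = {
    "RAID-5": lambda f: parity_survives(f, 1),
    "RAID-6": lambda f: parity_survives(f, 2),
    "RAID-10": lambda f: raid10_survives(f, mirrors),
}
for name, survives in layouts.items():
    print(f"{name}: {fatal_fraction(4, 2, survives):.0%} of two-disk failures are fatal")
# RAID-5: 100% of two-disk failures are fatal
# RAID-6: 0% of two-disk failures are fatal
# RAID-10: 33% of two-disk failures are fatal
```

Four of the six two-disk combinations are harmless for RAID-10; the other two take out both halves of one mirror, and with them the whole stripe.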

RAID-5 is cheap (only one extra disk needed) and has actually been sufficient for most people, because hey, when do two disks die at the same time? Never happens! Well, sadly it can happen, and it does happen. It can also happen that a sector becomes unreadable. Yeah, that never happens, it's sooooooo unlikely, right.

Unluckily, when you need to re-sync after a failure, you must read every sector on all remaining disks. With modern disk capacities, that is a huge number. A huge number multiplied by an unlikely, probably-never-happens per-sector probability unfortunately results in an overall probability that is far from negligible. It can happen that, after one disk has failed, a sector goes bad. It can happen that a second disk (which has the same number of power-on hours) fails, especially when put under a 16-to-18-hour stress test during the resync.
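
The "huge number multiplied by unlikely" argument can be put into numbers. This minimal sketch assumes an unrecoverable read error (URE) rate taken from typical datasheet specs (1 per 10^14 bits for many consumer drives, 1 per 10^15 for many enterprise drives), treats errors as independent, and uses a hypothetical degraded RAID-5 with four surviving 4 TB disks. Datasheet figures are upper bounds, so real-world odds are often better; treat the output as an illustration, not a prediction.

```python
import math


def rebuild_ure_probability(surviving_disks: int, disk_tb: float, ure_per_bit: float) -> float:
    """Chance of at least one unrecoverable read error while reading every bit
    on the surviving disks, assuming independent errors at a constant rate."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


for rate in (1e-14, 1e-15):
    p = rebuild_ure_probability(surviving_disks=4, disk_tb=4.0, ure_per_bit=rate)
    print(f"URE rate {rate:g}/bit: {p:.0%} chance of hitting an unreadable sector")
```

Under these assumptions the 10^14 case comes out at roughly 72% and the 10^15 case at roughly 12%, for a single rebuild that still has no redundancy left.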

RAID-6 is the same as RAID-5, except that it can withstand two disks failing simultaneously. It does not matter which disks fail; there is no worst case. Any two disks go down, and you're still good to go.
So when the first disk fails, it's not yet time for cold sweat. You are still good to go, and you still have redundancy in place. That is sooooooo much better than RAID-5, and it costs only one more extra disk.
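
For the curious, here is a minimal Python sketch of why "any two disks" really works: alongside the XOR parity P, RAID-6 keeps a second syndrome Q computed in the finite field GF(2^8). It follows the commonly described P+Q construction (generator 2, reducing polynomial 0x11D) as an assumption; real implementations differ in on-disk layout, parity rotation and speed, and all helper names here are hypothetical.

```python
from itertools import combinations
from typing import Optional

GF_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common choice for RAID-6 (assumption)


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (shift-and-add, reducing by GF_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # The nonzero elements form a group of order 255, so a^254 is a's inverse.
    return gf_pow(a, 254)


def syndromes(data: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of the data blocks, Q = XOR of g^i * D_i with generator g = 2."""
    p, q = bytearray(len(data[0])), bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover(data: list[Optional[bytes]], p: Optional[bytes],
            q: Optional[bytes]) -> list[bytes]:
    """Rebuild the data blocks of one stripe with up to two blocks (data, P or Q) lost."""
    lost = [i for i, block in enumerate(data) if block is None]
    if len(lost) + (p is None) + (q is None) > 2:
        raise ValueError("RAID-6 tolerates at most two lost blocks per stripe")
    size = len(next(b for b in [*data, p, q] if b is not None))
    # Syndromes of the surviving data, with lost blocks treated as all zeros.
    out = [b if b is not None else bytes(size) for b in data]
    p_part, q_part = syndromes(out)
    if len(lost) == 1 and p is not None:
        # One data block lost, P intact: plain XOR, exactly like RAID-5.
        out[lost[0]] = bytes(a ^ b for a, b in zip(p, p_part))
    elif len(lost) == 1:
        # Data block and P lost: Q ^ Q_part = g^x * D_x, so divide by g^x.
        inv = gf_inv(gf_pow(2, lost[0]))
        out[lost[0]] = bytes(gf_mul(inv, a ^ b) for a, b in zip(q, q_part))
    elif len(lost) == 2:
        # Two data blocks lost: solve D_x ^ D_y = A and g^x D_x ^ g^y D_y = B.
        x, y = lost
        gy = gf_pow(2, y)
        inv = gf_inv(gf_pow(2, x) ^ gy)
        dx, dy = bytearray(size), bytearray(size)
        for k in range(size):
            a = p[k] ^ p_part[k]
            b = q[k] ^ q_part[k]
            dx[k] = gf_mul(inv, b ^ gf_mul(gy, a))
            dy[k] = a ^ dx[k]
        out[x], out[y] = bytes(dx), bytes(dy)
    return out


data = [b"disk0", b"disk1", b"disk2", b"disk3"]
p, q = syndromes(data)
# Indices 0-3 are data disks, 4 is P and 5 is Q; try every pair of failures.
for lost in combinations(range(6), 2):
    damaged = [None if i in lost else block for i, block in enumerate(data)]
    rebuilt = recover(damaged, None if 4 in lost else p, None if 5 in lost else q)
    assert rebuilt == data
print("all 15 two-disk failures recovered")
```

The demo loop checks all 15 ways two of the six disks in this stripe can fail, which is the "there is no worst case" property in executable form.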

Damon