
I am setting up a JBOD containing 44 4 TB 7200 RPM SAS HDDs. I chose RAID 60 because I prefer drive-failure protection over the performance improvements offered by RAID 10. My issue is how to choose the number of disks per span that results in a reasonable rebuild time. For example, assuming I leave 4 hot spares, that leaves 40 disks for the following possible RAID setups (the arithmetic is sketched in the snippet after the list):

  • 2 spans with 20 disks, ~144 TB usable capacity.
  • 4 spans with 10 disks, ~128 TB usable capacity.
  • 5 spans with 8 disks, ~120 TB usable capacity.
  • 8 spans with 5 disks, ~96 TB usable capacity.
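
For reference, here is the capacity arithmetic behind the figures above as a quick Python sketch. It assumes decimal terabytes and two parity disks per RAID 6 span, and it ignores controller metadata and filesystem overhead, so real usable space will be a bit lower.

    # Usable capacity for RAID 60: spans x (disks per span - 2) x disk size.
    DISK_TB = 4  # 4 TB drives (decimal TB)

    for spans, disks_per_span in [(2, 20), (4, 10), (5, 8), (8, 5)]:
        usable = spans * (disks_per_span - 2) * DISK_TB
        raw = spans * disks_per_span * DISK_TB
        print(f"{spans} spans x {disks_per_span} disks: "
              f"{usable} TB usable ({usable / raw:.0%} of raw)")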

I am leaning towards 4 spans of 10 disks, as it seems to offer the best balance of fault tolerance (2 drive failures tolerated per 10-disk span) and usable capacity (80%, down from 90% for 2 spans of 20 disks).

However, what can I expect the rebuild time to be for a single 10-disk span? Web searching suggests that even a 10-disk span might not be feasible, because the rebuild may take so long that it risks an additional drive failure during the rebuild. Most resources on the internet, however, are based on fewer disks or lower-capacity disks.
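
For a rough sense of scale, a back-of-envelope sketch is below. It assumes the rebuild is limited by writing the full 4 TB of the replacement disk at a sustained 50-150 MB/s; that range is my assumption for a 7200 RPM drive under concurrent production I/O, not a figure for this particular controller.

    # Back-of-envelope rebuild time: the replacement drive's full capacity
    # divided by an assumed sustained rebuild rate.
    DISK_BYTES = 4e12  # 4 TB drive

    for rate_mb_s in (50, 100, 150):
        hours = DISK_BYTES / (rate_mb_s * 1e6) / 3600
        print(f"{rate_mb_s:>3} MB/s sustained -> ~{hours:.0f} h rebuild")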

Any thoughts on the optimal setup for this relatively large number of disks?

NOTE: There is a backup policy for about 10 TB of the data, but it is not feasible to back up everything, hence my leaning towards RAID 60 over RAID 10. I realize this is not a substitute for backup, but better recovery from drive failure does make the system more robust: it provides an opportunity to rebuild and then migrate data to other storage should multiple disk failures occur.

EDIT: Specifications:

  • Disks: Seagate 4TB SAS 3.5" HDD 7200 RPM, enterprise grade.
  • Controller: ServeRAID M5016 with RAID 6 enabled (LSI SAS2208 chipset). See: https://www.broadcom.com/products/storage/raid-on-chip/sas-2208.
  • Enclosure: Supermicro 4U storage JBOD 45x3.5 with 2x1400W redundant power modules.
  • OS: CentOS Linux release 7.1.1503 (Core).

Thank you for the help.

Vince

5 Answers


With 4 TB 7.2k drives, I'd recommend making the subarrays as small as possible - actually, 5 drives don't really justify using RAID 6 at all.

My 2c: use RAID 10 where you can expect a rebuild to finish within 12 hours, which a 5-drive, 20 TB (raw) RAID 6 array most probably won't.

Make sure you enable monthly data scrubbing/media patrol/whatever-it's-called-here to detect read errors before they have a chance to stop a rebuild. Most often when a rebuild fails, the cause is not a completely failing drive but a rather old, yet undetected read error that could have been repaired with a regular scrubbing.
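
To put a rough number on that latent-error risk, here is a sketch assuming the 1-per-10^15-bits unrecoverable read error (URE) rate typically quoted on enterprise SAS spec sheets; field rates are usually better, and scrubbing clears these errors before a rebuild ever sees them.

    import math

    # Chance of hitting at least one latent URE while reading every surviving
    # disk of one span during a single-disk rebuild.  With RAID 6 the second
    # parity can absorb such an error, but not once a second disk in the span
    # has already failed.
    URE_PER_BIT = 1e-15
    BITS_PER_DISK = 4e12 * 8  # 4 TB drive

    for span in (8, 10, 20):
        bits_read = (span - 1) * BITS_PER_DISK
        p_ure = -math.expm1(bits_read * math.log1p(-URE_PER_BIT))
        print(f"{span:>2}-disk span: ~{p_ure:.0%} chance of >=1 URE during rebuild")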

Zac67

Based on the excellent comments received, I have set up a RAID 60 consisting of 5 spans of 8 disks each, for the following reasons:

  1. Based on a recent rebuild involving 2 spans of 20 disks, I estimate the rebuild time for the 8+2 configuration will be reasonable.

  2. Usable capacity is reduced only marginally compared to spans with a larger number of disks (e.g. 10 or 20 disks per span). While a loss of 20 TB seems considerable, the smaller span size means the rebuild will actually be achievable, which is an acceptable trade-off.

I will update this answer with any additional information I gather.

Edit: Removed RAID 5 as a viable option.

Vince

With modern hardware RAID controllers from Avago (LSI) or Microsemi (Adaptec), 20+2-disk RAID arrays are perfectly fine. The rebuild time is reasonable (less than 24 hours), and current drives have very low failure rates anyway. I'd definitely use 2 spans.

wazoox

On such a big array I would really use RAID 10, or the equivalent ZFS mirrored setup. You could set up a 42-disk RAID 10 plus 2 global hot spares (for ~82 TB of usable space), and it will provide excellent protection against disk failures with a very fast rebuild time.
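
As a rough way to quantify that protection, here is a sketch of my own, assuming independent failures and a plain 21-mirror layout rather than any particular controller's behaviour: after one disk dies, data is only at risk if the next failure hits that disk's mirror partner, and the rebuild is a straight disk-to-disk copy.

    # Exposure after a first failure in a 42-disk RAID 10 (21 two-way mirrors).
    disks = 42
    # Only 1 of the remaining 41 disks (the mirror partner) would cause data
    # loss if it failed before the copy to a hot spare finishes.
    p_critical_second_failure = 1 / (disks - 1)
    print(f"Chance a second failure is critical: {p_critical_second_failure:.1%}")

    # Mirror rebuild is a plain copy (no parity reads); at an assumed
    # ~150 MB/s sequential rate a 4 TB disk is rebuilt in roughly:
    hours = 4e12 / 150e6 / 3600
    print(f"Approximate mirror rebuild time: ~{hours:.0f} h")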

If you really, really want to use RAID 6, I lean toward 5x 10-disk spans.

shodanshok

If your buses' bandwidths are high enough, the rebuild time for a 20-disk RAID 6 array shouldn't differ much from the rebuild time for an 8- or 10-disk RAID 6 array. Basically, there is one continuous read from each of the non-failed disks and, at the same time, one continuous write to the disk that is being rebuilt.
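
As a sanity check on that bandwidth caveat, here is a sketch with assumed numbers (~180 MB/s sustained per 7200 RPM disk, ~2200 MB/s for a 4-lane 6 Gb/s SAS wide port after protocol overhead; substitute your own figures):

    # During a single-disk RAID 6 rebuild every surviving disk in the span is
    # read; if the aggregate read rate exceeds what the expander uplink or
    # controller can move, the rebuild becomes link-limited.
    DISK_MB_S = 180   # assumed sustained read of one 7.2k SAS disk
    LINK_MB_S = 2200  # assumed usable bandwidth of a 4 x 6 Gb/s wide port

    for span in (8, 10, 20):
        aggregate_read = (span - 1) * DISK_MB_S
        limit = "link-limited" if aggregate_read > LINK_MB_S else "disk-limited"
        print(f"{span:>2}-disk span: ~{aggregate_read} MB/s of reads -> {limit}")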

If you have limited bandwidth and rebuild times matter, then make sure that you shuffle the disks of each of the sub-RAID 6s among all available controllers. Let's say you have 4 controllers with 11 disks each (and probably a 12th slot on each controller, where some of those take a fast system SSD); then a setup with five 8-disk RAID 6s and four hot spares seems optimal: each controller then carries two disks out of each 8-disk RAID 6 and one hot spare (see the placement sketch below). Ideally, whatever script or admin action you use to rebuild data in case of a disk failure should prefer the hot spare on the same controller as the failed disk, but resort to any other available hot spare if the hot spare on a given controller is already in use.
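
A small sketch of that placement, with hypothetical naming, just to show the round-robin assignment of span members and hot spares across four 11-slot controllers:

    # Five 8-disk RAID 6 spans plus four hot spares spread over four
    # controllers so each controller holds two members of every span and one
    # hot spare (11 drives per controller).
    CONTROLLERS, SPANS, SPAN_WIDTH = 4, 5, 8

    layout = {c: [] for c in range(CONTROLLERS)}
    for span in range(SPANS):
        for member in range(SPAN_WIDTH):
            layout[member % CONTROLLERS].append(f"span{span}-disk{member}")
    for c in range(CONTROLLERS):
        layout[c].append("hot-spare")

    for c, drives in layout.items():
        print(f"controller {c}: {len(drives)} drives -> {drives}")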

Kai Petzke