
I am pretty new to Ceph and am trying to find out whether Ceph supports hardware-level RAID HBAs.

Sadly, I could not find much information. What I did find is that it is recommended to use plain disks for OSDs. But this pushes high bandwidth requirements onto PCIe and the disk interfaces, and the CPU requirements become very high.

Hardware RAID controllers have already solved these requirements, and depending on the setup they provide high redundancy without eating into my PCIe, CPU, or other resources.

So my desired setup would be to have local RAID controller(s) that handle my disk redundancy at the controller level (RAID 5, RAID 6, or whatever RAID level I need). On top of those RAID LUNs I would like to use Ceph to do the higher level of replication between host, chassis, rack, row, datacenter, or whatever is possible or plannable in CRUSH.
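To make that second layer concrete, here is a rough Python sketch of what I mean by "replication across a failure domain". It is a toy model, not Ceph's actual CRUSH algorithm, and all OSD and bucket names are made up for illustration:

```python
# Toy model of CRUSH-style placement: pick N replicas so that no two land
# in the same failure domain (host, chassis, rack, ...). This is NOT Ceph's
# real CRUSH code, just an illustration of the placement goal.
import random

# Hypothetical cluster layout: osd -> (host, chassis, rack)
OSD_LOCATION = {
    "osd.0": ("host1", "chassis1", "rack1"),
    "osd.1": ("host2", "chassis1", "rack1"),
    "osd.2": ("host3", "chassis2", "rack1"),
    "osd.3": ("host4", "chassis3", "rack2"),
    "osd.4": ("host5", "chassis4", "rack2"),
    "osd.5": ("host6", "chassis4", "rack3"),
}
LEVELS = {"host": 0, "chassis": 1, "rack": 2}

def place(replicas, failure_domain, rng=random):
    """Choose `replicas` OSDs, no two sharing the same `failure_domain` bucket."""
    chosen, used_buckets = [], set()
    osds = list(OSD_LOCATION)
    rng.shuffle(osds)
    for osd in osds:
        bucket = OSD_LOCATION[osd][LEVELS[failure_domain]]
        if bucket not in used_buckets:
            chosen.append(osd)
            used_buckets.add(bucket)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct failure domains")

print(place(3, "rack"))   # e.g. ['osd.2', 'osd.3', 'osd.5'] - one OSD per rack
```

In a real cluster CRUSH does this placement from the CRUSH map; the point is only that replicas end up in distinct buckets of whichever type I choose as the failure domain.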

  1. Any experience with this kind of setup?
  2. Is it a recommended setup?
  3. Is there any in-depth documentation for this hardware RAID integration?
cilap

3 Answers


The fact that you can doesn't mean you should. Mapping RAID LUNs to Ceph is possible, but it injects an extra layer of abstraction and renders at least part of Ceph's functionality useless.

There is a similar thread on the Ceph users mailing list:

https://web.archive.org/web/20180502082753/http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021159.html

R Schultz

"But this pushes high bandwidth requirements onto PCIe and the disk interfaces, and the CPU requirements become very high."

Not really: many storage workloads are served well by modern general-purpose CPUs and interconnects.

Yes, a RAID controller takes care of redundancy for a handful of disks in one chassis. But that adds cost and complexity when you are already running a redundant, multi-node distributed storage solution like Ceph. Why bother mirroring a physical disk when Ceph already keeps multiple copies of the data?

The building blocks of such a solution are just a bunch of disks, such as the Open Compute Project's Open Vault: 30 spindles in an enclosure, attached to a compute node with perhaps a couple dozen CPU cores. Add as many nodes as you need to scale out. You can dedicate that compute to Ceph if you want to maximize throughput.
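As a rough illustration of the scale-out sizing, here is a small sketch; the drive size, spindles per enclosure, and replication factor are assumptions for the example, not recommendations:

```python
# Back-of-the-envelope sizing for a JBOD + Ceph scale-out design.
# Assumptions (adjust to your hardware): 30 spindles per enclosure,
# 8 TB per drive, 3x replication to get usable capacity.
import math

def enclosures_needed(target_usable_tb, drive_tb=8, spindles=30, replication=3):
    raw_needed = target_usable_tb * replication
    raw_per_enclosure = drive_tb * spindles
    return math.ceil(raw_needed / raw_per_enclosure)

for usable in (100, 500, 1000):  # TB of usable capacity
    print(f"{usable} TB usable -> {enclosures_needed(usable)} enclosure(s)")
```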

John Mahowald

The recommended setup is to use single disks or, possibly, disks in RAID-1 pairs.

A single SAS controller (or a RAID controller in JBOD mode) can drive several hundred disks without any trouble.

Using very large arrays defeats the very purpose of Ceph, which is to avoid single points of failure and "hot spots". It will also actually harm your redundancy.

Let's say you want to build a 1 PB Ceph cluster using 8 TB drives and 36-disk server chassis (ordinary Supermicro-like hardware). Let's compare the setups with and without RAID in terms of storage capacity and reliability (a small script checking the arithmetic follows the list):

  • With RAID-6 you need 5 chassis (and 10 OSDs).

    • Each chassis will have two 18-disk RAID-6 arrays.
    • You'll have 1024 TB of available storage.
    • In case of a multiple-disk crash you'll have to rebuild 256 TB.
  • With plain Ceph and 5 chassis you'll have 180 OSDs.

    • Available capacity will be slightly higher (using erasure coding): 1152 TB.
    • In case of a multiple-disk crash you'll have to rebuild only the failed disks (unless an entire server is lost, it will always be less than 256 TB).
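Here is a small sketch reproducing the arithmetic above. The 80% usable fraction (for example an erasure-coding profile like k=8, m=2) is an assumption that matches the quoted figures; adjust it for your own pools:

```python
# Reproduce the capacity comparison above. The 80% erasure-coding efficiency
# applied on top in both cases is an assumption consistent with the figures
# quoted; change it to your own EC profile or replication factor.
DRIVE_TB = 8
CHASSIS = 5
DISKS_PER_CHASSIS = 36
EC_EFFICIENCY = 0.8          # usable fraction after Ceph's own redundancy

raw_tb = CHASSIS * DISKS_PER_CHASSIS * DRIVE_TB           # 1440 TB raw

# Option A: two 18-disk RAID-6 arrays per chassis, one OSD per array
arrays = CHASSIS * 2
lun_tb = arrays * (18 - 2) * DRIVE_TB                     # 1280 TB of LUNs
usable_raid = lun_tb * EC_EFFICIENCY                      # 1024 TB

# Option B: one OSD per disk, redundancy handled by Ceph alone
usable_plain = raw_tb * EC_EFFICIENCY                     # 1152 TB

print(f"raw: {raw_tb} TB, RAID-6 under Ceph: {usable_raid:.0f} TB, "
      f"plain Ceph: {usable_plain:.0f} TB")
```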
wazoox