
I am redesigning my homelab servers from scratch and, after some experimentation, I want to try ZFS.

I know there is a limitation that disks in a vdev should all be the same size, otherwise every member only contributes the capacity of the smallest one.

That's a problem for me. I have an assortment of various sized HDDs, and no budget for upgrades. My goal is to use what I have without compromising ZFS.

I know ZFS is an enterprise file system, and that in the enterprise world it is cheaper to buy identical disks than to do what I am about to do.

I have come up with the following workaround idea, which I want to validate.

Initial setup:

- 4 disks of 16 TB each (sd[abcd])
- 3 disks of 8 TB each (sd[efg])
- 2 disks of 6 TB each (sd[hi])

I create a partition on every disk as large as the smallest non-zero free space (accounting for partition alignment), and repeat this until no disk has any free space left.
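In practice the partitioning would be something like this sketch (sgdisk, assuming 512-byte logical sectors; the P1/P2 values are derived from the real disk capacities rather than the nominal 6 TB / 2 TB sizes, so that matching partitions end up identical):

# shared sizes in sectors, with a little headroom for GPT metadata and alignment
P1=$(( $(blockdev --getsz /dev/sdi) - 4096 ))        # ~6 TB, fits on the smallest disk
P2=$(( $(blockdev --getsz /dev/sde) - P1 - 4096 ))   # the ~2 TB remainder of an 8 TB disk

for d in sda sdb sdc sdd; do sgdisk -n 1:0:+$P1 -n 2:0:+$P2 -n 3:0:0 /dev/$d; done   # 6T + 2T + ~8T
for d in sde sdf sdg;     do sgdisk -n 1:0:+$P1 -n 2:0:0 /dev/$d; done               # 6T + ~2T
for d in sdh sdi;         do sgdisk -n 1:0:+$P1 /dev/$d; done                        # ~6T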

Final picture:

- 4 disks of 16 TB each (sd[abcd]):
    part1: 6 TB
    part2: 2 TB
    part3: 8 TB
- 3 disks of 8 TB each (sd[efg]):
    part1: 6 TB
    part2: 2 TB
- 2 disks of 6 TB each (sd[hi]):
    part1: 6 TB
     +-------------------------+------------+----------------------------------+
sda: | sda1: 6 TB              | sda2: 2 TB | sda3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdb: | sdb1: 6 TB              | sdb2: 2 TB | sdb3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdc: | sdc1: 6 TB              | sdc2: 2 TB | sdc3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdd: | sdd1: 6 TB              | sdd2: 2 TB | sdd3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sde: | sde1: 6 TB              | sde2: 2 TB |
     +-------------------------+------------+
sdf: | sdf1: 6 TB              | sdf2: 2 TB |
     +-------------------------+------------+
sdg: | sdg1: 6 TB              | sdg2: 2 TB |
     +-------------------------+------------+
sdh: | sdh1: 6 TB              |
     +-------------------------+
sdi: | sdi1: 6 TB              |
     +-------------------------+

This gives me equally sized partitions on different physical disks that I can use as building blocks for multiple RAIDZ vdevs:

#                |------ 16 TB ------|  |---- 8 TB ----|  |-- 6 TB --|
# part1 (6 TB):  sda1 sdb1 sdc1 sdd1    sde1 sdf1 sdg1    sdh1 sdi1
# part2 (2 TB):  sda2 sdb2 sdc2 sdd2    sde2 sdf2 sdg2
# part3 (8 TB):  sda3 sdb3 sdc3 sdd3
zpool create tank \
  raidz2  sda1 sdb1 sdc1 sdd1  sde1 sdf1 sdg1  sdh1 sdi1 \
  raidz2  sda2 sdb2 sdc2 sdd2  sde2 sdf2 sdg2 \
  raidz2  sda3 sdb3 sdc3 sdd3

I think the following should be true:

  • every RAIDZ vdev uses only partitions that sit on different physical drives, so if one disk fails, at most one device in each RAIDZ vdev fails
  • when one drive fails, all three RAIDZ vdevs degrade, but replacing the disk and repartitioning it the same way will let ZFS recover transparently (see the sketch after this list)
  • since ZFS prefers to write to the vdev with the most free space, and the first RAIDZ will have the most, the other vdevs shouldn't see much use until the first 6 TB tier fills up, so there shouldn't be IOPS bottlenecks. I hope so.
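As a sketch, the replacement procedure I have in mind (assuming sda fails and its successor appears as sdj, reusing the P1/P2 sizes from the partitioning sketch above):

# recreate the same three-partition layout on the new disk, then resilver each affected vdev
sgdisk -n 1:0:+$P1 -n 2:0:+$P2 -n 3:0:0 /dev/sdj
zpool replace tank sda1 sdj1
zpool replace tank sda2 sdj2
zpool replace tank sda3 sdj3

I expect all three resilvers to compete for the same new spindle, which is probably the slowest part of a rebuild.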

The final touch is to set the I/O scheduler to noop, though I am not sure whether ZFS is smart enough to realize that sda1 and sda2 sit on the same spinning-rust device and schedule accordingly.
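For completeness, this is how I would set it (I understand that on newer blk-mq kernels the equivalent scheduler is called "none" rather than "noop"):

# the scheduler is a property of the whole-disk queue, so the partitions are covered automatically
for q in /sys/block/sd[a-i]/queue/scheduler; do echo noop > "$q"; done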

In theory, I don't see why this setup wouldn't work, but I may be missing something. Are there any downsides or dangers in running this configuration?

Greg Askew

1 Answer

Short answer: you can do it and it will work fine, but you have a better option: create a zpool with two RAIDZ vdevs, one with the 4x16T disks and one with the 3x8T disks (leaving aside the 2x6T disks).

Long answer: when using mismatched disks, ZFS automatically limits itself to the size of the smallest one - 6T in your proposed solution. Whether you partition the disks or leave them whole, only 9x6T will be used by the RAIDZ vdev. Moreover, such a vdev is too wide for safe operation in RAIDZ1 mode. The only upside of this configuration is that you can use any other 6T disk to replace a failed one. Example (done via loop devices):
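If you want to reproduce the examples below, the loop devices can be built from sparse files - a rough sketch (file names are placeholders; being sparse, the files consume almost no real space):

# create nine 6T sparse backing files and attach them as loop devices
for i in $(seq 0 8); do
    truncate -s 6T disk$i.img
    losetup /dev/loop$i disk$i.img
done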

# note the -f (force): it instructs ZFS to ignore size/layout mismatch and safety checks (use with care!)
[root@localhost test]# zpool create zzz raidz loop0 loop1 loop2 loop3 loop4 loop5 loop6 loop7 loop8 -f

[root@localhost test]# zpool status zzz
  pool: zzz
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    zzz         ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        loop0   ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   ONLINE       0     0     0
        loop3   ONLINE       0     0     0
        loop4   ONLINE       0     0     0
        loop5   ONLINE       0     0     0
        loop6   ONLINE       0     0     0
        loop7   ONLINE       0     0     0
        loop8   ONLINE       0     0     0

errors: No known data errors

total raw space: 9x6T = 54T

[root@localhost test]# zpool list zzz
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zzz   54.0T   186K  54.0T        -         -     0%     0%  1.00x  ONLINE  -

However, another (better) approach exists: use two RAIDZ vdevs, one with 4x16T disks and one with 3x8T disks. You have no size mismatch, more space, better performance and better reliability. Example:

# you don't need -f anymore
[root@localhost test]# zpool create zzz raidz loop0 loop1 loop2 loop3 raidz loop4 loop5 loop6

[root@localhost test]# zpool status zzz
  pool: zzz
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    zzz         ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        loop0   ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   ONLINE       0     0     0
        loop3   ONLINE       0     0     0
      raidz1-1  ONLINE       0     0     0
        loop4   ONLINE       0     0     0
        loop5   ONLINE       0     0     0
        loop6   ONLINE       0     0     0

errors: No known data errors

much more raw space: 4x16T + 3x8T = 88T

[root@localhost test]# zpool list zzz
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zzz   88.0T   207K  88.0T        -         -     0%     0%  1.00x  ONLINE  -

You can also experiment with two RAIDZ2 vdevs, one with the 4x16T disks and one with 4x6T (really the three 8T disks plus one 6T disk, each limited to 6T), depending on your data durability needs.
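For reference, that layout would be created with something like the following (device names taken from your listing; -f is probably needed because the second vdev mixes 8T and 6T disks):

# the four 16T disks in one RAIDZ2, the three 8T disks plus one 6T disk in the other
zpool create tank raidz2 sda sdb sdc sdd raidz2 sde sdf sdg sdh -f

Usable space would be roughly 2x16T + 2x6T = 44T after parity.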

As always, take the above only as suggestions - you have to check that they work correctly in your specific case.

shodanshok