I am redesigning my homelab servers from scratch and want to switch to ZFS after some experimentation with it.
I know about the limitation that the devices in a vdev should all be the same size; otherwise each device is only used up to the capacity of the smallest one.
That's a problem for me: I have an assortment of HDDs in various sizes and no budget for upgrades. My goal is to use what I have without compromising ZFS's redundancy.
I know ZFS is an enterprise file system, and that in the enterprise world it's cheaper to buy identical disks rather than do what I am about to do.
I have come up with the following workaround idea, which I want to validate.
Initial setup:
- 4 disks of 16 TB each (sd[abcd])
- 3 disks of 8 TB each (sd[efg])
- 2 disks of 6 TB each (sd[hi])
On every disk that still has free space, I create a partition as large as the smallest non-zero free space across all disks (while accounting for partition alignment), and repeat this until no disk has free space left. A sketch of the actual partitioning commands follows the diagram below.
Final picture:
- 4 disks of 16 TB each (sd[abcd]):
part1: 6 TB
part2: 2 TB
part3: 8 TB
- 3 disks of 8 TB each (sd[efg]):
part1: 6 TB
part2: 2 TB
- 2 disks of 6 TB each (sd[hi]):
part1: 6 TB
+-------------------------+------------+----------------------------------+
sda: | sda1: 6 TB | sda2: 2 TB | sda3: 8 TB |
+-------------------------+------------+----------------------------------+
sdb: | sdb1: 6 TB | sdb2: 2 TB | sdb3: 8 TB |
+-------------------------+------------+----------------------------------+
sdc: | sdc1: 6 TB | sdc2: 2 TB | sdc3: 8 TB |
+-------------------------+------------+----------------------------------+
sdd: | sdd1: 6 TB | sdd2: 2 TB | sdd3: 8 TB |
+-------------------------+------------+----------------------------------+
sde: | sde1: 6 TB | sde2: 2 TB |
+-------------------------+------------+
sdf: | sdf1: 6 TB | sdf2: 2 TB |
+-------------------------+------------+
sdg: | sdg1: 6 TB | sdg2: 2 TB |
+-------------------------+------------+
sdh: | sdh1: 6 TB |
+-------------------------+
sdi: | sdi1: 6 TB |
+-------------------------+
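To get there, the partitioning could look something like this (sgdisk shown; the sizes below are nominal, since sgdisk's T suffix means TiB, and in practice each "layer" would get one exact, aligned size computed from the smallest remaining free space so the members of each raidz2 match):

    # 16 TB disks: 6 TB + 2 TB + rest of the disk (~8 TB)
    for disk in sda sdb sdc sdd; do
        sgdisk -n 1:0:+6T -n 2:0:+2T -n 3:0:0 /dev/$disk
    done
    # 8 TB disks: 6 TB + rest of the disk (~2 TB)
    for disk in sde sdf sdg; do
        sgdisk -n 1:0:+6T -n 2:0:0 /dev/$disk
    done
    # 6 TB disks: a single partition spanning the whole disk
    for disk in sdh sdi; do
        sgdisk -n 1:0:0 /dev/$disk
    done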
This gives me equally sized partitions on different physical disks that I can use as building blocks for multiple RAIDZ2 vdevs:
zpool create tank \
    raidz2 sda1 sdb1 sdc1 sdd1 sde1 sdf1 sdg1 sdh1 sdi1 \
    raidz2 sda2 sdb2 sdc2 sdd2 sde2 sdf2 sdg2 \
    raidz2 sda3 sdb3 sdc3 sdd3
# vdev 1: all nine part1 partitions (6 TB each)
# vdev 2: the seven part2 partitions on the 16 TB and 8 TB disks (2 TB each)
# vdev 3: the four part3 partitions on the 16 TB disks (8 TB each)
I think the following should be true:
- every RAIDZ vdev uses only partitions that are on different physical drives, so if one disk fails, at most one member of each RAIDZ vdev is affected
- when one drive fails, every RAIDZ vdev with a partition on that drive degrades (all three in the case of a 16 TB disk), but replacing the disk, repartitioning it the same way, and running zpool replace for each partition will let ZFS recover (see the sketch after this list)
- since ZFS biases writes toward the vdev with the most free space, and the first RAIDZ will have the most, I suppose the other vdevs won't see much traffic until the first, 6 TB, RAIDZ starts filling up, so there shouldn't be an IOPS bottleneck. I hope so (I'd verify this rather than guess, see the commands after the replacement sketch below).
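For the replacement case in the second point, the plan would be roughly this (sdj stands in for whatever name the new disk gets; shown for a failed 16 TB drive):

    # recreate the same partition layout on the replacement disk
    sgdisk -n 1:0:+6T -n 2:0:+2T -n 3:0:0 /dev/sdj
    # swap it in, one zpool replace per degraded raidz2
    zpool replace tank sda1 sdj1
    zpool replace tank sda2 sdj2
    zpool replace tank sda3 sdj3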
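And to check the last assumption instead of guessing, I'd watch how ZFS actually spreads allocations and I/O across the vdevs:

    zpool list -v tank        # per-vdev capacity, allocated and free space
    zpool iostat -v tank 5    # per-vdev I/O statistics, sampled every 5 seconds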
The final touch is to set the IO scheduler to "noop", though I am not sure whether ZFS is intelligent enough to schedule across partitions and realize that sda1 and sda2 live on the same spinning-rust device.
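Roughly this, applied per whole disk rather than per partition (newer multi-queue kernels call the no-op scheduler "none"):

    for disk in sda sdb sdc sdd sde sdf sdg sdh sdi; do
        echo none > /sys/block/$disk/queue/scheduler
    done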
In theory, I don't see why this setup wouldn't work, but I may be missing something. Are there any downsides or dangers in running this configuration?