
I've had an ext4-on-LVM-on-Linux-RAID NAS for over a decade that runs Syncthing and syncs dozens of devices in my homelab. It works great. I'm finally building its replacement based on ZFS RAID (my first experience with ZFS), so there's lots of learning.

I know that:

  1. Dedup is a good idea in very few cases (I'm aware of the RAM implications; let's assume I wait until fast dedup stabilizes and makes it into my system).
  2. Most of my Syncthing activity is small modifications to existing files.
  3. Random async writes are harder/slower on raidz2. Syncthing would be ever-present, but the load on the new NAS would otherwise be light.
  4. Syncthing works by making new files and then deleting the old ones by default, but maybe copy_file_range changes this? (See the check below.)
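
For reference, the relevant knob is Syncthing's per-folder copyRangeMethod option; a minimal way to check it (the config path shown is the typical Linux default and may differ on your install):

    # copyRangeMethod controls whether Syncthing uses copy_file_range
    # (and similar offloads) instead of plain read/write when assembling
    # the new copy of a changed file
    grep copyRangeMethod ~/.config/syncthing/config.xml
    # expected output, one line per folder, e.g.:
    #   <copyRangeMethod>standard</copyRangeMethod>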

My questions are:

  1. Seeing as ZFS is COW, and Syncthing would just constantly be flooding the array with small random writes to existing files, isn't it more efficient to make a dataset out of my Syncthing data and enable dedup there only? (See the sketch after this list.)
  2. How does Syncthing's copy_file_range setting interact with the ZFS dedup settings? Would it override the ZFS setting, or do both need to be enabled?
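
For context on question 1: dedup (like compression) is a per-dataset property in ZFS, so scoping it is straightforward. A sketch, with a hypothetical pool/dataset tank/syncthing:

    # dedup only applies to data written after it is enabled, and only
    # on this dataset; the rest of the pool keeps the default (off)
    zfs create tank/syncthing
    zfs set dedup=on tank/syncthing
    zfs get -r dedup tank     # verify the scope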

Update: Version info (Probably):

  • Ubuntu 22.04 or 24.04
  • kernel 6.5+
  • syncthing 1.27.5
  • OpenZFS 2.3
clemtibs

3 Answers


Don't use ZFS dedup on anything other than fast NVMe-based arrays with plenty of RAM, and even then be prepared to pay a substantial performance cost. On a NAS (which I suppose runs on HDDs), do not use it. Really. This is not only a memory issue: even with infinite memory, you need to load the dedup table in the first place, and without (very) fast storage that will be a slow process.
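
If you want to see what you would be in for, the dedup table (DDT) can be inspected directly; a sketch with a hypothetical pool name tank:

    # summary plus histogram of the dedup table, including its size
    # on disk and in core
    zpool status -D tank
    # much more detail (can be slow on large pools):
    zdb -DD tank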

Moreover, dedup takes place on recordsize blocks, which are 128K by default, and lowering recordsize to increase the dedup ratio (e.g., to 4K or 8K) brings other pressing issues (a much lower compression ratio and higher CPU/memory overhead).
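
To make that trade-off concrete, recordsize is also a per-dataset property and only affects newly written blocks; a sketch (dataset name hypothetical):

    # smaller records raise the chance of dedup matches but shrink the
    # compression window and multiply metadata/DDT entries
    zfs set recordsize=8K tank/syncthing
    zfs get recordsize,compressratio tank/syncthing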

Finally, if Syncthing removes the old/updated file by default, you will not gain much space by using dedup. Rather, be sure to use lz4 or zstd compression and to create a specific dataset for Syncthing.
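
A minimal sketch of that recommendation (pool/dataset names are hypothetical):

    # dedicated dataset for Syncthing with cheap inline compression
    # instead of dedup (zstd is an alternative to lz4); atime=off
    # avoids a metadata write on every read
    zfs create -o compression=lz4 -o atime=off tank/syncthing
    zfs get compression,compressratio tank/syncthing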

Regarding copy_file_range (or reflink): please do not use it, as the feature is not completely stable and can cause (rare) file corruption.
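
If you want to rule it out at the filesystem level, recent OpenZFS (2.2.2 and later) exposes block cloning, the mechanism behind reflink-style copy_file_range, as a module parameter; disabling it makes ZFS fall back to ordinary copies even when userspace requests a clone:

    # 1 = clones allowed, 0 = fall back to regular copies
    cat /sys/module/zfs/parameters/zfs_bclone_enabled
    echo 0 | sudo tee /sys/module/zfs/parameters/zfs_bclone_enabled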

shodanshok

ZFS deduplication isn't the sharpest tool in the shed... It's a memory hog, and it chews into read IOPS, which aren't great on RAIDZx to begin with. If you absolutely need dedupe for whatever scenario you have (a file server, I guess?), you'd better go with virtualization and enable OS-level dedupe (VDO for Linux, and whatever-Microsoft-calls-it-this-week for Windows Server). We run a couple of Proxmox+ZFS hosts doing exactly this and are quite happy with the performance (KVM), reliability (ZFS), and space savings (VDO). Good luck!
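
For the curious, a minimal LVM-VDO sketch (VG/LV names and sizes are hypothetical; requires the VDO kernel module and LVM tooling to be installed):

    # carve a deduplicating/compressing VDO volume out of an existing VG;
    # -L is the physical size, -V the (overcommitted) virtual size
    lvcreate --type vdo --name vdo0 -L 1T -V 3T vg0
    mkfs.xfs /dev/vg0/vdo0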


Q: Seeing as ZFS is COW, and Syncthing would just constantly be flooding the array with small random writes to existing files, isn't it more efficient to make a dataset out of my Syncthing data and enable dedup there only?

A: Not really! The ZFS ZIL and a SLOG come in handy in this case. Please take a look:

https://www.servethehome.com/what-is-the-zfs-zil-slog-and-what-makes-a-good-one/

https://www.truenas.com/docs/references/zilandslog/

https://docs.oracle.com/cd/E53394_01/html/E54801/gffyt.html
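
For completeness, a sketch of adding a mirrored SLOG (pool and device names are hypothetical; note that the ZIL/SLOG only accelerates synchronous writes, while async writes are coalesced in RAM regardless):

    # move the ZIL off the raidz2 vdev onto fast mirrored NVMe
    zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
    zpool status tank    # look for the new "logs" section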

RiGiD5