
I'm looking for a filesystem that can replicate over long distances and tolerate being offline for extended periods of time by queuing the changes to replicate in a local on-disk buffer.

DRBD with DRBD Proxy looked like an ideal candidate, but the proxy buffers in RAM, and I'm not sure that will be adequate.

I'm trying to avoid things like Ceph, which have much more functionality than I need.

It should handle on the order of a billion files on a single filesystem, and it only needs to replicate from filesystem A to filesystem B. There will be a lot of files, but they will only ever be written, not changed. A moderate amount of data will be written all the time, but not so much that replication couldn't catch up even after a few days of being offline. No clustering or anything fancy like that is required.

Really, what I'm looking for is something that works like MySQL replication, but for a file system.

I found a lot of commentary on replicating file systems, but for me the missing piece is being able to buffer updates to disk if the link is down for an extended period.

Gerber

2 Answers


There is a fully asynchronous, kernel-level replication solution based on transaction logfiles: https://github.com/schoebel/mars


Perhaps zfs send/receive would do the trick?

I have been using ZFS on Linux for years now to achieve something like this.

I can imagine a loop that creates a snapshot and then sends it over the wire; if the send fails, it retries with ever-growing intervals between attempts. A rough sketch follows.
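Something along these lines (a minimal sketch, not a finished tool): the dataset name tank/data, the remote host backup-host, the destination dataset backup/data, and the repl-N snapshot naming are all placeholders, and it assumes the initial full copy already exists on the remote side.

```python
#!/usr/bin/env python3
# Sketch only: snapshot the source dataset, then try to send the increment
# until it succeeds, backing off while the link is down.
import subprocess
import time

DATASET = "tank/data"            # local source dataset (placeholder)
REMOTE = "backup-host"           # replication target (placeholder)
REMOTE_DATASET = "backup/data"   # destination dataset on the target

def snapshot(name):
    subprocess.run(["zfs", "snapshot", f"{DATASET}@{name}"], check=True)

def send_incremental(prev, curr):
    # zfs send -i @prev @curr | ssh remote zfs receive -F dest
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{DATASET}@{prev}", f"{DATASET}@{curr}"],
        stdout=subprocess.PIPE,
    )
    recv = subprocess.run(
        ["ssh", REMOTE, "zfs", "receive", "-F", REMOTE_DATASET],
        stdin=send.stdout,
    )
    send.stdout.close()
    if send.wait() != 0 or recv.returncode != 0:
        raise RuntimeError(f"replication of {curr} failed")

prev = "repl-0"                  # last snapshot known to exist on both sides
seq = 1
while True:
    curr = f"repl-{seq}"
    snapshot(curr)
    delay = 60
    while True:
        try:
            send_incremental(prev, curr)
            break                # increment delivered, move on
        except RuntimeError:
            time.sleep(delay)    # link down: wait and retry
            delay = min(delay * 2, 3600)   # ever-growing retry interval, capped
    prev = curr
    seq += 1
    time.sleep(300)              # take the next snapshot after 5 minutes
```

The on-disk "buffer" here is simply the chain of local snapshots that have not yet been received on the other side, which ZFS keeps around at no cost beyond the space the changed blocks occupy.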

You could even separate the snapshotting process from the replication process; keeping the increments small improves resilience against network failures while the updates are being sent. See the sketch below.
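A hedged sketch of that separated variant: snapshots are assumed to be taken by some other job (for example from cron), and this catch-up script just compares the snapshot lists on both sides and sends whatever increments are missing. The dataset/host names and the repl- prefix are again placeholders.

```python
#!/usr/bin/env python3
# Sketch only: the replication half of the "separated" setup. Snapshots are
# created elsewhere (e.g. by cron); this script drains the backlog one small
# increment at a time, so a network failure only loses the increment in flight.
import subprocess

DATASET = "tank/data"            # local source dataset (placeholder)
REMOTE = "backup-host"           # replication target (placeholder)
REMOTE_DATASET = "backup/data"   # destination dataset on the target

def list_snapshots(dataset, remote=False):
    # Snapshot names (the part after '@') in creation order.
    cmd = ["zfs", "list", "-H", "-t", "snapshot", "-o", "name",
           "-s", "creation", "-r", dataset]
    if remote:
        cmd = ["ssh", REMOTE] + cmd
    out = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return [line.split("@", 1)[1] for line in out.stdout.splitlines()]

def replicate_backlog():
    local = [s for s in list_snapshots(DATASET) if s.startswith("repl-")]
    remote = set(list_snapshots(REMOTE_DATASET, remote=True))
    common = [s for s in local if s in remote]
    if not common:
        return                   # no common snapshot; a full send is needed first
    prev = common[-1]            # newest snapshot both sides already have
    for curr in local[local.index(prev) + 1:]:
        send = subprocess.Popen(
            ["zfs", "send", "-i", f"{DATASET}@{prev}", f"{DATASET}@{curr}"],
            stdout=subprocess.PIPE,
        )
        recv = subprocess.run(
            ["ssh", REMOTE, "zfs", "receive", "-F", REMOTE_DATASET],
            stdin=send.stdout,
        )
        send.stdout.close()
        if send.wait() != 0 or recv.returncode != 0:
            break                # link dropped; the next run picks up from here
        prev = curr              # each increment stays small

if __name__ == "__main__":
    replicate_backlog()          # run this periodically, e.g. from cron
```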