
Platform: Ubuntu 10.04 x86.

We have an HTTP server (nginx, but that is not relevant) which serves some static content. Content is (rarely) uploaded by content managers via SFTP, but may also be changed or added by other means (e.g. with `cat` directly on the server).

Now we want to add a second, identical HTTP server — a slave mirror in another data center on another continent. (And set up DNS round-robin.)

What is the best way to set up synchronization between the master server and the slave mirror, so that the delay between a modification and its re-synchronization is minimal (a few seconds would be bearable, though)?

The solution must cope with large changesets and race conditions. That is, if I change 1000 files, it should not spawn 1000 synchronization processes. And if I change something while synchronization is in progress, my new change must eventually make it to the mirror as well... And so on.

Rejected solutions:

  • CDN — not worth the money for our particular usage scenario.
  • NFS — not over the global Internet.
  • dumb cron + rsync — latency and/or system load would be too high.
  • manual rsync — not reliable, content is changed by non-IT users.

I would say that we need something based on inotify. Is there a ready-made solution?

Update: two extra (rather obvious) requirements that I forgot to mention:

  • If data is somehow changed on the slave mirror (say, a superuser accidentally deletes a file), the sync solution must restore the data to the master state on the next sync.

  • When idle, the solution must not consume traffic or system resources (other than some memory etc. for the sleeping daemon process, of course).

Update 2: one more requirement:

  • The solution must work with UTF-8 file names.

4 Answers


Have you considered Unison as a means of keeping files in sync? Using it, you'd be able to do the one-way sync you're requesting. It seems like a reasonable fit for this application.
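A sketch of what a one-way Unison profile might look like (the hostname and paths are placeholders; the `force` preference makes the master replica always win, which approximates one-way sync):

```
# ~/.unison/default.prf — hypothetical example
root = /var/www/static
root = ssh://mirror.example.com//var/www/static

# always prefer the master copy, so mirror-side changes are overwritten
force = /var/www/static

# run non-interactively (suitable for cron or a wrapper daemon)
batch = true
```

Note that Unison itself does not watch for changes; you would still need something (cron, inotify) to trigger it.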

ewwhite

You could use lsyncd; see: Is there a working Linux backup solution that uses inotify?
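lsyncd watches the tree via inotify and batches events into rsync runs, which matches the "1000 changes, one sync" requirement. A minimal config sketch, assuming lsyncd 2.x syntax and hypothetical paths/hostname:

```lua
-- /etc/lsyncd.conf — illustrative sketch, adjust for your setup
settings {
    statusFile = "/var/run/lsyncd.status",
}

sync {
    default.rsyncssh,            -- rsync over ssh to the mirror
    source    = "/var/www/static",
    host      = "mirror.example.com",
    targetdir = "/var/www/static",
    delay     = 3,               -- collect events for a few seconds before syncing
}
```

The `delay` option is what coalesces bursts of changes into a single rsync invocation; rsync's `--delete` behavior (the default in this mode) also covers restoring files removed on the mirror.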


What about pirsyncd? I think it's a good fit for you ;)


Seems like this is a case where you might want to write a script that checks file timestamps: if a file's timestamp is later than the script's last run, assume it needs to be pushed, then trigger rsync (or some other tool) to synchronize it. Likewise, on the other side, check whether a file has changed and, if so, trigger a pull. If you are familiar with Python, Fabric combined with timestamp checking may be the way to go.
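A minimal sketch of that timestamp-check idea in plain Python (no Fabric). The root path and remote target are hypothetical, and the rsync command is only built, not executed, to keep the sketch self-contained:

```python
import os

def changed_since(root, last_run):
    """Return paths under *root* modified after *last_run* (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_run:
                changed.append(path)
    return changed

def build_rsync_command(paths, root, remote):
    """Build (but do not run) an rsync command for the changed files.

    --relative preserves the directory layout on the remote side,
    assuming rsync is invoked with *root* as the working directory.
    """
    rel = [os.path.relpath(p, root) for p in paths]
    return ["rsync", "-az", "--relative"] + rel + [remote]
```

In practice the script would persist its last-run timestamp between invocations and hand the command to `subprocess`; note that polling like this still has the latency/load trade-off that made plain cron + rsync unattractive.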

slashdot