
I have ~700GB of storage holding ~15M files, so the average file size is ~50KB. To back it up, I run a simple rsync script overnight with the following set of flags:

--archive --update --compress --numeric-ids --human-readable --stats
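The full invocation looks roughly like this (source path, destination path and host name are placeholders):

# nightly backup job; paths and host are illustrative only
rsync --archive --update --compress --numeric-ids --human-readable --stats \
    /srv/data/ backup-host:/srv/backup/data/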

It takes 8+ hours for rsync to complete its job, even though on average only ~1-4GB of data is moved daily. That seems incredibly inefficient to me.

Can I tune my rsync script in any way? I suppose my best bet would be migrating the data to MongoDB or something similar, but there is a problem with that: the current infrastructure relies on the files being accessible as a POSIX file system, so migrating them to something totally different may require extra work, potentially too much work... What other strategy might be best?

NarūnasK

1 Answer


Most of that time is rsync just analyzing that many files; the actual data transfer is efficient. It has to do in excess of 15M I/Os, give or take caching. You could throw very fast storage at it, but that can be costly.

The ZFS suggestion is to use block-level copies (snapshot send/receive), so the whole dataset becomes one big stream to transfer instead of millions of per-file checks.
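A minimal sketch of that approach, assuming a dataset named tank/data on the source and a pool named backup on a host called backuphost (all names are illustrative):

# one-time full copy to seed the destination
zfs snapshot tank/data@base
zfs send tank/data@base | ssh backuphost zfs receive backup/data

# nightly incremental: only the blocks changed since the previous snapshot cross the wire
zfs snapshot tank/data@today
zfs send -i tank/data@yesterday tank/data@today | ssh backuphost zfs receive backup/data

The incremental send never walks the file tree, which is exactly the work that dominates your current rsync run.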

The same concept applies to LVM, although it may require more scripting since remote snapshot replication isn't built in. See something like lvmsync for ideas.
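With LVM it would look something like the sketch below; the lvmsync invocation is illustrative, so check its README for the exact syntax and prerequisites (it must also be installed on the remote side):

# create a snapshot of the logical volume holding the data (VG/LV names are examples)
lvcreate --size 10G --snapshot --name data_snap /dev/vg0/data

# lvmsync can then replay just the changed blocks recorded in the snapshot
# against an existing copy of the LV on the remote host (roughly):
lvmsync /dev/vg0/data_snap backuphost:/dev/vg0/data

# drop the snapshot once the changed blocks have been shipped
lvremove -f /dev/vg0/data_snap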

John Mahowald