21

I use rsnapshot to create hourly/daily/weekly/monthly backups of my "work" share. Now I'm trying to copy the whole backup directory onto an external drive using rsync.

I used this command within a screen session (yes, rsync-exclude.txt is in the directory I run the command from):

rsync -avzHP --exclude-from 'rsync-exclude.txt' /share/backup/ /share/eSATADisk1/backup/;

The whole thing is running on a QNAP TS-439; the internal drive is a single disk (no RAID) formatted EXT4, and the external drive is formatted EXT3.

What happens is: rsync follows every hardlink and copies the actual file instead of recreating the hardlink on the external drive. I didn't notice this right away, so the external drive ended up trashed with many copies of the same files.

What I want to achieve is: copying the whole file structure generated by rsnapshot to the external drive while keeping the hardlinks, to save space. Note: this doesn't necessarily have to be done with rsync.

Thanks for your ideas and time. I'd appreciate your help, big time.

Update: I learned that rsnapshot uses hardlinks, not symlinks, so I now use the -H option, which should preserve the hardlink structure according to Rsnapshot to multiple destinations (or maintain hard links structure), but it still won't work... What am I missing here?

Update 2: I found another opinion/statement on this topic here: rsync with --hard-links freezes. Steven Monday suggests not trying to rsync big file structures containing hardlinks, since it soaks up a lot of memory and is a hard task for rsync. So perhaps a better solution would be to make an .img of the data structure I'm trying to back up. What do you think?

woerndl
  • 313

5 Answers

17

The rsync command's -H (or --hard-links) option will, in theory, do what you are trying to accomplish, which is, in brief: to create a copy of your filesystem that preserves the hard linked structure of the original. As I mentioned in my answer to another similar question, this option is doomed to fail once your source filesystem grows beyond a certain threshold of hard link complexity.

The precise location of that threshold may depend on your RAM and the total number of hard links (and probably a number of other things), but I have found that there's no point in trying to define it precisely. What really matters is that the threshold is all-too-easy to cross in real-world situations, and you won't know that you have crossed it, until the day comes that you try to run an rsync -aH or a cp -a that struggles and eventually fails.

What I recommend is this: Copy your heavily hard linked filesystem as one unit, not as files. That is, copy the entire filesystem partition as one big blob. There are a number of tools available to do this, but the most ubiquitous is dd.

With stock firmware, your QNAP NAS should have dd built in, as well as fdisk. With fdisk, create a partition on the destination drive that is at least as large as the source partition. Then, use dd to create an exact copy of your source partition on the newly created destination partition.

While the dd copy is in progress, you must ensure that nothing changes in the source filesystem, lest you end up with a corrupted copy on the destination. One way to do that is to umount the source before starting the copying process; another way is to mount the source in read-only mode.
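
A rough sketch of the whole procedure (the device names below are placeholders; check yours with fdisk -l before running anything, since dd will happily overwrite the wrong disk):

fdisk -l                             # identify the real source and destination partitions
mount -o remount,ro /share/backup    # freeze the source (or umount it entirely)
dd if=/dev/sdX3 of=/dev/sdY1 bs=1M   # block-for-block copy; bs=1M just speeds it up

Afterwards, mount the destination read-only and spot-check a few snapshots to verify the copy.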

Steven Monday
  • 14,179
1

-l is for symlinks; why would it do anything for hardlinks?
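
To illustrate the difference (the file names here are made up):

$ ln -s file.txt symlink.txt    # symlink: a new inode that stores a path; this is what -l preserves
$ ln file.txt hardlink.txt      # hardlink: a second name for the same inode; this is what -H preserves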

(Sorry this is an answer and not a comment; I don't have comment rights yet, and this answer needed a response.)

Another note that should be a comment: is this all native hardware, or are you on a VM or a network mount?

Edit

Ignore my earlier comment regarding why you are using hardlinks; I missed the rsnapshot comment.

It would be helpful to run a test that first tries rsync between two directories on a local disk, then against your remote disk. The little test below shows that the -H option works as expected. The -i option to ls prints inode numbers, showing that the links have been preserved, with no extra copies.
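
For reference, a fixture like the one below can be set up with something along these lines (any 9-byte file content will do):

$ mkdir src
$ echo "testdata" > src/file111_prime.txt     # 8 characters + newline = 9 bytes
$ ln src/file111_prime.txt src/file111.txt    # hardlink: both names share one inode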

$ rsync -avzHP src/ dest
sending incremental file list
created directory dest
./
file111_prime.txt
           9 100%    0.00kB/s    0:00:00 (xfer#1, to-check=0/3)
file111.txt => file111_prime.txt

sent 156 bytes  received 59 bytes  430.00 bytes/sec
total size is 18  speedup is 0.08

$ ls -liR
.:
total 8
414044 drwxrwxr-x. 2 nhed nhed 4096 Feb 25 09:58 dest
414031 drwxrwxr-x. 2 nhed nhed 4096 Feb 25 09:58 src

./dest:
total 8
414046 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111_prime.txt
414046 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111.txt

./src:
total 8
414032 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111_prime.txt
414032 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111.txt

A subsequent test, rsync -avzHP src/ host:/tmp, to a remote host still maintained the hardlinks.

nhed
  • 629
1

This is a long shot, but if you cannot find another solution, I would suggest trying to format the USB drive as EXT4. This might be the problem: https://bugzilla.samba.org/show_bug.cgi?id=7670

Given enough hard links in a source folder and a small enough destination volume, copying with rsync --hard-links can fail. Rsync fails by exhausting the maximum number of hard links on the destination <...> the real issue isn't rsync but instead the underlying file system.
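
For context: EXT3 caps the number of hardlinks to a single inode at roughly 32,000, while EXT4 raises that limit to 65,000, so a long-lived rsnapshot tree could plausibly hit the EXT3 ceiling first. A quick way to check how close your source comes (assuming GNU find, and the path from the question):

find /share/backup -type f -printf '%n\n' | sort -n | tail -1    # highest hardlink count in the tree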

Motsel
  • 718
0

tl;dr: To address the issue, try using rsync's --no-inc-recursive option.

By default, rsync uses incremental recursion for directory synchronization: it starts transferring files before it has built the complete file list, so it may process one name of a hard-linked file before it has discovered the others. As a result, two scenarios may occur:

  1. If rsync encounters a link that already exists on both the source and the destination before it encounters a new link that exists only on the source, the new link can be preserved correctly, because the file it should point to has already been identified.

  2. If the new link, which is unique to the source, is discovered first, before the corresponding original has been found, rsync does not yet know that the file belongs to a hard-linked set, and it transfers the file's data to the destination instead of simply linking it.

For the second scenario, running with the -vvvv option (very verbose mode) lets you watch the process and confirm that hard-link preservation fails whenever the pre-existing link hasn't been detected yet; conversely, successful preservation indicates that the original link was found on the destination beforehand.
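
For example, something along these lines (paths taken from the question; the exact debug strings differ between rsync versions):

rsync -aH -vvvv /share/backup/ /share/eSATADisk1/backup/ 2>&1 | grep -i link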

According to the rsync documentation:

If incremental recursion is active (see --recursive), rsync may transfer a missing hard-linked file before it finds that another link for that contents exists elsewhere in the hierarchy.

This does not affect the accuracy of the transfer (i.e. which files are hard-linked together), just its efficiency (i.e. copying the data for a new, early copy of a hard-linked file that could have been found later in the transfer in another member of the hard-linked set of files).

One way to avoid this inefficiency is to disable incremental recursion using the --no-inc-recursive option.

As the documentation says, disabling incremental recursion with the --no-inc-recursive option makes rsync scan the entire hierarchy up front, allowing it to properly identify and preserve all hard links during the synchronization.
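
Applied to the command from the question, that would be (same paths and exclude file as above):

rsync -avzHP --no-inc-recursive --exclude-from 'rsync-exclude.txt' /share/backup/ /share/eSATADisk1/backup/

Note that incremental recursion only exists in rsync 3.0.0 and later, and that disabling it makes rsync build the complete file list in memory before transferring, so memory usage will be somewhat higher.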

0

Have you tried adding the -l option?

I know the man page says that it's included in -a, but man pages aren't always 100% accurate.

Ladadadada
  • 27,207