Checksumming to verify rsync transfers

Question

So rsync runs some checksums in the course of deciding what to transfer (i.e. what blocks within a file). But is there any reason to trust the file you end up with on the receive side any more than you would for a normal network transfer? Should I run checksums after rsync finishes to verify the data? Is rerunning rsync with the pre-check (i.e. --checksum option) turned on an accepted way to accomplish this?

score 3 · Accepted Answer · answered Nov 05 '11 at 16:18

In general the rsync checksum mechanism is fairly reliable. The tradeoff here is the usual one: you can do more verification but it will take more time. If you are really worried that a set of files must be exactly the same on two machines, you should run a separate verification. For example, you can run md5sum over the file list on both sides and compare the results. Assuming the files don't change in the meantime (as log files would), that will give you very high confidence that the files are identical on both sides.
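A minimal sketch of that md5sum comparison, assuming GNU coreutils md5sum and find are available. The function names manifest and compare_trees are made up for illustration, and both trees are assumed to be reachable as local paths:

```shell
#!/bin/sh
# Hypothetical helper: print a sorted checksum manifest for every regular
# file under directory $1. Paths are recorded relative to $1 so that
# manifests from different hosts or mount points are directly comparable.
manifest() {
    (cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2)
}

# Hypothetical helper: compare two trees by manifest.
# Prints the differing manifest lines and returns non-zero on any mismatch
# (changed content, or a file present on only one side).
compare_trees() {
    a=$(mktemp)
    b=$(mktemp)
    manifest "$1" > "$a"
    manifest "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For two machines, one option is to run manifest on each side, redirect the output to a file, copy one file across, and diff the two. Every file is read in full on both sides, so this costs about as much I/O as the transfer itself.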

score 2 · Answer 2 · answered Jul 07 '18 at 21:29

Use rsync -Pahn --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt

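The -n flag makes this a dry run, so nothing is copied; combined with --checksum, rsync lists the files whose checksums differ between source and destination. sed removes directory entries (lines ending in /) from that output. tee writes it to the screen and to migration.txt at the same time.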

Keep in mind that this might not be a suitable method if you have very large files, as the verification will take a long time.

Source
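To turn the dry run into a pass/fail check, something like the following could work. This is a sketch, not part of the answer above: the function name verify_with_rsync is made up, and it swaps -Pah for -r plus --out-format='%n' so that only the names of differing files are printed, without progress or summary lines.

```shell
#!/bin/sh
# Hypothetical helper: dry-run rsync with full-file checksums from $1 to $2.
# Prints the files rsync would transfer and returns non-zero if there are any.
# Assumption: only the source-to-destination direction matters; files that
# exist only on the destination are not reported.
verify_with_rsync() {
    # -r recurse, -c compare by checksum, -n dry run (nothing is copied)
    diffs=$(rsync -rcn --out-format='%n' "$1/" "$2/" | sed '/\/$/d')
    if [ -n "$diffs" ]; then
        printf '%s\n' "$diffs"
        return 1
    fi
    return 0
}
```

Usage would be along the lines of verify_with_rsync /path/to/source /path/to/destination && echo "trees match". The same caveat about very large files applies, since --checksum reads every file on both sides.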

Dennis V · Answer 3 · 2025-02-23T02:21:08.353

Just for your case, there is a special tool for monitoring the integrity of files after synchronization. It works independently of rsync and is completely open source of course.

https://github.com/precizer/precizer

precizer is a lightweight and blazing-fast command-line application written entirely in pure C. It is designed for file integrity verification and comparison, making it particularly useful for checking synchronization results. The program recursively traverses directories, generating a database of files and their checksums for quick and efficient comparisons.

Built for both embedded platforms and large-scale clustered mainframes, precizer helps detect synchronization errors by comparing files and their checksums across different sources. It can also be used to analyze historical changes by comparing databases generated at different points in time from the same source.

Basic Example

Consider a scenario where two machines have large mounted volumes at /mnt1 and /mnt2, respectively, containing identical data. The goal is to verify, byte by byte, whether the contents are truly identical or if discrepancies exist.

Run precizer on the first machine (e.g., hostname host1):

precizer --progress /mnt1

This command recursively traverses all directories under /mnt1, creating a database file host1.db in the current directory. The --progress flag provides real-time progress updates, displaying the total traversed space and the number of processed files.

Run precizer on the second machine (e.g., hostname host2):

precizer --progress /mnt2

This will generate a database file host2.db in the current directory.

Copy host1.db and host2.db to one of the machines and run the following command to compare them:

precizer --compare host1.db host2.db

The output will display:

Files that exist on host1 but are missing on host2, and vice versa.
Files present on both hosts but with different checksums.
