So rsync runs some checksums in the course of deciding what to transfer (i.e. what blocks within a file). But is there any reason to trust the file you end up with on the receive side any more than you would for a normal network transfer? Should I run checksums after rsync finishes to verify the data? Is rerunning rsync with the pre-check (i.e. --checksum option) turned on an accepted way to accomplish this?
3 Answers
In general, the rsync checksum mechanism is fairly reliable. The tradeoff is the usual one: you can do more verification, but it will take more time. If you really need a set of files to be exactly the same on two machines, run a separate verification. For example, you can run md5sum over the file list on both sides and compare the results. Assuming the files don't change in the meantime (as log files do), that will give you very high confidence that the files are identical on both sides.
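As a rough illustration of the "checksum both sides and compare" idea, here is a minimal Python sketch. The function names `file_digest` and `build_manifest` are made up for this example, and it assumes every entry under the root is a readable regular file. It defaults to SHA-256; passing `"md5"` should produce the same digests that md5sum prints.

```python
import hashlib
import os


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash one file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its hex digest."""
    manifest = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            relative_path = os.path.relpath(full_path, root)
            manifest[relative_path] = file_digest(full_path, algorithm)
    return manifest
```

Run `build_manifest` on each machine against the synced directory, save the result (for example as JSON), and compare the two dictionaries. Because the keys are relative paths, the roots may live at different locations on each side.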
Use rsync -Pahn --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt
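The -n flag makes this a dry run, so nothing is transferred; rsync only lists what it would send after comparing checksums.
sed removes directory entries (lines ending in /) from the output, so only files are listed.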
tee outputs to the screen and to the file at the same time.
Keep in mind that this might not be a suitable method if you have very large files, as the verification will take a long time.
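If the report needs further processing, the same pipeline can be driven from a script. The sketch below is only an illustration: the helper names `drop_directory_lines` and `checksum_dry_run` are hypothetical, it assumes rsync is installed and on the PATH, and it writes the file only after rsync finishes, whereas tee streams output while the command runs.

```python
import subprocess


def drop_directory_lines(lines):
    """Keep only lines that do not end in '/', mirroring the sed filter."""
    return [line for line in lines if not line.endswith("/")]


def checksum_dry_run(source, destination, report_path="migration.txt"):
    """Run the dry-run checksum comparison, save the filtered report, return it."""
    # Same flags as the command in the answer; -n means nothing is transferred.
    command = ["rsync", "-Pahn", "--checksum", source, destination]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    kept = drop_directory_lines(result.stdout.splitlines())
    with open(report_path, "w", encoding="utf-8") as report:
        for line in kept:
            report.write(line + "\n")
    return kept
```

Note that the returned list may still contain rsync's own status lines alongside file names, since the filter only drops lines ending in a slash.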
For exactly this case, there is a dedicated tool for checking file integrity after synchronization. It works independently of rsync and is fully open source.
https://github.com/precizer/precizer
precizer is a lightweight and blazing-fast command-line application written entirely in pure C. It is designed for file integrity verification and comparison, making it particularly useful for checking synchronization results. The program recursively traverses directories, generating a database of files and their checksums for quick and efficient comparisons.
Built for both embedded platforms and large-scale clustered mainframes, precizer helps detect synchronization errors by comparing files and their checksums across different sources. It can also be used to analyze historical changes by comparing databases generated at different points in time from the same source.
Basic Example
Consider a scenario where two machines have large mounted volumes at /mnt1 and /mnt2, respectively, containing identical data. The goal is to verify, byte by byte, whether the contents are truly identical or if discrepancies exist.
Run precizer on the first machine (e.g., hostname host1):
precizer --progress /mnt1
This command recursively traverses all directories under /mnt1, creating a database file host1.db in the current directory. The --progress flag provides real-time progress updates, displaying the total traversed space and the number of processed files.
Run precizer on the second machine (e.g., hostname host2):
precizer --progress /mnt2
This will generate a database file host2.db in the current directory.
Copy host1.db and host2.db to one of the machines and run the following command to compare them:
precizer --compare host1.db host2.db
The output will display:
- Files that exist on host1 but are missing on host2, and vice versa.
- Files present on both hosts but with different checksums.
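The comparison described above boils down to three categories. The following Python sketch shows that logic on plain dictionaries mapping relative paths to checksums. It is not precizer's implementation and does not read its database files; the function name `compare_manifests` and the sample data in the usage note are invented for illustration.

```python
def compare_manifests(first, second):
    """Classify differences between two {relative_path: checksum} mappings.

    Returns three sorted lists: paths only in `first`, paths only in
    `second`, and paths present in both whose checksums differ.
    """
    first_paths = set(first)
    second_paths = set(second)
    only_in_first = sorted(first_paths - second_paths)
    only_in_second = sorted(second_paths - first_paths)
    mismatched = sorted(
        path for path in first_paths & second_paths
        if first[path] != second[path]
    )
    return only_in_first, only_in_second, mismatched
```

For instance, calling it with `{"a.txt": "111", "b.txt": "222"}` and `{"b.txt": "999", "c.txt": "333"}` reports `a.txt` as missing on the second side, `c.txt` as missing on the first, and `b.txt` as a checksum mismatch. Three empty lists mean the two sides agree.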