
I have a NAS as production system (mostly video footage and project files) and a 1:1 backup. I only need to retrieve the backup data in case of emergency; such a case hasn't occurred yet.

Lately I've become concerned that the data on the backup might get corrupted without being noticed. So I thought about hashing all files on both systems to check whether there are any unexpected differences.

I haven't found any suitable tool that fits into my overall process, but I'm able to write my own Python script for this purpose. This will take me some weeks, though, so I would like to ask whether it makes sense at all.

I'm asking because the backup runs every day (the drives are active and their internal error correction can take effect, but the backup software doesn't hash anything), and there is already a layer of error correction in the network transport (and maybe in the Windows OS and the NTFS file system).

Is it advisable to add another integrity layer by computing SHA-512 hashes of all files, or can I already rely on the existing error correction?
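
For reference, here is a minimal sketch of the kind of script I have in mind: walk the tree, hash every file with SHA-512, and write one line per file into a manifest that could later be diffed against the same manifest built on the backup. The root and manifest paths are just examples, and there is no error handling yet.

```python
#!/usr/bin/env python3
# Sketch: build a "hash  path" manifest of every file under ROOT.
# ROOT and MANIFEST are example values, not my real paths.
import hashlib
from pathlib import Path

ROOT = Path(r"\\nas\projects")        # example: source or backup root
MANIFEST = Path("manifest-nas.txt")   # example: output manifest

def sha512_of(path, chunk_size=1024 * 1024):
    # Hash the file in chunks so large video files don't need to fit in RAM.
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with MANIFEST.open("w", encoding="utf-8") as out:
    for path in sorted(p for p in ROOT.rglob("*") if p.is_file()):
        out.write(f"{sha512_of(path)}  {path.relative_to(ROOT).as_posix()}\n")
```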

Thanks!

Martin

3 Answers


You need to be very careful with user data checksumming! For example, enabling checksumming for ReFS turns it into a fully log-structured file system, with all the I/O patterns changed.

https://www.starwindsoftware.com/blog/log-structured-file-systems-microsoft-refs-v2-investigation-part-1

Your database app won't like it for sure, as databases use a log in front of the "flat" storage to accelerate writes; with ReFS you'll end up with a log-on-log design, which is a terrible idea performance-wise.

https://www.usenix.org/system/files/conference/inflow14/inflow14-yang.pdf

I think you can safely rely on your backup software of choice handling data integrity checks for you. How much downtime can you afford?

NISMO1968

Depends on the data... For our database backups, we hash each backup before transferring it and again after it lands and is tested at its first destination. We then push the new hash to the top of a file that other systems read, so they can compare their own hashes against the validated hash and make sure they got a good copy of the file.
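
As a rough illustration of that flow, a small Python check like the one below could compare the local copy against the validated hash. The file names and the "newest hash on the first line" layout are just assumptions for the sketch, not our actual tooling.

```python
#!/usr/bin/env python3
# Sketch: verify a transferred file against the most recently validated hash.
# BACKUP_FILE and VALIDATED_HASHES are placeholder names.
import hashlib
import sys

BACKUP_FILE = "db_backup.dump"         # example: the transferred file
VALIDATED_HASHES = "validated.sha256"  # example: newest "hash  name" line on top

def sha256_of(path, chunk_size=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with open(VALIDATED_HASHES, encoding="utf-8") as f:
    expected = f.readline().split()[0]  # assumed: newest validated hash first

actual = sha256_of(BACKUP_FILE)
if actual != expected:
    sys.exit(f"MISMATCH: {BACKUP_FILE}: expected {expected}, got {actual}")
print(f"OK: {BACKUP_FILE}")
```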

If your project files are important, yes, do it. If not, it's probably not worth your time...

Bash may be quicker... You can just do a for loop through the files using openssl, or, if you care less about perfection and more about speed, use md5sum.

So either md5sum "$file" >> log.out or sha1sum "$file" >> log.out. You can add dates and other info to it, etc... Then you can wc -l the log to count the lines, run the same hashes on the second copy, and use diff to compare the two logs and make sure they're identical.
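
If you end up doing it in Python instead (as you mentioned), the compare step could look roughly like this. The log file names are placeholders, and the lines are assumed to be in the "hash  path" format that md5sum/sha1sum write.

```python
#!/usr/bin/env python3
# Sketch: compare two hash logs ("hash  path" per line) and report differences.
# log-primary.out and log-backup.out are placeholder file names.

def load(log_path):
    entries = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            digest, _, path = line.rstrip("\n").partition("  ")
            entries[path] = digest
    return entries

primary = load("log-primary.out")
backup = load("log-backup.out")

for path in sorted(primary.keys() | backup.keys()):
    if path not in backup:
        print(f"missing on backup: {path}")
    elif path not in primary:
        print(f"only on backup:    {path}")
    elif primary[path] != backup[path]:
        print(f"hash mismatch:     {path}")
```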

Hopefully that helps in some way... Sorry, I should have been sleeping hours ago.


Decide whether you wish to implement protection at the volume level, the file level, or both. By volume level I mean file systems that can do data checksums and repair, like ReFS or ZFS. By file level I mean checking files with backup software, or the ad-hoc checksum scheme you are considering.

In general, take care that your backups are cold (i.e., offline), on different media, and, most importantly, tested. Think of it as a recovery system, not just a backup system. Should some malware take control of your storage and encrypt both the primary data and the backups on the one NAS, that would be very bad. External storage media that you leave unplugged, but have tested by copying useful files off them, will save you even in that scenario.

John Mahowald