Do I still need a backup if I have a redundant storage system with rollback capabilities?

Question

My organization recently bought a storage system. It provides 1.5 petabytes, uses RAID 6, and has an online synced mirror in a physically different location.

The system allows rollback and file recovery. By default it covers up to 30 days, but this can be increased.

There is an ongoing discussion about whether we need some kind of extra backup for data that lives only on this storage.

The system has a very good level of redundancy: it is geographically redundant and, to some extent, allows rollback. That means we can recover older versions of data, or accidentally deleted data, as far back as the defined window (30 days by default).

Given this scenario does it still make sense to have a "traditional" backup? By traditional, I mean a dedicated backup system, with snapshots that we can retrieve in case something goes wrong.

Do we really need it? Am I missing something? Am I just thinking in the traditional way and being overzealous?

score 40 · Accepted Answer · answered Oct 06 '15 at 09:12

What you describe is essentially a geographically distributed RAID, and RAID was never a backup.

Online sync usually means that everything you do on the primary storage is immediately replicated to the mirror, including operations like the deletion of (all) snapshots and/or volumes, whether by an attacker or through a simple admin error.
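To make that concrete, here is a toy model, not any vendor's replication API. The class `MirroredStore`, the function `take_offline_backup`, and the file name are all hypothetical, and plain dictionaries stand in for volumes. It only illustrates that a synchronous mirror faithfully repeats a destructive operation, while a copy taken earlier and kept outside the replication path does not.

```python
import copy


class MirroredStore:
    """Toy model: every operation on the primary is applied to the mirror at once."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def write(self, name, data):
        self.primary[name] = data
        self.mirror[name] = data  # replicated immediately

    def delete_all(self):
        # An attacker or an admin mistake: the deletion is replicated too.
        self.primary.clear()
        self.mirror.clear()


def take_offline_backup(store):
    """Hypothetical backup: an independent copy outside the replication path."""
    return copy.deepcopy(store.primary)


store = MirroredStore()
store.write("payroll.db", "2015-10 records")
backup = take_offline_backup(store)

store.delete_all()

print("primary:", store.primary)
print("mirror: ", store.mirror)
print("backup: ", backup)
```

After `delete_all()`, both the primary and the mirror are empty; only the separately held copy still contains the file.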

score 7 · Answer 2 · edited Oct 07 '15 at 05:59

The 30-day rollback is a great capability, but what if "critically-important-file-xyz" became corrupt or damaged and this was not detected until 31+ days later? This situation is the difference between backup and archival schedules, and your description does not mention the latter. Archives are usually stored on very low-cost tape. You also don't say whether your business has regulatory or other requirements to retain data for longer than 30 days, which is frequently the case.

If this is not the case for your situation, then you should be good.
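The 31-day scenario can be written down as a small check. This is only a sketch under a stated assumption: the system can roll back to any point within the last `retention_days` days, and a restore helps only if some restorable point predates the corruption. The function name `clean_copy_available` and the dates are made up for illustration.

```python
from datetime import date, timedelta


def clean_copy_available(corrupted_on, detected_on, retention_days):
    """Return True if a version from before the corruption can still be restored.

    Assumes rollback reaches back exactly `retention_days` days from the
    moment the problem is detected.
    """
    oldest_restorable = detected_on - timedelta(days=retention_days)
    return oldest_restorable < corrupted_on


corrupted = date(2015, 9, 1)

# Detected 10 days later: a clean version is still inside the 30-day window.
print(clean_copy_available(corrupted, date(2015, 9, 11), 30))  # True

# Detected 31 days later: every restorable version is already corrupt.
print(clean_copy_available(corrupted, date(2015, 10, 2), 30))  # False

# Same late detection, but with a one-year archive.
print(clean_copy_available(corrupted, date(2015, 10, 2), 365))  # True
```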

score 6 · Answer 3 · answered Oct 06 '15 at 23:21

Having geographically separated machines that both hold the data is good.

What happens when you have multiple failures involving both or all of your sites? A fire at one, theft of the servers at the other? Or there is a problem with the line between them, then the primary location's server goes out, and the HD controller goes ape and writes junk? Or some insider performs malicious acts on both? Or the FBI confiscates your servers at both locations because of suspected (you would never, but maybe you are co-hosted in a datacenter with schmucks). Or... I am reminded of several high-profile "cloud" outages where everything was redundant and analyzed to the nth degree, but things still went wrong. I'll grant you these are all unlikely, but you've acknowledged that unlikely things can happen.

So, it comes down to how important/valuable is that data? What will the organization do if it ends up gone?

Nick · Answer 4 · 2015-10-08T20:50:59.443

The question here seems to be about just how disconnected and geographically distinct a replicated copy of your data needs to be before it's a backup and not high availability/redundancy infrastructure. My gut is that you're close, but still need a backup.

To bring together (cherry-pick) some thoughts from the other answers and comments: you can go really far down the path of "well, X technology doesn't cover Y disaster scenario, so it's not a backup," and at some point you need to decide what's reasonable for you, which seems to be why you're asking. My feeling on this, and I think the feeling of many of the commenters, is that your backup needs to exist on a separate technological infrastructure from your in-use data, so that failures, accidents, and malicious actions either can't propagate or have a much higher hurdle to cross. An example given in the comments is someone deleting the volumes, which in my opinion is a valid, not pie-in-the-sky, scenario.

In addition, here is a real-world example from my work. The university I work for (though thankfully I don't manage this infrastructure) has some serious high-availability virtualization infrastructure that supports a lot of the campus facilities. It's at multiple sites, but it all runs on one vendor's platform. An obscure bug cropped up one day and caused a failure cascade: it first took down a single server, then, when the load shifted, it took out the rest of that site, and when the load shifted again, it took out the other sites hosting that infrastructure. (I believe they've resolved this issue since then.) The data wasn't lost in this case, but it's feasible to imagine a scenario involving your data where it was.

You want your backup to be immune to all of that, and even accessible while that infrastructure is down. If the data is unavailable for a week while your RAID rebuilds, being able to recover business-critical documents from backup is nice (though not required). If your RAID disappears and that loss then replicates to your other site, you'll really want that backup to be from a separate vendor or on some isolated media like tape.

All this said, I'll repeat that your backup should be on a separate infrastructure from your data. There are many levels of isolation here, but I think anything connected through direct replication is too close to be a backup. You'll want something in addition.

valentin · Answer 5 · 2015-10-07T11:22:11.560

Assumption: the storage system will be used by many applications.

I think you will do much better with a separate backup system.

RAID and mirroring are not backup, but a built-in rollback feature can replace a traditional backup system.

BUT:

I prefer the recovery policies to be application/data based and not storage based because:

applications have different requirements for recovery and acceptable data loss (some of them imposed by various regulations: read-only media, encryption, keeping the last X years, etc.),
some applications have (very) good built-in backup and recovery tools (Oracle, MSSQL), and these are the recommended way to handle backup and recovery (as an Oracle DBA, I prefer to do all my Oracle backups with rman),
growth: your space usage can grow much more quickly than you expect; the system can accommodate 30 days of rollback data now, but this is not guaranteed in the future,
cost: after several years of growth, using bigger tapes to satisfy the backup/recovery policies will cost less than buying new, bigger disks to keep the same rollback window as now.
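The growth point can be illustrated with some back-of-the-envelope arithmetic. Everything here except the 1.5 PB capacity is an assumption made up for the example: the starting amount of live data, the daily change rate, the yearly growth rate, and the simplified model in which rollback data competes with live data for the same fixed capacity. Real systems account for rollback space differently, so treat this as a sketch only.

```python
def rollback_window_days(capacity_tb, live_tb, daily_change_fraction):
    """Days of rollback history that fit in the space not used by live data.

    Simplified model: each day of history costs live_tb * daily_change_fraction.
    """
    free_tb = max(capacity_tb - live_tb, 0.0)
    daily_delta_tb = live_tb * daily_change_fraction
    if daily_delta_tb == 0:
        return float("inf")  # nothing changes, so history costs nothing
    return free_tb / daily_delta_tb


CAPACITY_TB = 1500.0    # 1.5 PB, from the question
DAILY_CHANGE = 0.02     # assumed: 2% of live data changes per day
YEARLY_GROWTH = 1.4     # assumed: live data grows 40% per year

live_tb = 500.0         # assumed starting amount of live data
for year in range(4):
    window = rollback_window_days(CAPACITY_TB, live_tb, DAILY_CHANGE)
    print(f"year {year}: {live_tb:7.1f} TB live, about {window:5.1f} days of rollback")
    live_tb *= YEARLY_GROWTH
```

With these made-up numbers the window starts comfortably above 30 days and drops below it within a couple of years, without anyone changing the retention setting.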
