
I have used tar, dump, Amanda, Legato (EMC) Networker, and other backup software to make backups of systems. However, I don't know the best tools for making backups in a CI/CD (Continuous Integration/Continuous Deployment) environment and doing "rolling recoveries" of systems "as you go", which is the case in a DevOps-oriented system.

Many of these backup utilities are not necessarily well suited to a CI/CD environment because of the continuous changes taking place in both the development and production environments.

I'm looking at programs like Borg and Git-based backups like Bup. These programs would allow many incremental backups with finer granularity, and they work well with CD tools like Ansible. Even Ansible has a couple of ways to make backups. However, there is no clean way to recover the most recent backup in an automated way.
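
For what it's worth, here is a minimal sketch of how the "restore the latest backup" step could be scripted with Borg; the repository path, archive naming, and backed-up paths are assumptions, not part of any existing setup:

    # Assumed repository location; adjust to your environment
    REPO=/srv/backups/borg-repo

    # Take an incremental backup of config and uploaded content
    borg create --stats "$REPO::{hostname}-{now:%Y-%m-%d_%H%M}" /etc /var/www/uploads

    # Find the most recent archive and restore it in one step
    LATEST=$(borg list --short --last 1 "$REPO")
    borg extract "$REPO::$LATEST"

Something like the last two lines could be wrapped in an Ansible task or a recovery script, so the "most recent backup" is always resolvable without manual inspection.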

Relying on some form of caching or mirroring of your content on the cluster and hoping it's not corrupted is not the way to go :(

The things that would be backed up are the config files of the servers, the database, and uploaded content, as these will almost always be unique and necessary. The containers, VMs, and even the webapps would NOT be backed up, as these should all be uniform and constantly updated under CI/CD principles. Currently I use a combination of scp and tar into an archive directory sorted by date and machine. I'd like to find something better if possible.
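
The current approach is roughly along these lines (hostname and paths are made up for illustration):

    # Rough sketch of the current tar + scp approach
    HOST=web01.example.com
    DATE=$(date +%F)
    mkdir -p "/archive/$HOST/$DATE"
    ssh "$HOST" "tar czf /tmp/backup.tar.gz /etc /var/lib/app/uploads"
    scp "$HOST:/tmp/backup.tar.gz" "/archive/$HOST/$DATE/"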

I would like to know: what are the best backup and recovery tools used in a CI/CD environment, and how would you configure them?

I don't expect a "magic bullet" just some possible solutions.

3 Answers


The things that would be backed up are the config files of the servers, the database, and uploaded content, as these will almost always be unique and necessary. The containers, VMs, and even the webapps would NOT be backed up, as these should all be uniform and constantly updated under CI/CD principles. Currently I use a combination of scp and tar into an archive directory sorted by date and machine. I'd like to find something better if possible.

Config files, containers, and VMs should be backed up and versioned by a tool like git or SVN if you are doing configuration management properly. Backing up config files and versioning them is the whole point of configuration management tools like Salt, Puppet, Ansible, and Chef.
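
As an illustration (the paths and repository URL here are hypothetical), keeping config under version control can be as simple as:

    # Hypothetical example: track a server's config directory in git
    cd /etc/myapp
    git init
    git add .
    git commit -m "Baseline configuration"
    # Assumed remote; replace with your own
    git remote add origin git@git.example.com:ops/myapp-config.git
    git push -u origin master

In practice a configuration management tool would generate these files from templates stored in such a repository, rather than the files being committed from the live server.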

As you duly note, however, dynamic content such as database data is a more challenging issue. It is, however, outside the scope of configuration management, CI, and CD principles. These principles purposefully remain silent on such issues because they have no new answers for them. Instead, you should use the many battle-tested techniques and strategies for dealing with this kind of data rather than expecting CI/CD/CM tools to solve problems they were never intended to address.
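
For example (assuming PostgreSQL; the database name and paths are placeholders), a classic scheduled dump is exactly the kind of battle-tested technique meant here:

    # Assumed PostgreSQL database named "appdb"; run nightly from cron
    pg_dump -Fc appdb > "/backups/db/appdb_$(date +%F).dump"
    # Restore later with: pg_restore -d appdb /backups/db/appdb_YYYY-MM-DD.dump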

James Shewey

The things that would be backed up are the config files of the servers, the database, and uploaded content, as these will almost always be unique and necessary.

But would they? Config files can go into git. Databases (for the "CI" part) should be 100% creatable from source as well (migrations, fixtures, etc.), and may well be recreated on every single test run anyway. I'm not sure what you mean by "uploaded content" in the context of a CI pipeline.
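
For instance (assuming a Django-style project; the commands and fixture name are illustrative, not from the question), rebuilding the database for a test run could look like:

    # Illustrative only: recreate the test database from source
    ./manage.py migrate --noinput            # apply schema migrations
    ./manage.py loaddata test_fixtures.json  # load known fixture data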

Your goal would be to reduce everything to some kind of middle layer that acts as a barrier between the hardcore low-level stuff and the application (including CI/CD). This could be straight Docker - i.e., packaging up everything (including your CI/CD driver, Jenkins or whatever) in Dockerfiles, with everything needed included as configuration - everything in git.
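
A rough sketch of what that could look like (the image name and volume are made up): build the image from a Dockerfile kept in git, and keep the only persistent state on a named volume:

    # Hypothetical example: image built from a Dockerfile tracked in git
    docker build -t myorg/jenkins-ci:latest .
    # All persistent state lives on a named volume, not in the container
    docker run -d --name ci -v jenkins_home:/var/jenkins_home myorg/jenkins-ci:latest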

Or a little more advanced, a Kubernetes/OpenShift cluster which does the same; or the respective solution of the cloud providers (AWS, Azure etc.).

Below that border, you are running classical solutions as you are used to; there should be very little "special" you need to take care of, and it should be easy (as nothing from the application should influence this at all) to arrange things so you can add or replace nodes without any backup/recovery at all. I.e., if some piece of hardware or non-containerized software breaks, you simply throw it away and replace it with a new one.

Dynamic data that needs to persist (e.g., the history of job runs stored by your actual CI driver) will be stored on dedicated volumes, which contain said data and only that data, not interspersed with OS or other files. These, in turn, you can back up with classical tools (offline backups, etc.). The biggest issue there is deciding whether to run the backup in "container land" or on the underlying abstraction layer; but aside from that, it is just copying files around as usual (with whatever tool you like most), nothing special just because it's a "CI environment".
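
As a sketch (volume name and paths are assumed), backing up such a dedicated volume from "container land" can be done with a throwaway container and plain tar:

    # Hypothetical example: archive the "jenkins_home" volume with plain tar
    docker run --rm \
        -v jenkins_home:/data:ro \
        -v /backups:/backup \
        alpine tar czf "/backup/jenkins_home_$(date +%F).tar.gz" -C /data .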

TL;DR: make as much as possible run in containers, with dedicated volumes; don't back up containers; back up volumes as usual.

AnoE

I think, from a DevOps POV, that you are approaching this from slightly the wrong angle. DevOps thinking, for example, is that there is no need to back up a configuration file; the configuration file is written by a configuration management tool such as Chef and can be recreated at any time.

The database obviously does need to be backed up, as that is the truly stateful component of the whole system; you don't mention which one you are using, but head on over to DBA.SE and someone (perhaps me) can give you some specific pointers on that side of things.

For the uploaded content, if this is outside the database, I'd recommend a filesystem-snapshot based approach; it's easy to do and low overhead.
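
A minimal sketch, assuming the uploads live on an LVM logical volume (volume group "data", LV "uploads"; all names and sizes are placeholders):

    # Create a read-only point-in-time snapshot of the uploads volume
    lvcreate --snapshot --size 5G --name uploads_snap /dev/data/uploads
    mkdir -p /mnt/uploads_snap
    mount -o ro /dev/data/uploads_snap /mnt/uploads_snap
    # Archive the consistent snapshot, then clean up
    tar czf "/backups/uploads_$(date +%F).tar.gz" -C /mnt/uploads_snap .
    umount /mnt/uploads_snap
    lvremove -f /dev/data/uploads_snap

ZFS or btrfs snapshots would work the same way conceptually, just with their own snapshot commands.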

Gaius