122

On a server, install git

cd /
git init
git add .
git commit -a -m "Yes, this is server"

Then get /.git/ to point to a network drive (SAN, NFS, Samba, whatever) or a different disk, and use a cron job every hour/day etc. to commit the changes. The .git directory would contain a versioned copy of all the server files (excluding the useless/complicated ones like /proc, /dev, etc.).
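A rough sketch of the cron job I have in mind (path and schedule are placeholders):

#!/bin/sh
# hypothetical /etc/cron.daily/git-backup
cd / || exit 1
git add -A .                                  # /proc, /dev etc. excluded via .gitignore
git commit -m "automated backup $(date '+%F %T')"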

For a non-important development server where I don't want the hassle/cost of setting it up on a proper backup system, and where backups would only be for convenience (i.e. we don't need to back up this server, but it would save some time if things went wrong), could this be a valid backup solution, or will it just fall over in a big pile of poop?


17 Answers

109

You're not a silly person. Using git as a backup mechanism can be attractive, and despite what other folks have said, git works just fine with binary files. Read this page from the Git Book for more information on this topic. Basically, since git is not using a delta storage mechanism, it doesn't really care what your files look like (but the utility of git diff is pretty low for binary files with a stock configuration).

The biggest issue with using git for backup is that it does not preserve most filesystem metadata. Specifically, git does not record:

  • file groups
  • file owners
  • file permissions (other than "is this executable")
  • extended attributes

You can solve this by writing tools to record this information explicitly into your repository, but it can be tricky to get this right.
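If you wanted to work around this yourself, a minimal sketch (assuming a Linux host with the standard acl/attr tools installed, and a hypothetical /srv/data tree) might look like this:

#!/bin/sh
# Record the metadata git won't track, then commit it alongside the data.
getfacl -R -p /srv/data  > /srv/data/.metadata.facl   # owners, groups, permissions, ACLs
getfattr -R -d /srv/data > /srv/data/.metadata.xattr  # extended attributes
git -C /srv/data add -A .
git -C /srv/data commit -m "backup with metadata"
# On restore: setfacl --restore=/srv/data/.metadata.facl
#             setfattr --restore=/srv/data/.metadata.xattr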

A Google search for git backup metadata yields a number of results that appear to be worth reading (including some tools that already attempt to compensate for the issues I've raised here).

etckeeper was developed for backing up /etc and solves many of these problems.
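On Debian and derivatives, getting started is roughly this (from memory; package names and defaults may differ on your distribution):

apt-get install etckeeper          # git is the default VCS on Debian
etckeeper init                     # puts /etc under version control
etckeeper commit "initial import"
# etckeeper also installs apt hooks and a daily autocommit job by default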

larsks
  • 47,453
30

I've not used it, but you might look at bup, which is a backup tool based on git.
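If you want to experiment with it, basic usage looks roughly like this (the paths and save name are placeholders):

bup init                            # create the default repository (~/.bup or $BUP_DIR)
bup index /home/me                  # scan and index the files to back up
bup save -n home-backup /home/me    # store a deduplicated snapshot under the name "home-backup"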

Helge Klein
  • 2,111
stew
  • 9,588
12

Whilst technically you could do this, I would put two caveats against it:

1. You are using a source version control system for binary data. You are therefore using it for something that it was not designed for.

2. I worry about your development process if you don't have a process (documented or automated) for building a new machine. What if you got hit by a bus - who would know what to do and what was important?

Disaster recovery is important, but it's better to automate (script) the setup of a new development box than to just back up everything. Sure, use git for your scripts/documentation, but not for every file on a computer.

11

It can be a valid backup solution; etckeeper is based on this idea. But keep an eye on the .git directory permissions, otherwise a committed /etc/shadow ends up readable inside the .git directory.
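For example, assuming the repository lives at /etc/.git:

chmod -R go-rwx /etc/.git   # only root can read the committed copy of /etc/shadow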

chicks
  • 3,915
Stone
  • 7,279
8

I use git as a backup for my Windows system, and it's been incredibly useful. At the bottom of the post, I show the scripts I use to set this up on a Windows system. Using git as a backup for any system provides two big advantages:

  1. Unlike commercial solutions, which often use their own proprietary format, your backup is in an open-source format that is widely supported and very well documented. This gives you full control of your data. It's very easy to see which files changed and when. If you want to truncate your history, you can do that as well. Want to obliterate something from your history? No problem. Getting a version of your file back is as simple as any git command.
  2. You can have as many or as few mirrors as you want, and each can have its own backup schedule. You get a local mirror, which is unburdened by slow Internet traffic and thus gives you (1) the ability to do more frequent backups throughout the day and (2) a quick restoration time. (Frequent backups are a huge plus, because most of the time I lose a document it's through user error. For example, your kid accidentally overwrites a document he's been working on for the last 5 hours.) You also get a remote mirror, which gives you data protection in case of a local disaster or theft. And suppose you want your remote mirror to back up at a customized time to save your Internet bandwidth? No problem.

Bottom line: A git backup gives you incredible amounts of power on controlling how your backups happen.

I configured this on my Windows system. The first step is to create the local git repo where you will commit all your local data. I recommend using a second local hard drive, but using the same hard drive will work too (it's expected you'll push this somewhere remote, otherwise you're screwed if the hard drive dies).

You'll first need to install cygwin (with rsync), and also install git for Windows: http://git-scm.com/download/win

Next, create your local git repo (only run once):

init-repo.bat:

@echo off
REM SCRIPT PURPOSE: CREATE YOUR LOCAL GIT-REPO (RUN ONLY ONCE)

REM Set where the git repository will be stored
SET GBKUP_LOCAL_MIRROR_HOME=E:\backup\mirror


REM Create the backup git repo. 
SET GIT_PARAMS=--git-dir=%GBKUP_LOCAL_MIRROR_HOME%\.git --work-tree=%GBKUP_LOCAL_MIRROR_HOME% 
mkdir %GBKUP_LOCAL_MIRROR_HOME%
git %GIT_PARAMS% init
git %GIT_PARAMS% config core.autocrlf false
git %GIT_PARAMS% config core.ignorecase false 
git %GIT_PARAMS% config core.fileMode false
git %GIT_PARAMS% config user.email backup@yourComputerName
git %GIT_PARAMS% config user.name backup

REM add a remote to the git repo.  Make sure you have set myRemoteServer in ~/.ssh/config   
REM The path on the remote server will vary.  Our remote server is a Windows machine running cygwin+ssh.  
REM For better security, you could install gitolite on the remote server, and forbid any non-fast-forward merges, and thus stop a malicious user from overwriting your backups.
git %GIT_PARAMS% remote add origin myRemoteServer:/cygdrive/c/backup/yourComputerName.git

REM treat all files as binary; so you don't have to worry about autocrlf changing your line endings
SET ATTRIBUTES_FILE=%GBKUP_LOCAL_MIRROR_HOME%\.git\info\attributes
echo.>> %ATTRIBUTES_FILE% 
echo *.gbkuptest text>> %ATTRIBUTES_FILE% 
echo * binary>> %ATTRIBUTES_FILE% 
REM delta compression is often a waste of time with binary files
echo * -delta>> %ATTRIBUTES_FILE% 
REM You may need to get rid of windows new lines. We use cygwin's tool
C:\cygwin64\bin\dos2unix %ATTRIBUTES_FILE%

Next, we have our backup script wrapper, which will be called regularly by Windows Scheduler:

gbackup.vbs:

' A simple vbs wrapper to run your bat file in the background
Set oShell = CreateObject ("Wscript.Shell") 
Dim strArgs
strArgs = "cmd /c C:\opt\gbackup\gbackup.bat"
oShell.Run strArgs, 0, false

Next, we have the backup script itself that the wrapper calls:

gbackup.bat:

@echo off

REM Set where the git repository will be stored
SET GBKUP_LOCAL_MIRROR_HOME=E:\backup\mirror
REM the user which runs the scheduler
SET GBKUP_RUN_AS_USER=yourWindowsUserName
REM exclude file
SET GBKUP_EXCLUDE_FILE=/cygdrive/c/opt/gbackup/exclude-from.txt

SET GBKUP_TMP_GIT_DIR_NAME=git-renamed
for /f "delims=" %%i in ('C:\cygwin64\bin\cygpath %GBKUP_LOCAL_MIRROR_HOME%') do set GBKUP_LOCAL_MIRROR_CYGWIN=%%i

REM rename any previously renamed .git directories back to .git (see the matching rename near the end of this script)
for /r %GBKUP_LOCAL_MIRROR_HOME% %%i in (%GBKUP_TMP_GIT_DIR_NAME%) do ren "%%i" ".git" 2> nul

SET RSYNC_CMD_BASE=C:\cygwin64\bin\rsync -ahv --progress --delete --exclude-from %GBKUP_EXCLUDE_FILE%

REM rsync all needed directories to local mirror
%RSYNC_CMD_BASE% /cygdrive/c/dev %GBKUP_LOCAL_MIRROR_CYGWIN%
%RSYNC_CMD_BASE% /cygdrive/c/Users/asmith %GBKUP_LOCAL_MIRROR_CYGWIN%
%RSYNC_CMD_BASE% /cygdrive/c/Users/bsmith %GBKUP_LOCAL_MIRROR_CYGWIN%

cacls %GBKUP_LOCAL_MIRROR_HOME% /t /e /p  %GBKUP_RUN_AS_USER%:f

REM rename nested .git directories out of the way (git would otherwise skip their contents), then restore the repo's own .git
for /r %GBKUP_LOCAL_MIRROR_HOME% %%i in (.git) do ren "%%i" "%GBKUP_TMP_GIT_DIR_NAME%" 2> nul
ren %GBKUP_LOCAL_MIRROR_HOME%\%GBKUP_TMP_GIT_DIR_NAME% .git

REM finally commit to git
SET GIT_PARAMS=--git-dir=%GBKUP_LOCAL_MIRROR_HOME%\.git --work-tree=%GBKUP_LOCAL_MIRROR_HOME% 
SET BKUP_LOG_FILE=%TMP%\git-backup.log
SET TO_LOG=1^>^> %BKUP_LOG_FILE% 2^>^&1
echo ===========================BACKUP START=========================== %TO_LOG%
For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set mydate=%%c-%%a-%%b)
For /f "tokens=1-2 delims=/:" %%a in ('time /t') do (set mytime=%%a%%b)
echo %mydate%_%mytime% %TO_LOG%
echo updating git index, committing, and then pushing to remote %TO_LOG%
REM Caution: The --ignore-errors directive tells git to continue even if it can't access a file.
git %GIT_PARAMS% add -Av --ignore-errors %TO_LOG%
git %GIT_PARAMS% commit -m "backup" %TO_LOG%
git %GIT_PARAMS% push -vv --progress origin master %TO_LOG%
echo ===========================BACKUP END=========================== %TO_LOG%

We have exclude-from.txt file, where we put all the files to ignore:

exclude-from.txt:

target/
logs/
AppData/
Downloads/
trash/
temp/
.idea/
.m2/
.IntelliJIdea14/
OLD/
Searches/
Videos/
NTUSER.DAT*
ntuser.dat*

You'll need to go to any remote repos and do a 'git init --bare' on them. You can test the setup by executing the backup script. Assuming everything works, go to Windows Scheduler and point an hourly task at the vbs file. After that, you'll have a git history of your computer for every hour. It's extremely convenient -- ever accidentally delete a section of text and miss it? Just check your git repository.

user64141
  • 201
4

Well, it's not a bad idea, but I think there are two red flags to be raised:

  • If the hard disk fails, you'll lose everything unless you push your commits to another server/drive. (Even if you have a plan for that, I prefer to mention it.)

... but still, it can be a good backup against corruption-related problems. Or, like you said, if the .git/ folder is somewhere else.

  • This backup will always increase in size. There's no pruning or rotation or anything by default.

... so you may need to tell your cron job to add tags, and then make sure commits that are not tagged get cleaned up.
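A rough sketch of the tagging half (the tag naming is just an example; actually discarding untagged history means rewriting or re-cloning the repository, which is beyond a simple cron job):

git add -A .
git commit -m "automated backup"
git tag -f "weekly-$(date +%G-%V)"   # one tag per ISO week, moved forward on each run
git gc --auto                        # repacks when needed; only unreachable objects get pruned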

FMaz008
  • 449
4

I once developed a backup solution based on Subversion. While it worked quite well (and git should work even better), I think there are better solutions out there.

I consider rsnapshot to be one of the better ones - if not the best. With good use of hard links, I have a 300 GB fileserver (with half a million files) with daily, weekly and monthly backups going back as far as one year. Total used disk space is only one full copy plus the incremental part of each backup, but thanks to hard links I have a complete "live" directory structure in each of the backups. In other words, files are directly accessible not only under daily.0 (the most recent backup), but also in daily.1 (yesterday) or weekly.2 (two weeks ago), and so on.
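A minimal configuration sketch (paths and retention counts are only examples; rsnapshot wants tabs, not spaces, between fields, and is then driven by running "rsnapshot daily", "rsnapshot weekly", etc. from cron):

# /etc/rsnapshot.conf (excerpt)
snapshot_root   /backup/snapshots/
retain  daily   7
retain  weekly  4
retain  monthly 12
backup  /home/  localhost/
backup  /etc/   localhost/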

By re-sharing the backup folder with Samba, my users are able to pull files from backups simply by pointing their PC at the backup server.

Another very good option is rdiff-backup, but as I like to have files always accessible simply by pointing Explorer at \\servername, rsnapshot was the better solution for me.

shodanshok
  • 52,255
2

I haven't tried it with a full system but I'm using it for my MySQL backups (with the --skip-extended-insert option) and it has really worked well for me.

You're going to run into problems with binary data files (their entire contents could and will change), and you might have problems with the .git folder getting really large. I would recommend setting up a .gitignore file and only backing up text files that you really know you need.
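A minimal sketch of that kind of job (paths are placeholders, and the MySQL credentials are assumed to come from ~/.my.cnf):

#!/bin/sh
# One INSERT per row keeps the dump diff-friendly for git
mysqldump --skip-extended-insert --all-databases > /backup/mysql/all.sql
cd /backup/mysql && git add all.sql && git commit -m "mysql backup $(date +%F)"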

2

I had the same idea of backing up with git, basically because it allows versioned backups. Then I saw rdiff-backup, which provides that functionality (and much more). It has a really nice user interface (look at the CLI options). I'm quite happy with it. The --remove-older-than 2W option is pretty cool: it lets you simply delete versions older than two weeks. rdiff-backup stores only diffs of files.
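Typical usage looks like this (host and paths are examples):

rdiff-backup /home/daniel backuphost::/backups/daniel             # incremental backup over SSH
rdiff-backup --remove-older-than 2W backuphost::/backups/daniel   # drop increments older than two weeks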

Daniel
  • 3,347
2

I am extremely new to git, but aren't branches local by default, and must be pushed explicitly to remote repositories? This was an unpleasant and unexpected surprise. After all, don't I want all of my local repo to be 'backed up' to the server? Reading the git book:

Your local branches aren’t automatically synchronized to the remotes you write to — you have to explicitly push the branches you want to share. That way, you can use private branches for work you don’t want to share, and push up only the topic branches you want to collaborate on.

To me this meant that those local branches, like other non-git files on my local machine, are at risk of being lost unless backed up regularly by some non-git means. I do this anyway, but it broke my assumptions about git 'backing up everything' in my repo. I'd love clarification on this!
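For what it's worth, you can push everything explicitly if you want the remote to hold all of your local branches; for example (the remote names are placeholders):

git push origin --all      # push every local branch
git push origin --tags     # push every tag
git push --mirror backup   # or mirror all refs to a dedicated backup remote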

1

Wrote about a simple way to do this: backup-org-files-in-github

This works for files that are not collaborated upon - in my case, Emacs org files. I used cron to periodically do a git commit and git push.
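The cron entry can be as simple as something like this (path and schedule are placeholders):

*/30 * * * * cd ~/org && git add -A . && git commit -m autosave && git push origin master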

ArMD
  • 111
1

You might want to check out bup on GitHub, which was designed to serve the purpose of using git for backup.

mcantsin
  • 130
1

It is an approach that is used, and it makes sense.

Keepconf uses rsync and git for this job; it's a wrapper over these tools to keep things easy.

You only need a central server with ssh keys configured for access to the backed-up servers, and a few lines in the configuration file. For example, this is my own file for keeping all of /etc/ and the list of installed Debian packages:

[hosts]
192.168.1.10
192.168.1.11
192.168.1.12

[files]
/etc/*
/var/lib/dpkg/status

With that, I have the rsync backup and the git commit.

rfmoz
  • 811
1

It would work somewhat, but with two caveats.

  1. File additions will not be picked up automatically when you do the commit. Use git status --porcelain (or plain git status) to find new files to add before doing the commit (see the sketch after this list).

  2. Why the hassle of a remote mount for the .git? It could be fragile and you won't know that it failed. Use a bare repository for the far end with a normal ssh key login. As long as the repository is bare and you only push from one source, it is guaranteed to work without a merge.
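A sketch covering both points (paths and the remote name are placeholders):

# on the far end, once:
git init --bare /backups/devbox.git

# on the machine being backed up, from cron:
git add -A .                         # also picks up newly created files
git commit -m "backup $(date +%F)"
git push backup master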

Andrew
  • 141
0

If it works for your use case, it can be a very powerful tool for debugging runtime issues: you can go back in history and see when a problem in your system started and what the last change before it was.

I used a very simple script, since I did not want to spend too much time perfecting something that is not important for my use case. I just wanted to back up my Cloud9 environment to capture any code changes while I am learning in my lab environment. If you are looking to preserve file permissions, this won't work, so you will have to do some extra research - but I'm pretty sure you can find something to handle that as well.

This is the script I used.

#!/usr/bin/bash
# Commit and push everything in the Cloud9 workspace, using the epoch time as the commit message
cd /home/ec2-user/environment
git add .
git commit -m "$(date +%s)"
git push -u origin master
echo $?

You also need to set up ssh keys and add the public key to your account on GitHub. I generated a new set of keys, as the Cloud9 EC2 instance did not have any keys under the .ssh dir. Then I added a cron entry to run every 15 minutes. I have set the Cloud9 EC2 instance to shut down after 30 minutes if not used, so if I have anything not saved it will automatically get committed and I won't have to keep a copy anywhere else.

(Screenshot: the GitHub repo being committed to from the Cloud9 EC2 instance.)

0

My personal opinion is that this is basically all backwards. You're pushing the files into a backup solution, rather than pulling them out.

Much better would be to centralise the configuration of the server in the first place, and then pull it down, using something like puppet.

That said, it may work; I just don't think it'd be that good.

Try looking into BackupPC - it's pretty easy to set up and is frankly brilliant.

Sirex
  • 5,585
0

I found this to be a good methodology for my dev boxes. It changes them from being something that needs to be backed up to only a deployment endpoint.

All the configuration and package installation manifests are stored in Puppet, allowing for easy redeployment and configuration updates. The Puppet directory is backed up with git. Kickstart is used to do the initial deploy.

I also keep a custom YUM repository for whatever packages are being developed at the time. This has the added benefit that whatever packages we are working on aren't just left as unattended binaries on the local system - if that happens and the files get nuked, oh well; someone didn't follow proper procedure.

Tim Brigham
  • 15,655