278

For a distributed team that uses Git and GitHub as version control, should images also be stored in the git repository?

The images in question are small/medium-sized web-friendly images. For the most part, the images won't be changed. The folder containing them will only grow as images are added. A concern is that the image folder may grow large over time, through large images, a large number of images, or both.

Is this considered a best practice? What other alternatives are there to sharing binary files needed in projects that a distributed team can easily access?

Thomas Owens
spong

10 Answers

237

Are your images original work or can they be recovered (guaranteed?) from elsewhere? Are they needed to ship a software unit built from source? If they are original, they need backing up. Put them in your revision control: if they never change, the space penalty is the same as a backup, and they are where you need them.

Can they be edited to change the appearance of the software, accidentally or intentionally? Yes - then they MUST be revision controlled somehow, so why use another way when you already have a perfect solution? Why introduce "copy and rename" version control from the dark ages?

I have seen an entire project's original artwork go "poof" when the graphics designer's MacBook hard drive died, all because someone, with infinite wisdom, decided that "binaries don't belong in rev control", and graphics designers (at least this one) don't tend to be good with backups.

Same applies to any and all binary files that fit the above criteria.

The only reason not to is disk space. I am afraid at $100/terabyte, that excuse is wearing a bit thin.

mattnz
86

This question is pretty old, but it comes up often when dealing with Git, and there has been some progress on modern solutions for storing large files in a Git repo since the last answer.

For storing large files in Git there are the following projects:

  • git-lfs - While I haven't used this extensively, it appears to be the holy grail. It's backed by GitHub, has been available on all their repos since October 2015, and puts the complexity of file management on the site storing your repos. The only downside is that it is fairly new, so beyond GitHub there isn't much support, though GitLab and Gitea also support it, and Bitbucket has alluded to support in the future.
  • git-annex - This has been around for a while, but frankly its complexity gets in the way.
  • git-media - No personal experience with this one. Seems fairly complex as well.
  • git-fit - An attempt to create a simpler plugin. Requires S3 storage. While I appreciate the simplicity, my main concern with this plugin is that it's fairly unknown and maintained by one individual (full disclosure: I am the only other committer at this time, and it was for a trivial issue).
  • git-fat - Another, low-dependency approach.

TLDR: if you can, use git-lfs to store images or other binary files in git.
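
As a rough illustration of the git-lfs route, here is a minimal sketch. It assumes the git-lfs client is installed and that your host supports LFS; the scratch repository and the `*.png`/`*.jpg` patterns are illustrative choices, not part of the answer above.

```shell
set -eu
# Scratch repository so the sketch is self-contained; in practice you would
# run the git lfs commands inside your existing project.
repo=$(mktemp -d)
cd "$repo"
git init -q

if git lfs version >/dev/null 2>&1; then
    git lfs install --local          # set up the LFS filters and hooks for this repo only
    git lfs track "*.png" "*.jpg"    # illustrative patterns; the rules are written to .gitattributes
    cat .gitattributes               # show the tracking rules that were added
else
    echo "git-lfs client not found; install it first" >&2
fi
```

After this, `.gitattributes` is committed like any other file, and images matching the patterns are stored as LFS objects rather than as ordinary blobs.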

81

Why the hell not? :)

Storing binaries is considered bad practice, yes, but I never worried too much about images.

Worst case, if you have tons, store them somewhere else or use externals or an extension for binary support. And if the images won't be changed that often, then where's the problem? You won't get a big fat delta. And if they get removed over time, it's only your server that suffers a bit from storing the history, but clients won't see a thing.

In my opinion, you shouldn't worry about it, provided you don't store GBs of those.

What you could do though, is only store "source" images: SVGs, LaTeX macros, etc... and have the final images generated by your build system. That's probably even better, if you can. If not, then don't bother.

(All that being said, Git shines for text files, but is not the best VCS for pictures. Give us more context and metrics if you can)


For additional information, you may want to look at these Q&As:

haylem
60

The whole "don't store binaries in source control" is set forth for a specific reason: If you have source code that compiles, don't store the actual compilation, but just the source code. Images and visual assets do not have a "source," so they should be tracked in version control.

Jason
28

I believe the recommended way with Git is to use a sub-module (introduced in Git 1.5.3) which is basically a separate repository that is associated with the main one. You store your images (and other binary assets) in the sub-module. This can then be checked-out with the main repository or left, depending on what is required.
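
The submodule setup described above can be sketched as follows. This is a self-contained demo under stated assumptions: the repository names `assets-repo` and `project`, the file `logo.png`, and the demo identity are all hypothetical, and the `protocol.file.allow` override is only there because the demo clones from a local path instead of a real remote.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Helper that supplies a throwaway identity so commits work on any machine.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Hypothetical assets repository holding the images.
git init -q assets-repo
printf 'fake image bytes' > assets-repo/logo.png
g -C assets-repo add logo.png
g -C assets-repo commit -q -m "Add logo"

# Main project that references the assets repository as a submodule.
git init -q project
cd project
# protocol.file.allow is needed only because this demo uses a local path as the URL.
g -c protocol.file.allow=always submodule --quiet add "$work/assets-repo" assets
g commit -q -m "Add image assets as a submodule"

git submodule status   # lists the pinned commit ID and the 'assets' path
```

With a real remote, collaborators would typically fetch the images with `git clone --recurse-submodules`, or skip them by cloning without that flag.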

From http://book.git-scm.com/5_submodules.html

"Git's submodule support allows a repository to contain, as a subdirectory, a checkout of an external project. Submodules maintain their own identity; the submodule support just stores the submodule repository location and commit ID, so other developers who clone the containing project ("superproject") can easily clone all the submodules at the same revision. Partial checkouts of the superproject are possible: you can tell Git to clone none, some or all of the submodules."

Also, size shouldn't be a significant issue if the images don't change often. You can also run commands to prune/reduce size, such as:

git gc
git gc --aggressive
git prune
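
To see what these commands actually do to repository size, something like the following sketch can help. The scratch repository, the random `logo.png`, and the demo identity are assumptions made so the example runs on its own.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
# A hypothetical binary asset, generated here so the sketch is self-contained.
head -c 200000 /dev/urandom > logo.png
git add logo.png
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add image"

git count-objects -vH   # before: objects are stored loose
git gc --quiet          # pack loose objects
git count-objects -vH   # after: 'count' is 0 and the data is reported under 'size-pack'
```

Note that already-compressed data such as most image formats usually shrinks very little when packed, so `git gc` mainly tidies storage rather than reclaiming much space for images.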
Dan Diplo
9

Should you store your images in a SCM? Yes. Without any doubt.

Should you store your images in git specifically? This gets more tricky.

Git is very good with text files, but by its very nature isn't too hot with binaries. You will have issues with the size of the data transferred when you clone or push, your .git directories will grow, and you could get into a right mess with merging (i.e., how do you merge two images?).

One answer is to use submodules, which weakens the link between your project and the images: you don't have to manage the images as if they were part of your source, yet they remain under version control, and you don't have to worry about branching them. This assumes the subproject is just a 'flat' repository of data that doesn't go through the same churn as the usual development process.

The other answer is to put them in a different project, never branch it, and ensure that everyone who commits to that project pushes upstream immediately, so that two people never change the same version of a file. You'll find this the most difficult aspect, as Git isn't designed for such a non-distributed workflow. You'll have to use old-fashioned communication methods to enforce this rule.

A third answer is to put them in a different SCM entirely that is better geared to working with images.

gbjbaanb
8

If it is part of the project, it has to be in the VCS. How best to achieve this may depend on the VCS, or on how you organize the project. Maybe a repo for the designers, and only the results in the coders' repo, or only the 'image sources' (I once had a project with only a .svg file, and the images were generated via make and the Inkscape CLI).

But if a VCS cannot handle that, or becomes unusable, I would say that it is not the right tool for your job.

So far, I have had no problems with putting 'usual' amounts of graphics (mockups, concepts, and page graphics) for web projects in Git.

keppla
7

Yes.

Let's say you release software version 1.0. For version 2.0 you decide to redo all the pictures with shadows. So you do this, and release 2.0. Then some customer who is using 1.0 and cannot upgrade to 2.0 decides they want the program in another language. They give you $1G to do it, so you say sure. But in a different culture, some of your pictures do not make sense, so you have to change them...

If you keep your images in source control, this is easy: starting from 1.0, you make changes to the images (among other things), build, and release. If you did not have them in source control, you would have a much harder time, since you would have to find the old images, change them, and then build.

0

Adding to @haylem's answer, note that size plays a large factor in this. Depending on the VCS, it might not work well with tons of images. When clones or large pushes start taking all night, it's really too late, as all the images are already in your repository.

Plan for large pictures and future growth. You don't want to get two years into this project and have an "oh crap, maybe the repo is a little too big" moment.

TheLQ
0

I definitely agree that storing them is technically and economically feasible. The question I would ask is: "are these images part of the shipping product or part of the content of a shipping product?" Not that you can't store content in Git (or any other VCS), but that is a separate problem for a separate VCS.

Wyatt Barnett