
I have asm/C code that implements some image filters. The objective of this project is to experiment with different implementations, benchmark them, plot and report the data, and write a paper with relevant insights.

For example, one workflow I need is:

  1. Start at a base code implementation (Step 0)
  2. Compile, benchmark, plot
  3. Change parameter X from 110 to 100 in main.c (Step 1)
  4. Compile, benchmark, plot
  5. Change parameter Y from 10 to 15 in main.c (Step 2)
  6. Compile, benchmark, plot

But I need to be able to access every step of the way with a git tag/branch/commit/something, so that I can re-run the tests on a different PC or make changes to any step. For example, I might decide that changing X from 110 to 100 was not enough, so I redefine step 1 as changing X from 110 to 80.

I thought about using one branch per experiment with a tag for each step, for example with the tags step0, step1 and step2 on the branch experiment1:

git checkout experiment1   # step0
# compile, benchmark, plot
git checkout -b test step1
# compile, benchmark, plot
git checkout -b test step2   # overwriting test?
# compile, benchmark, plot
git branch -d test

But with this solution I can't easily change step 1: I'd need to start again from the base and redo every step, creating a new experiment branch (because I can't insert a commit between existing commits, right?).

I also heard that git branches should not be used for things that are not meant to be merged back. Am I over-complicating things? Is there a more obvious or simpler way of managing this? Or is git the wrong tool, and I should use some other system?

4 Answers


You shouldn't use git to handle this complexity. It would be a nightmare: the branches would drift too far apart over time, leaving you with snippets of code that would all need to be managed separately.

A much more flexible solution is to make the varying parts of your code base configurable. For instance, if you need to change input parameters, make them settable on the command line. You then need only one binary, and you can run as many different tests with it as you like.
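A minimal sketch of this in C, assuming the two tunables are the X and Y from the question. The struct `params`, the `-x`/`-y` options and the function names are hypothetical. Options are parsed by hand with `strtol` rather than `getopt`, which keeps the sketch within standard C; the defaults reproduce Step 0 (X = 110, Y = 10):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tunables standing in for X and Y from the question. */
struct params {
    long x;
    long y;
};

/* Parse a whole string as a decimal long; returns 0 on success, -1 otherwise. */
static int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Fill *p from options such as "-x 80 -y 15". Options that are not given keep
 * the Step 0 defaults (X = 110, Y = 10). Returns 0 on success, or -1 on an
 * unknown option or a missing/malformed value. */
int parse_params(int argc, char **argv, struct params *p)
{
    assert(p != NULL);
    p->x = 110;
    p->y = 10;
    for (int i = 1; i < argc; i++) {
        long *target;
        if (strcmp(argv[i], "-x") == 0)
            target = &p->x;
        else if (strcmp(argv[i], "-y") == 0)
            target = &p->y;
        else
            return -1;                       /* unknown option */
        if (i + 1 >= argc || parse_long(argv[++i], target) != 0)
            return -1;                       /* missing or malformed value */
    }
    return 0;
}
```

A benchmark `main` would call `parse_params(argc, argv, &p)` once. Step 1 then becomes a command such as `./bench -x 80` (binary name hypothetical) instead of a new commit, and revising a step means changing the command, not rewriting history.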

If you need to switch between separate algorithms/functions, you can make them configurable as well. Function pointers are a great way to do this in C, and the choice can also be made from the command line.
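A sketch of the function-pointer approach, assuming every variant of a filter can share one signature. The "invert" filter and both variant names are hypothetical stand-ins; an asm implementation would be declared with the same `filter_fn` signature and added to the table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every implementation of a filter shares one signature. */
typedef void (*filter_fn)(const uint8_t *src, uint8_t *dst, size_t n);

/* Reference version: invert each 8-bit pixel. */
static void invert_naive(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(255 - src[i]);
}

/* Variant with the same result, loop unrolled by 4
 * (a stand-in for, e.g., an asm/SIMD version). */
static void invert_unrolled(const uint8_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = (uint8_t)(255 - src[i]);
        dst[i + 1] = (uint8_t)(255 - src[i + 1]);
        dst[i + 2] = (uint8_t)(255 - src[i + 2]);
        dst[i + 3] = (uint8_t)(255 - src[i + 3]);
    }
    for (; i < n; i++)                /* leftover pixels */
        dst[i] = (uint8_t)(255 - src[i]);
}

/* Name -> implementation table; the name can come straight from argv. */
static const struct {
    const char *name;
    filter_fn fn;
} filters[] = {
    { "invert_naive",    invert_naive },
    { "invert_unrolled", invert_unrolled },
};

/* Returns the implementation registered under `name`, or NULL if none. */
filter_fn find_filter(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof filters / sizeof filters[0]; i++)
        if (strcmp(filters[i].name, name) == 0)
            return filters[i].fn;
    return NULL;
}
```

`main` could call `find_filter(argv[1])`, report an error on `NULL`, and time the returned function; benchmarking another implementation is then just one more table row.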


This is an old question, but here is an addendum.
There is an approach to software "experiments" that I have found very useful, described in the article "A Git workflow for code experiments" by Steve Brudz: https://medium.com/defmethod-works/quick-tip-git-workflow-for-code-experiments-82af10b1c5c4

Perhaps it could serve as the basis for a more relevant answer to the question?


Separate code, configuration, and executions.

For larger experiments (weeks of experimenting rather than hours or days) that are developed and run in batch (rather than interactively), I would suggest several separate repositories:

  1. The codebase of highly configurable code.
    Use your normal git branching practice.
  2. The parameters of the experiments

    • one or more config files that the codebase understands;
    • a minimal version of the codebase that can run the config files, for example via a git submodule.

    Git history in this repo follows your lines of reasoning. It may contain many branches that will never be merged. The history is less about reusing config files and more about your line of thought.

    The ability to git diff between versions and branches can come in handy when diagnosing why two experiments behave differently.

  3. The results of the experiments.
    Including:

    • The configuration parameters of the experiment (again via a submodule).
    • The actual version of the codebase that ran the experiment (submodule).

    Git is not the ideal tool for this kind of data, but it works and has the advantage that software developers (including the OP) are familiar with it.
    Git's history is not that relevant here. Diffing can help, if the nature of the experiments and their results allows it. Branching is not that relevant either; perhaps just use a single linear master branch. Use the commit message to identify and find the right experimental results (see git log --grep=... https://stackoverflow.com/a/7124949/814206).
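As a sketch of item 2, a config file "that the codebase understands" could be as simple as `key = value` lines. The format, the keys `x` and `y`, and `load_experiment` are assumptions for illustration, not part of the answer; the parser is deliberately lenient (for example, it does not reject trailing text after a number):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical experiment parameters, same meaning as X and Y in the question. */
struct experiment {
    long x;
    long y;
};

/* Read "key = value" lines (blank lines and '#' comments ignored) into *e.
 * Keys that are absent keep their defaults (X = 110, Y = 10).
 * Returns 0 on success, or the 1-based number of the first bad line. */
int load_experiment(FILE *f, struct experiment *e)
{
    char line[256], key[64];
    long value;
    int lineno = 0;

    assert(f != NULL && e != NULL);
    e->x = 110;
    e->y = 10;
    while (fgets(line, sizeof line, f) != NULL) {
        lineno++;
        char *hash = strchr(line, '#');
        if (hash != NULL)
            *hash = '\0';                        /* strip comment */
        if (strspn(line, " \t\r\n") == strlen(line))
            continue;                            /* blank line */
        if (sscanf(line, " %63[^= \t] = %ld", key, &value) != 2)
            return lineno;                       /* not "key = number" */
        if (strcmp(key, "x") == 0)
            e->x = value;
        else if (strcmp(key, "y") == 0)
            e->y = value;
        else
            return lineno;                       /* unknown key */
    }
    return 0;
}
```

Each step of an experiment is then a small text file in the config repository, so redefining step 1 from X = 100 to X = 80 is an ordinary commit that `git diff` can show.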

Optionally, you may also have repositories for:

  1. The report.
  2. The source data the experiments process.
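For item 3, one lightweight way to tie each result to the code that produced it (a complement to the submodule the answer describes) is to stamp the commit hash into every result row. `GIT_COMMIT`, `record_result` and the CSV layout are hypothetical; the macro would be supplied at build time, for example with `cc -DGIT_COMMIT="\"$(git rev-parse HEAD)\"" bench.c` (file name hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* GIT_COMMIT is a hypothetical macro injected by the build so that every
 * result row records the exact code version that produced it. */
#ifndef GIT_COMMIT
#define GIT_COMMIT "unknown"
#endif

/* Append one benchmark result as a CSV row: code version, parameters, time.
 * Writes a header first if the file is empty. Returns 0 on success, -1 on error. */
int record_result(FILE *out, long x, long y, double seconds)
{
    assert(out != NULL);
    if (fseek(out, 0, SEEK_END) != 0)
        return -1;
    if (ftell(out) == 0 && fputs("commit,x,y,seconds\n", out) == EOF)
        return -1;
    if (fprintf(out, "%s,%ld,%ld,%.6f\n", GIT_COMMIT, x, y, seconds) < 0)
        return -1;
    return 0;
}
```

The results repository can then hold these files with plain, linear history, and a commit message naming the experiment is enough to find them again with `git log --grep`.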

For small experiments (hours to days of trying things) in an interactive setting (a REPL or a quick edit-compile-run loop), you should strive to:

  1. Not disturb your flow of experimenting, making changes, and trying again; and,
  2. keep track of what you did.

For "edit, compile, run" loops, (ab)using unit tests works well:

  1. Create a new test method with a name containing a short description of your experiment, a sequence number, and your initials.
  2. Write the experiment as a unit test method, for now possibly without assertions or with wrong ones.
  3. Compile and run; many IDEs let you quickly run a single test method in their unit test framework.
  4. Document the outcome of the experiment via assertions (adding or changing those from step 2). Resist the temptation to alter the experiment.
  5. Commit this experiment to git (on a single branch).
    (Pull your teammates' experiments if needed; hence the initials in the method name.)
  6. Rinse and repeat (copying code from previous experiment test methods as needed).
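In C, without an IDE test runner, the same loop can be sketched with plain `assert` and a table of named experiment functions; a real unit test framework would take the table's place. The filter `bright_pixels`, the sample data and the initials `ab` are hypothetical. Each assertion records the outcome that was actually observed (step 4), so a later run flags any change:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sample input shared by the experiments. */
static const uint8_t sample[] = { 10, 60, 90, 105, 120, 200, 250 };

/* Hypothetical filter under study: number of pixels brighter than x. */
static size_t bright_pixels(const uint8_t *px, size_t n, long x)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (px[i] > x)
            count++;
    return count;
}

/* Experiment 01 by "ab": baseline threshold X = 110 (observed: 3 pixels). */
static void exp01_baseline_x110_ab(void)
{
    assert(bright_pixels(sample, sizeof sample, 110) == 3);
}

/* Experiment 02 by "ab": lowering X to 80 lets two more pixels through. */
static void exp02_lower_x80_ab(void)
{
    assert(bright_pixels(sample, sizeof sample, 80) == 5);
}

/* Registry: name (description + sequence number + initials) -> experiment. */
static const struct {
    const char *name;
    void (*run)(void);
} experiments[] = {
    { "exp01_baseline_x110_ab", exp01_baseline_x110_ab },
    { "exp02_lower_x80_ab",     exp02_lower_x80_ab },
};

/* Run every experiment, or only the one named `only` (NULL = all).
 * Returns how many experiments ran. */
int run_experiments(const char *only)
{
    int ran = 0;
    for (size_t i = 0; i < sizeof experiments / sizeof experiments[0]; i++) {
        if (only != NULL && strcmp(only, experiments[i].name) != 0)
            continue;
        printf("running %s\n", experiments[i].name);
        experiments[i].run();
        ran++;
    }
    return ran;
}
```

Calling `run_experiments("exp02_lower_x80_ab")` re-runs a single experiment, and each new experiment is committed as one more function on the same branch.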