I'm trying to decide whether it's better to encapsulate my work behind well-named functions or to expose it directly: which approach will help developers understand what's going on more quickly? Is there a name for the study of this sort of problem?

Specifically, if I'm ultimately running a bunch of bash commands, but I have significantly complex logic around those commands, at what point does it make sense to write this in a high-level language like Python, even though doing so obscures the actual bash commands being run?

Detailed problem

Currently I'm trying to write a Jenkins build script for my project with roughly the following steps:

  • Pull my code from GitHub
  • Compile Sass files into CSS
  • Pull down a sub-folder from a different GitHub project
  • Zip up the project
  • Upload it to an object store with a unique ID

I want to write this so that it's as easy as possible for future developers to work with (this code is never going to be seen by end users). These developers are likely, but not certain, to be fairly good at Python. They will definitely have a passing familiarity with the command line, but are likely to be unfamiliar with more complex bash scripting.

The first iteration of this build script was just a list of sequential commands, something like:

git clone git@github.com:username/project.git
git clone git@github.com:username/sub-project.git project/sub-project
sass --update project/css
tar -czf project.tgz project
swift upload my-container project.tgz --object-name=project-$(sha1sum project.tgz | cut -d' ' -f1).tgz
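If the unique ID were computed in Python instead of with sha1sum, the standard-library hashlib module can do the hashing. This is a minimal sketch: sha1_of_file and object_name_for are hypothetical helper names, and the project-<sha1>.tgz pattern simply mirrors the object name used in the commands above.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large archive isn't loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def object_name_for(path, prefix="project"):
    """Hypothetical helper: build an object name like project-<sha1>.tgz."""
    return "{}-{}.tgz".format(prefix, sha1_of_file(path))
```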

However, this set of commands quickly became more complex as I started to do things like cloning the git project only if it wasn't already there, and updating it otherwise, to speed up the build. Before I knew it I had 50 lines and a fair few conditionals.
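For comparison, the clone-or-update logic might look like this in Python. A minimal sketch under stated assumptions: git_commands is a hypothetical name, update_git_dir only borrows its name from the bash helper in this question, it assumes git 1.8.5 or later (for git -C), and it assumes a fast-forward-only pull is an acceptable way to update. Separating "which commands" from "run them" keeps the exact git invocations easy to read and to test.

```python
import os
import subprocess


def git_commands(directory, url):
    """Return the git commands needed to clone url into directory,
    or to update the checkout if one is already there."""
    if os.path.isdir(os.path.join(directory, ".git")):
        # Existing checkout: fetch and fast-forward instead of re-cloning
        return [["git", "-C", directory, "pull", "--ff-only"]]
    return [["git", "clone", url, directory]]


def update_git_dir(directory, url, dry_run=False):
    """Hypothetical Python counterpart of the update_git_dir bash helper."""
    for command in git_commands(directory, url):
        # Echo each command so the build log shows exactly what ran
        print("+ " + " ".join(command))
        if not dry_run:
            subprocess.check_call(command)
```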

So the first thing I did was encapsulate these into bash functions, e.g. update_git_dir, so my build script looks more like this:

#!/usr/bin/env bash

source helper_functions.sh

update_git_dir project git@github.com:username/project.git
build_sass project/css
create_archive project project.tgz
upload_to_swift project.tgz

This is one level of encapsulation. Now the developer, who would have understood the git clone etc. commands directly, can't actually see what's going on. They have to look in helper_functions.sh.

However, as time went on I realised that many of my helper functions now consisted of more conditional statements, variable assignments and function calls than actual commands. These conditional statements can be quite opaque to someone not familiar with bash scripting:

function create_archive {
    project_name="${1}"
    archive_filename="${2}"

    # Get revision ids
    dependencies_requirements_revision=$(cat "${project_name}/sub-project/requirements-revision.txt")

    requirements_context="${project_name}/${requirements_file}"
    requirements_dir=$(dirname "${requirements_context}")
    if [ "${requirements_dir}" != "${project_name}" ]; then
        requirements_context="${requirements_dir}"
    fi
    latest_revision=$(git-revision-hash "${project_name}")

    ...
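To illustrate the readability difference, here is how the requirements_context part of that function might read in Python. A sketch only: it assumes requirements_file is an ordinary relative path such as requirements.txt, since where the bash version gets that variable from isn't shown, and requirements_context is a hypothetical function name.

```python
import os.path


def requirements_context(project_name, requirements_file):
    """Mirror of the bash logic: use the requirements file itself if it sits
    at the top of the project, otherwise use the directory that contains it."""
    context = os.path.join(project_name, requirements_file)
    requirements_dir = os.path.dirname(context)
    if requirements_dir != project_name:
        context = requirements_dir
    return context
```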

So I started migrating my code to Python. Now my build script looks like this:

#!/usr/bin/env python3

from builders import GitProjectBuilder

builder = GitProjectBuilder(
    project_name='my-project',
    swift_container='my-container',
    git_repository='git@github.com:username/project.git',
    sub_project='git@github.com:username/sub-project.git'
)

# Build, compress and upload
builder.build_sass(directory='css')
builder.get_sub_project(repo='git@github.com:username/sub-project.git')
builder.build_archive(archive='archive.tgz')
upload_location = builder.upload_archive_to_swift(archive='archive.tgz')
print(upload_location)

Now, when you look in builders.py, it's much easier to understand the logic (if statements and function calls are much more readable), but we're now even further away from the real shell commands. In my Python code, the closest I get to directly running shell commands looks like this:

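import subprocess


def build_archive(self, archive):
    # Pass the arguments as a list rather than calling .split() on a formatted
    # string, so paths containing spaces are handled correctly.
    # --gzip matches the .tgz filename the caller passes in.
    output = subprocess.check_output(
        [
            'tar', '--exclude-vcs', '--create', '--gzip',
            '--file', archive, self.project_name,
        ],
        text=True,
    )
    print(output)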

If the developer needs to work out exactly which commands are being run, it's now much more difficult.
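One way to keep the underlying commands visible without giving up Python's readability is to route every command through a single helper that prints it before running it, much like bash's set -x. A minimal sketch, assuming Python 3.3+ (for shlex.quote and print's flush argument); run is a hypothetical name, not part of any library.

```python
import shlex
import subprocess


def run(command, dry_run=False):
    """Echo a command (given as a list of arguments) in copy-pasteable
    shell form, then run it unless dry_run is set."""
    printable = " ".join(shlex.quote(arg) for arg in command)
    print("+ " + printable, flush=True)
    if not dry_run:
        subprocess.check_call(command)
    return printable
```

If build_archive and the other builder methods called run([...]) instead of subprocess directly, each exact command would appear in the Jenkins console output, and dry_run=True would list the commands without executing them.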

Wrap up

So how do I decide which is the best architecture to maximise transparency while encapsulating complexity?

This problem seems similar to when I'm working with dependency injection where the more dependencies I inject rather than encapsulate, the more complex my initialisation code gets - and I have a similar problem drawing the line.
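A common compromise on the dependency-injection side of this trade-off is to inject a dependency but give it a sensible default, so ordinary callers keep a short initialisation while tests or dry runs can swap it out. A sketch under that assumption; ArchiveBuilder is a hypothetical class, not the GitProjectBuilder from the question.

```python
import subprocess


class ArchiveBuilder:
    """Hypothetical builder whose command runner is injected with a default."""

    def __init__(self, project_name, runner=subprocess.check_call):
        self.project_name = project_name
        # Any callable that accepts an argument list will do
        self.runner = runner

    def build_archive(self, archive):
        self.runner([
            "tar", "--exclude-vcs", "--create", "--gzip",
            "--file", archive, self.project_name,
        ])
```

ArchiveBuilder('my-project') runs tar for real, while ArchiveBuilder('my-project', runner=commands.append) only records the argument list, which is one way to let a developer see exactly which commands a build would run.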

Is there a name for this field of study?

Robin Winslow

4 Answers

I'd give xonsh a go; it's a clever mix of shell and Python.

xonsh is a Python-ish, BASHwards-compatible shell language and command prompt. The language is a superset of Python 3.4 with additional shell primitives that you are used to from BASH and IPython. xonsh is meant for the daily use of experts and novices alike.

Take advantage of Python 3's abstraction and package system, coupled with nice conditionals, but write what needs to be shell as just shell.

e.g.,

#!/usr/bin/env xonsh

def exists(filename):
    # ls -A includes dotfiles such as .git; compare whole names, not substrings
    return filename in $(ls -A).splitlines()

if exists(".git"):
    git checkout master
    git pull
else:
    git clone $GITURL

Note that only a little ugliness ($()) is needed to inline shell inside Python, and it Just Works (TM) as long as you split things clearly by line (e.g. the if statement lines).

There's lots more detail (including embedding Python in shell lines with @()) in the tutorial: http://xonsh.org/tutorial.html

You can use it as your system shell. But just because you can doesn't mean you should :-)

0atman

I won't give an immediate yes/no answer, but here are some thoughts on the situation.

Since many people depend on them, build scripts should be the easiest-to-understand area of your code. I would argue that a long, "boring" bash script is not an issue as long as it is easily understood. I'd point to the "./configure && make && make install" convention of C projects on various Unixes.

Your bash script seems to be doing mostly variable initialization and default assignment, neither of which should require a deep if-then-else structure.

One rule of thumb for bash script size (surely debatable, and a matter of opinion) is that once a script grows past about 100 lines of code, it may need to be rewritten as a "proper" program.

If you decide not to go the bash route, then you should look at build tools that are made just for this purpose: Ant / Maven / Gradle in the Java world, and many others for different platforms. I can see your example as a series of build targets for some tools I've used in the past, such as a sequence of Rake tasks or Bazel targets.
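To illustrate the build-targets view, the steps from the question can be written as named targets that declare their prerequisites, with the run order derived from those declarations, roughly as tools like Make or Rake do. A sketch only: the target names are hypothetical, and it assumes Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical targets for the steps in the question; each maps to the
# targets it depends on, much as Make or Rake targets declare prerequisites.
TARGETS = {
    "clone": set(),
    "sub_project": {"clone"},
    "sass": {"clone"},
    "archive": {"sass", "sub_project"},
    "upload": {"archive"},
}


def build_order(targets):
    """Return an order in which every target comes after its prerequisites."""
    return list(TopologicalSorter(targets).static_order())
```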

If you go that route, I'd pick a tool in the project's most-used language if possible, since it will be easier to maintain.

I don't know if there is a name for the field, but in some companies someone doing this on a day-to-day basis is called an "integration engineer".

If I were in your position, I would use Paver (http://paver.github.io/paver/). It's basically Rake for Python.

Bon Ami

Rather than fretting about how much abstraction is too much for a build/integration tool, opt instead for a tool commonly used by Python developers for this sort of thing. You'll get far more use out of a well-known but highly abstracted build tool than you will out of the most clearly written, simplest build tool that does the same thing.

What is more important to understand: the gritty details of how a gzip file is created in Bash, or the fact that these files are gzipped together and that all manner of chaos and panic will ensue if that step is omitted?

I would recommend searching for Python build tools. Ask the Python community what they recommend and choose an open-source project. It probably has more features than you've built, and it has probably been tested more thoroughly.