
Our CI process follows this cycle (I think it is quite normal): unit test => build Docker image => run functional tests against the image => if the tests fail, remove the failed Docker image, figure out the reason, and build a new image.

But checking the test results and removing the failed images has become a labor-intensive job, so I have been thinking about how to improve our CI process. My idea is a process in which, if the container fails the smoke test (or, even worse, fails to start), `docker build` itself fails, so no failed Docker image is ever produced.

But I face some technical difficulties. For example, for my Node.js project, if I just add `RUN npm run start` to the Dockerfile (not as the entrypoint, but purely to test it), then `docker build` does indeed fail when the app fails to start. But if `npm run start` succeeds, it never exits, so `docker build` hangs. I have to design some kludge/clever way to bypass this.

I did some research about my "improved" CI process, but I can't find anyone else doing it. So is there a problem with it? My goal is to reduce failed Docker images. If this process does have problems, what other options are there to reduce failed Docker images?

We have dedicated Docker build servers, so build time is not a concern here. As for why removing the failed images has become a labor-intensive job, that may deserve another question.

PS. I searched for similar questions here but could only find these three, and none of them is my question:

  1. In CI, should tests be run against src or dist?
  2. Clarifying the steps in a CI/CD, but namely if unit testing should be done building a Docker image or before
  3. Should I include tests in Docker image?

3 Answers

2

The Dockerfile isn't the right place to run tests. The build should fail when the image can't be built for some reason, not when the software it contains somehow behaves incorrectly.

Instead, you should automate testing the created image and make pushing the image into your organization's docker registry dependent on a successful test. On unsuccessful tests, you can automatically remove the image from the build machine's local storage.

So your CI script should do something like this:

  • Build container image.
  • If unsuccessful, abort as failed.
  • Run test in the container image.
  • If unsuccessful, remove image and abort as failed.
  • Else push image to your registry and possibly remove it locally to avoid clutter.

Send notifications about failure/success as you see fit.
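
A minimal shell sketch of such a script could look like this; the registry name, image tag, and test command are placeholders (nothing here is prescribed by the question), so adapt them to your project:

```bash
#!/usr/bin/env bash
# Rough sketch only: registry name, tag and test command are placeholders.
set -u

IMAGE="registry.example.com/myapp:${BUILD_NUMBER:-local}"

# Build container image; if unsuccessful, abort as failed.
docker build -t "$IMAGE" . || exit 1

# Run the tests in the container image (here assumed to be `npm test` inside it).
if ! docker run --rm "$IMAGE" npm test; then
    # If unsuccessful, remove the image and abort as failed.
    docker rmi "$IMAGE"
    exit 1
fi

# Else push the image to your registry and remove it locally to avoid clutter.
docker push "$IMAGE"
docker rmi "$IMAGE"
```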

2

I think your problem stems from the Docker philosophy of compiling your code inside the Dockerfile.

This is a popular idea, but it's deeply flawed and should be avoided at all costs.

Separate these things in your pipelines:

  1. building your code
  2. creating a container image with the compiled code in it
  3. publishing that image to a container repository
  4. deploying containers using the image from the repository

This gives you the flexibility you are looking for, as you now have multiple points where you can test your compiled code and stop the pipeline while avoiding unwanted artefacts.
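
As a rough shell sketch of those four stages for a Node.js project (assuming the code compiles to `dist/` and a hypothetical registry at `registry.example.com`; the test points described below then slot in between these stages):

```bash
#!/usr/bin/env bash
# Rough sketch of the four stages; registry name, tag variable and deploy
# command are placeholders.
set -euo pipefail

IMAGE="registry.example.com/myapp:${GIT_COMMIT:-dev}"

# 1. Build your code (outside of Docker).
npm ci
npm run build                      # assumed to produce dist/

# 2. Create a container image with the compiled code in it
#    (the Dockerfile is assumed to COPY dist/ instead of building it).
docker build -t "$IMAGE" .

# 3. Publish that image to a container repository.
docker push "$IMAGE"

# 4. Deploy containers using the image from the repository
#    (placeholder for whatever deployment tooling is in use).
# kubectl set image deployment/myapp myapp="$IMAGE"
```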

1.5 Test the code before creating the image.

Since you have the compiled code before you have even started making an image, you can spin it up, or dynamically link to it, and run any tests you want.

Failing here means an image is never created.

2.5 Test after you have created an image but before it's published.

You have an image with the compiled code, but you are still in the build pipeline. Spin up a container using the local image and run tests against it.

Failing here means the image is only ever local to your build pipeline and will presumably be destroyed when the pipeline completes.
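
A rough sketch of that local smoke test, assuming the app listens on port 3000 and exposes an endpoint you can probe (both placeholders):

```bash
#!/usr/bin/env bash
# Rough sketch of step 2.5: image tag, port and probed URL are placeholders.
set -euo pipefail

IMAGE="myapp:build"                          # local-only tag, never pushed

CID=$(docker run -d -p 3000:3000 "$IMAGE")   # spin up a throwaway container
sleep 5                                      # crude wait for the app to start

# Probe the running container; a real pipeline would run the full test suite.
if ! curl -fsS http://localhost:3000/ > /dev/null; then
    docker logs "$CID"
    docker rm -f "$CID"
    exit 1                                   # fail the pipeline, image stays local
fi

docker rm -f "$CID"
```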

3.5 You have created the image and published it to your repository, but it's still not deployed.

You can deploy the image to a test environment and run tests against it.

If it fails here, you have a published image in your repository, but you have not deployed it to live. Perhaps you could have two repositories and only promote the image to the live one if the tests pass.
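
That promotion can be as simple as re-tagging the already-tested image and pushing it to the live repository; a sketch with hypothetical registry names:

```bash
#!/usr/bin/env bash
# Rough sketch of promoting a tested image from a staging repository to the
# live one; both registry names and the tag are placeholders.
set -euo pipefail

STAGING="staging-registry.example.com/myapp:1.2.3"
LIVE="registry.example.com/myapp:1.2.3"

docker pull "$STAGING"           # fetch the exact image that passed the tests
docker tag  "$STAGING" "$LIVE"   # same image, new name in the live repository
docker push "$LIVE"
```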

Ewan
1

As Hans-Martin Mosner said in his comments, the problem with my proposed process is that it is dirty.

First, because I wanted to make sure my Node app could start correctly, I didn't put `npm start` in the background, so I needed to add a check to my startup logic: if this is a Docker test run, exit the process after a specific amount of time. Second, because `npm start` didn't exit, I also needed to add code to run the tests after `npm start`. So both my source code and my package.json gained ugly code that exists only for the Docker test run.

Because my goal is to reduce failed Docker images, the better way to do it, as he pointed out, is to only push the Docker image to my registry when it passes the tests, and to delete failed images on the local build server.

And I found GitHub Actions is a perfect place to run my tests. I am not sure whether GitHub Actions uses Docker or not, but what I need is a clean environment to run/test my app, and GitHub Actions fits the bill. So my CI process changes to: only when the GitHub Actions run passes will I build the Docker image and push it to my registry. Actually, if GitHub Actions can push Docker images directly to my registry, I will use only it then.