Optimise Your Docker Image Builds in GitLab CI

Preface

It is the year 2022, and I am sure Docker no longer needs any introduction. And if you are maintaining a high-functioning agile engineering team with Continuous Integration and Continuous Delivery practices, GitLab should not be an unfamiliar name either. In this blog, I will show you a few small yet effective steps to make the best use of CI minutes, build Docker images more efficiently, and save some cost for better use elsewhere.

Docker and Union File System

Before we jump into writing YAML for the CI pipeline, let us take a moment to look at the Union File System at the core of Docker. If you are already well versed in this concept, feel free to skip to the next section.

A Union File System is a file system available on Unix and Unix-like (Linux and BSD) operating systems that implements a union mount over one or more other file systems. It allows developers to build a file system by overlaying layers of files and directories along with their permissions.
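To make this concrete, here is a minimal sketch of a union mount using OverlayFS, the union file system most modern Docker installations use through the overlay2 storage driver. The directory names are hypothetical, and the commands need root on a Linux machine:

```shell
# One read-only "lower" layer and one writable "upper" layer,
# merged into a single unified view under ./merged.
mkdir -p lower upper work merged
echo "from the lower layer" > lower/base.txt

# Requires root and a Linux kernel with OverlayFS support.
sudo mount -t overlay overlay \
  -o lowerdir=lower,upperdir=upper,workdir=work merged

echo "a new file" > merged/new.txt
ls merged           # both base.txt and new.txt appear in one view
cat upper/new.txt   # the write landed in the writable upper layer

sudo umount merged
```

Docker image layers work the same way: each layer stores only its own changes, and the union mount presents the stack as one file system.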

GitLab CI

GitLab CI is a CI tool offering available on all tiers of GitLab. It can run on shared runners and also allows us to configure our own runner instances. How to set up and configure a runner is out of scope for this blog, so I will leave it to the reader. For the rest of the discussion, we will assume your runner is configured to use the Docker executor for CI jobs.

Dockerfile

For the purpose of the demonstration, I will put forward a simple Express.js application running on Node.js that responds with "Hello World" to all incoming requests. The implementation looks somewhat like this:

// app.js

const express = require("express");
const app = express();

app.all("/", (req, res) => {
  res.send("Hello World");
});

app.listen(3000);

Now we want to containerize this application, so that we can run it seamlessly across any platform with Docker Engine support. So we come up with a basic Dockerfile as such:

FROM node:16-alpine3.15
WORKDIR /
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile --production=true
COPY app.js ./app.js

EXPOSE 3000
CMD ["node", "app.js"]

Docker Build

Now we can build this Docker image simply by running the following in our shell:

$ docker build -t hello-world .

This will create a Docker image on our local development machine with the tag name hello-world:latest.
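Under the hood, each instruction in the Dockerfile produced one layer of that image. A quick way to see them (a sketch; it assumes a Docker daemon is running, and the IDs and sizes will differ on your machine):

```shell
# List the layers of hello-world, newest first; one row per
# Dockerfile instruction that produced a layer.
docker history hello-world

# Show which on-disk overlay2 directories back those layers.
docker inspect --format '{{json .GraphDriver.Data}}' hello-world
```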

If we make a minor change in app.js, say adding an exclamation mark after the text:

- res.send("Hello World");
+ res.send("Hello World!");

And we try to rebuild the image, we will see that the first few steps complete almost instantly. That is because of the caching Docker does during the build, exploiting the layered nature of the Union File System.

Step 3/7 : COPY package.json yarn.lock ./
 ---> Using cache
 ---> 24e08bf6

This, without a doubt, speeds up the build process while keeping it incremental and minimal.

CI Environment

Now, in a CI environment, keeping the above advantage is somewhat complicated, if not outright impossible at times. This is because executors are cleaned up and reused between jobs, which forces us to keep these machines stateless: no generated files or artifacts survive once a job is done. On top of that, if the build process runs on a rotating set of build machines, it is unlikely we will get the same machine repeatedly to provide us with the cache.

Basic Job

First, let us sketch out the basic build job in .gitlab-ci.yml:

stages:
  - build

build:
  stage: build
  image: docker:20.10.13-alpine3.15
  services:
    - docker:20.10.13-dind-alpine3.15
  script:
    - docker build -t hello-world .
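Note that, depending on the runner, the job may also need to be pointed at the dind service's daemon. With the common non-TLS TCP setup that looks like the fragment below; treat the exact values as an assumption to verify against your own runner configuration:

```yaml
variables:
  DOCKER_HOST: tcp://docker:2375
  DOCKER_TLS_CERTDIR: ""
```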

Caching Docker FS Layers

Now, to solve the caching problem, we can simply add the following steps, and voila! You have significantly faster builds:

  services:
    - docker:20.10.13-dind-alpine3.15
+ before_script:
+   - mkdir -p .docker
+   - "docker load -q -i .docker/fs.tar 2>&1 || :"
+   - rm -rf .docker
  script:
-   - docker build -t hello-world .
+   - docker build --cache-from=hello-world -t hello-world .
+ after_script:
+   - mkdir -p .docker
+   - "docker save -o .docker/fs.tar hello-world 2>&1 || :"
+   - docker system prune -af
+ cache:
+   - key: $CI_COMMIT_REF_SLUG
+     paths:
+       - .docker
+     when: on_success

Now, every time the job re-runs, GitLab will first try to load the Docker image from the tar archive kept in the cache and use it while building the newer image. Every successful build will in turn create a new archive, and hence the cache for the next run.
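You can rehearse the same round trip locally to see exactly what the cache carries (a sketch; it assumes the hello-world image built earlier exists on your machine):

```shell
# Pack the image and all of its layers into a tar archive.
mkdir -p .docker
docker save -o .docker/fs.tar hello-world

# Simulate a fresh runner: drop the local image, then restore it.
docker image rm hello-world
docker load -q -i .docker/fs.tar

# The restored layers are now available to --cache-from.
docker build --cache-from=hello-world -t hello-world .
```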

Final Thoughts

It might seem like a good idea to go ahead and add this to every single build job in GitLab that uses Docker. But keep in mind that sometimes rebuilding from scratch is the slightly faster, and hence cheaper, alternative to generating, uploading, and then downloading an archive on every run.

Follow me on LinkedIn and/or GitHub for more.

Originally published at https://dev.to on March 22, 2022.
