Faster CI/CD pipelines with Docker
The problem
One of the key techniques for moving fast in software development is continuous delivery. However, looking after build agents is a chore. Docker helps by keeping the build agent simple and putting all the build tooling inside the Docker image, but that brings with it a new problem: speed. Building dependencies on a new build agent can take quite a while, especially in Java. This article shows you how to speed up your builds with a couple of Docker features.
What's different about this approach?
Using Docker's layer system, we can separate our code's dependencies into a different layer of the Docker image from the layer in which the code itself sits. We can also use a relatively new Docker feature that lets us treat a previously saved image as the layer cache for the current build. This way, the build is fast when only the code (and not the dependencies) has changed.
Tech stack
We'll cover examples in
- Ruby
- Python
- Java
- JavaScript
All will use Docker and we'll throw in AWS services like CodePipeline, CodeBuild and Elastic Container Registry (ECR) as a simple example of how to get a build pipeline running.
The nitty gritty
Different programming languages have different package dependency management systems: Bundler for Ruby, pip for Python, Maven for Java, yarn (or npm) for JavaScript, and so on.
For the purposes of installing dependencies, they all do about the same thing:
- look in a configuration file for the dependencies you've declared
- go and find the packages for those dependencies in the official repository
- download them to your local system
- make them available to your code
For example, if I want to use Redux in my JS React app, I can run yarn add react-redux and I'll end up with a package.json file containing a reference to react-redux (as well as a local install in my node_modules). Since I don't want to rely on people remembering to do these installs correctly when deploying to servers, I don't store all the dependencies in git; I just store the package.json file. If someone else gets my code from GitHub, they can run yarn install and they'll get redux (along with all the other stuff in the package.json file).
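On the command line, that workflow amounts to the two yarn commands mentioned above:
# record the dependency in package.json and install it into node_modules
yarn add react-redux
# on a fresh checkout: install everything declared in package.json
yarn install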
Here is the issue: downloading and building those dependencies on a build server often takes a long time. Even if I have them installed from a previous build, it's common to scale up build agents in times of demand and then destroy them later for cost effectiveness. This means my build agents are always new and have to do fresh builds.
Docker to the rescue
Docker has two features that help:
- image layers
- the --cache-from option in the build command
If we take Java as an example, we can use the following Dockerfile-tests to run our tests:
FROM openjdk:8-jdk-alpine
# install Maven on top of the base JDK image
RUN apk update
RUN apk add maven
WORKDIR /opt/code
# copy only the dependency manifest and resolve dependencies,
# so they end up in their own cacheable layer
COPY ./pom.xml .
RUN mvn dependency:go-offline
# the rest of the code comes in afterwards
COPY . .
This means the dependencies come in with RUN mvn dependency:go-offline, but the rest of our code comes in with COPY . . (so the dependencies stay in the earlier layer). We can then provide a CI config that runs docker build with --cache-from, so that the dependency layer only has to be rebuilt when the dependencies change.
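If you want to try the pattern locally before wiring up a pipeline, it looks roughly like this (using the placeholder image name from the next section; the || true is just there so the very first build doesn't fail when no previous image exists):
# pull the previous image so its layers are available as a cache
docker pull 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest || true
# build, telling Docker it may reuse layers from the pulled image
docker build --file Dockerfile-tests \
  --cache-from 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest \
  --tag 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest .
# push so the next machine can use this image as its cache
docker push 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest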
As an example, let's describe a couple of CodeBuild configurations. If we assume we have an ECR repo set up at 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img, we can configure a few steps in CodePipeline:
- Step 1: get the source from source control (e.g. GitHub)
- Step 2: use a standard CodeBuild container to build the test docker image
- Step 3: use the new test image directly in CodeBuild to run the tests
Step 1
Create a GitHub webhook via CodePipeline. This can be done in the console, via the CLI or with CloudFormation.
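As a sketch, the CloudFormation flavour might look something like the following; the Pipeline resource, the Source action name and the GitHubWebhookSecret parameter are assumptions you'd replace with your own:
GitHubWebhook:
  Type: AWS::CodePipeline::Webhook
  Properties:
    Authentication: GITHUB_HMAC
    AuthenticationConfiguration:
      SecretToken: !Ref GitHubWebhookSecret   # hypothetical parameter holding the shared secret
    Filters:
      - JsonPath: "$.ref"
        MatchEquals: refs/heads/master
    TargetPipeline: !Ref Pipeline             # your AWS::CodePipeline::Pipeline resource
    TargetAction: Source                      # name of the source action in that pipeline
    TargetPipelineVersion: !GetAtt Pipeline.Version
    RegisterWithThirdParty: true              # registers the webhook with GitHub for you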
Step 2
This is an example buildspec that logs in to ECR, pulls the last test image, builds the new test image using the last one as a cache and pushes the new one to ECR:
version: 0.2

phases:
  pre_build:
    commands:
      # log in to ECR and pull the previous image so it can serve as a layer cache
      - $(aws ecr get-login --no-include-email --region ap-southeast-2)
      - docker pull 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest
  build:
    commands:
      # build the test image, reusing layers from the pulled image where possible
      - docker build --tag 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest --file Dockerfile-tests --cache-from 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest .
  post_build:
    commands:
      # push so the next build can start from this image's layers
      - docker push 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest
Step 3
The buildspec to run the tests in the next CodePipeline step is simple if the step is configured to use the 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest image we just created:
version: 0.2

phases:
  build:
    commands:
      # dependencies are already baked into the image's layers, so this starts immediately
      - mvn test
The whole process is similar for other languages and package management systems. Just put the package file and install commands before the rest of the code.
Ruby:
COPY ./Gemfile .
RUN bundle install
COPY . .
Python:
COPY ./requirements.txt .
RUN pip install -r requirements.txt
COPY . .
JavaScript:
COPY ./package.json .
RUN yarn install
COPY . .
etc.
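For instance, a complete Dockerfile-tests for the Python case might look like the sketch below; the python:3.7-alpine base image is an assumption, so swap in whatever your project actually uses.
FROM python:3.7-alpine
WORKDIR /opt/code
# dependency manifest first, so installed packages sit in their own cacheable layer
COPY ./requirements.txt .
RUN pip install -r requirements.txt
# application code last; changing it won't invalidate the dependency layer
COPY . .
The matching test-step buildspec would then run something like pytest (or python -m unittest) in place of mvn test.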
Final thoughts and next steps
Doing this saves a lot of build agent maintenance and a lot of build time. It works equally well with any other build system that can run Docker containers, such as BuildKite. If you need help with your setup, get in touch.
About me
I'm a Principal Engineer with programming experience in Java, Python, Ruby, JavaScript and C#, and a rusty recollection of LabVIEW, C++, VisualBasic and ColdFusion. I've dabbled in Haskell.
I have deep AWS experience and some knowledge in Azure and GCP.
Appendix - Tech mentioned in this post
Docker
Docker is a way to package applications into a container that includes all the files necessary to run the application, including operating system files, but not the operating system kernel. In contrast, a virtual machine (VM) contains a kernel and virtualised hardware interfaces.
Amazon Web Services
AWS is a cloud services provider, where computing power, networking and other services are provided on-demand. It allows for infrastructure as code and helps teams spend time on solving customer problems rather than looking after datacentres.