Basic Tutorial: Using Docker and Python
This isn't intended to be an in-depth tutorial on Docker or Flask. Both tools have excellent documentation and I would highly suggest you read it. A quick brief of Docker is this: Docker lets you bundle all of your application's dependencies into a portable container that can be run on any machine with a container runtime. This simplifies your infrastructure, since the host only needs the necessary Docker components installed; you don't have to worry about a specific Python/Node/Java version being installed on it, because those live in the container image. The container image is defined by a series of directives in a Dockerfile. That Dockerfile is what we will be writing in this post. I will try to explain why I write my Dockerfile a certain way, and if you have any questions feel free to ask.
The Dockerfile will look like this.
FROM python:3.9-slim-buster
RUN apt-get update && apt-get install -y \
build-essential \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*
RUN mkdir /code
WORKDIR /code
COPY requirements.txt .
RUN python3.9 -m pip install --no-cache-dir --upgrade \
pip \
setuptools \
wheel
RUN python3.9 -m pip install --no-cache-dir \
-r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python3.9", "app.py"]
The first line, FROM python:3.9-slim-buster, determines what image we're inheriting from. I went with 3.9-slim-buster instead of 3.9-alpine. While Alpine starts as a smaller image (44.7MB vs. 114MB), it can be hard to find precompiled wheels for it, so the image often has to build those packages from source itself. You may end up having to install git and other tools to make that work, which increases the image size, and compiling from source can make the build take a while. The slim-buster image is a nice in-between; I rarely end up with an image over 1 GB.
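If you want to compare the base image sizes on your own machine (the exact numbers drift as the tags are updated), you can pull both and list them.
$ docker pull python:3.9-alpine
$ docker pull python:3.9-slim-buster
$ docker images python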
We then install any underlying system packages we need with the following directive.
RUN apt-get update && apt-get install -y \
build-essential \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*
build-essential gets us a C compiler and the other tooling needed to install Python packages with C extensions, in our case psycopg2. We also install libpq-dev, the PostgreSQL client library headers that psycopg2 builds against. The && rm -rf ... cleans up the apt-get package lists for us to minimize image size. This has to happen in the same directive. If you were to write it as follows.
RUN apt-get update && apt-get install -y \
build-essential \
libpq-dev
RUN rm -rf /var/lib/apt/lists/*
You would not actually reduce the image size: the package lists are removed in the later layer, but they still exist in the earlier layer, and every layer ships with the image. If the image is squashed, I believe this way of writing it is fine. I am not particularly familiar with that process, but to my understanding it consolidates all the layers into one, so the deleted files would actually be gone. I've never done this, so I cannot give an informed opinion on squashing images.
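For reference, squashing has existed as an experimental --squash flag on docker build (the daemon has to have experimental features enabled), so the build command would look roughly like this; again, I haven't used it myself.
$ docker build --squash -t flask-docker .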
The next two directives simply create a directory and make it our working directory. There is not much to say here.
RUN mkdir /code
WORKDIR /code
We then bring in our code and install our various pip dependencies.
COPY requirements.txt .
RUN python3.9 -m pip install --no-cache-dir --upgrade \
pip \
setuptools \
wheel
RUN python3.9 -m pip install --no-cache-dir \
-r requirements.txt
COPY . .
The order of these directives matters. Docker uses caching to determine whether or not it needs to rebuild a layer, and when a layer invalidates the cache, all subsequent layers are rebuilt as well. For the COPY directive, the cache is calculated from a checksum of the files being copied. If you were to have the series of directives as follows.
COPY . .
RUN python3.9 -m pip install --no-cache-dir --upgrade \
pip \
setuptools \
wheel
RUN python3.9 -m pip install --no-cache-dir \
-r requirements.txt
Both of the layers from the RUN directives would always be rebuilt, slowing down the build. In reality you only need to upgrade pip, setuptools, and wheel when you have new dependencies to install, and the same goes for actually installing the requirements.txt file. By copying over only the requirements.txt file first, we do those time-consuming steps only when requirements.txt actually changes. From there we simply copy in the rest of the code, expose a port, and define the command to be run when the container starts. Finally, when installing pip dependencies you should use the --no-cache-dir flag: it prevents pip from keeping a download cache around in case you want to quickly install again, which is unnecessary inside a Docker image and just takes up space.
With the file written this way, only the last three layers will be rebuilt on subsequent builds of the image, so rebuilds are rather quick. Go ahead and build the Docker image both ways and see the difference (you might have to make code changes for Docker to pick up a checksum change). The caching also determines which layers need to be pushed to a Docker image repository, as well as which layers need to be pulled down to the host machine running the application on deployment. The way we wrote this Dockerfile means that, again, only the last three layers, which are tiny, get pushed to the repo and pulled down to the application host, so deploys are quick. That's the best-case scenario; obviously if you push changes that also modify requirements.txt, those layers will need to be pushed too.
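A related tweak that isn't in the Dockerfile above is a .dockerignore file. Because the COPY . . layer is keyed off a checksum of the files in the build context, excluding things like .git and __pycache__ keeps local noise from invalidating that layer and keeps the image a bit smaller. A minimal sketch might look like this.
.git
__pycache__
*.pyc
.venv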
Our app.py will look like this.
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello World!'

if __name__ == '__main__':
    # Bind to 0.0.0.0 so the app is reachable through Docker's port
    # mapping; 127.0.0.1 would only listen on the container's loopback.
    app.run(host="0.0.0.0")
And requirements.txt will look like this.
flask
psycopg2
sqlalchemy
Obviously our code isn't currently using a database, but I included those packages to showcase a more realistic example.
To build and run you can use the following commands.
$ docker build -t flask-docker .
$ docker run -it -p 5000:5000 flask-docker
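Once the container is up, you should be able to hit the app from the host, since we mapped port 5000 through.
$ curl http://127.0.0.1:5000/
Hello World!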
I hope this post was helpful in understanding certain ways to write Dockerfiles for your Python app. I used Flask in this example, but it isn't particularly different for Django; basically only the CMD directive would change. Also, for both you'd likely want to use uWSGI or Gunicorn to actually run the web server once deployed, with Nginx or Apache in front as a reverse proxy.
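For example, assuming you added gunicorn to requirements.txt, swapping to Gunicorn would mostly just mean changing the CMD, roughly like this.
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]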
I find the Dockerfile best practices section of the Docker documentation to be really helpful.