When getting started with containers, it’s pretty easy to
be shocked by the size of the images that we build.
We’re going to review a number of techniques to reduce image
size, without sacrificing developers’ and ops’ convenience.
In this first part, we will talk about multi-stage builds,
because that’s where anyone who wants to reduce the size of
their images should start. We will also explain the differences
between static and dynamic linking, and why we should care
about them. This will also give us an opportunity to introduce Alpine.
In the second part, we will look at some specifics of
various popular languages. We will talk about
Go, but also Java, Node, Python, Ruby, and Rust.
We will also talk more about Alpine and how to leverage it
across the board.
In the third part, we will cover some patterns (and anti-patterns!)
relevant to most languages and frameworks, like using
common base images, stripping binaries and reducing asset size.
We will wrap up with some more exotic or advanced methods
like Bazel, Distroless, DockerSlim, or UPX. We will see how
some of these will be counter-productive
in some scenarios, but might be useful in others.
Note that the sample code and all the Dockerfiles mentioned
here are available in a public GitHub repository,
with a Compose file to build all the images and easily compare
their sizes.
→ https://github.com/jpetazzo/minimage
The English version of this series was initially published on
the Ardan Labs blog: parts 1, 2, 3.
A French version (translated by Aurélien Violet and
Romain Degez) is also available on the ENIX blog:
parts 1, 2, 3.
Enjoy your read!
What we’re trying to solve
Many people building their first Docker images that compile
some code are unpleasantly surprised by the resulting image sizes.
Look at this trivial “hello world” program in C:
/* hello.c */
#include <stdio.h>  /* declares puts(); recent GCC versions reject implicit declarations */

int main(void) {
    puts("Hello, world!");
    return 0;
}
We could build it with the following Dockerfile:
FROM gcc
COPY hello.c .
RUN gcc -o hello hello.c
CMD ["./hello"]
… But the resulting image will be more than 1 GB, because it
will have the whole gcc image in it!
If we use, e.g., the Ubuntu image, install a C compiler, and build
the program, we get a 300 MB image, which looks better, but is
still way too much for a binary that, by itself, is less than 20 kB:
$ ls -l hello
-rwxr-xr-x 1 root root 16384 Nov 18 14:36 hello
Same story with the equivalent Go program:
package main

import "fmt"

func main() {
    fmt.Println("Hello, world!")
}
If we build this code with the golang image, the resulting image
is 800 MB, even though the hello program is only 2 MB:
$ ls -l hello
-rwxr-xr-x 1 root root 2008801 Jan 15 16:41 hello
There has to be a better way!
Let’s see how to drastically reduce the size of these images.
In some cases, we can achieve 99.8% size reduction (but we will
see that it’s not always a good idea to go that far).
Pro tip: to easily compare the size of our images, we are going
to use the same image name, but different tags. For instance, our
images will be hello:gcc, hello:ubuntu, hello:thisweirdtrick,
etc. That way, we can run docker images hello and it will list
all the tags for that hello image, with their sizes, without
being encumbered with the bazillions of other images that we have
on our Docker engine.
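To make the naming scheme concrete, here is a small Go sketch of how a reference such as hello:gcc splits into a repository (hello) and a tag (gcc), and how keeping only one repository gives the kind of view that docker images hello shows. This is a simplified model, not Docker’s actual reference parser: it assumes the tag is whatever follows the last colon, provided that colon comes after the last slash (so a registry port such as localhost:5000 is not mistaken for a tag), that a missing tag means latest, and it ignores digests. The helpers splitRef and tagsFor are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRef splits an image reference into repository and tag.
// Hypothetical, simplified helper: digests (@sha256:...) are not handled.
func splitRef(ref string) (repo, tag string) {
	lastColon := strings.LastIndex(ref, ":")
	lastSlash := strings.LastIndex(ref, "/")
	// A colon before the last slash belongs to a registry host:port, not to a tag.
	if lastColon > lastSlash {
		return ref[:lastColon], ref[lastColon+1:]
	}
	// No explicit tag: Docker falls back to "latest".
	return ref, "latest"
}

// tagsFor returns the tags of every reference whose repository is exactly repo,
// roughly what `docker images <repo>` narrows the listing down to.
func tagsFor(repo string, refs []string) []string {
	var tags []string
	for _, ref := range refs {
		r, t := splitRef(ref)
		if r == repo {
			tags = append(tags, t)
		}
	}
	return tags
}

func main() {
	refs := []string{
		"hello:gcc",
		"hello:ubuntu",
		"hello:thisweirdtrick",
		"ubuntu",               // repository "ubuntu", tag "latest"
		"localhost:5000/hello", // repository "localhost:5000/hello", tag "latest"
	}
	fmt.Println(tagsFor("hello", refs)) // [gcc ubuntu thisweirdtrick]
}
```

Only the three hello:… references survive the filter, which is exactly why sharing one repository name across experiments keeps the comparison readable.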
Multi-stage builds
This is the first (and most drastic) step we can take to reduce the
size of our images. We need to be careful, though, because if
it’s done incorrectly, it can result in images that are harder
to operate (or could even be completely broken).
Multi-stage builds come from a simple idea: “I don’t need
to include the C or Go compiler and the whole build toolchain
in my final application image. I just want to ship the binary!”
We obtain a multi-stage build by adding another FROM line in
our Dockerfile. Look at the example below:
FROM gcc AS mybuildstage
COPY hello.c .
RUN gcc -o hello hello.c
FROM ubuntu
COPY --from=mybuildstage hello .
CMD ["./hello"]
We use the gcc image to build our hello.c program. Then,
we start a new stage (that I will call the “run stage”)
using the ubuntu image. We copy the hello binary from the
previous stage. The final image is 64 MB instead of 1.1 GB,
so that’s about a 94% size reduction:
$ docker images minimage
REPOSITORY TAG ... SIZE
minimage hello-c.gcc ... 1.14GB
minimage hello-c.gcc.ubuntu ... 64.2MB
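The percentage can be double-checked with a bit of arithmetic: reduction = (1 − after / before) × 100. The Go sketch below parses the human-readable SIZE values shown above and computes that ratio. It assumes the SIZE column uses decimal units (1 MB = 1,000,000 bytes); parseSize and reductionPercent are hypothetical helpers, not part of any Docker tooling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings such as "1.14GB" or "64.2MB" into bytes.
// Assumption: decimal units (1 kB = 1000 B). Longer suffixes are checked
// before "B" so that "GB" is not mistaken for plain bytes.
func parseSize(s string) (float64, error) {
	units := []struct {
		suffix string
		factor float64
	}{
		{"GB", 1e9}, {"MB", 1e6}, {"kB", 1e3}, {"B", 1},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, err
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("unrecognized size %q", s)
}

// reductionPercent returns how much smaller `after` is than `before`, in percent.
func reductionPercent(before, after float64) float64 {
	return (1 - after/before) * 100
}

func main() {
	before, err := parseSize("1.14GB") // hello-c.gcc
	if err != nil {
		panic(err)
	}
	after, err := parseSize("64.2MB") // hello-c.gcc.ubuntu
	if err != nil {
		panic(err)
	}
	fmt.Printf("reduction: %.1f%%\n", reductionPercent(before, after)) // reduction: 94.4%
}
```

The same two helpers work for any pair of tags from the listing, e.g. to check the 99.8% figure once the smaller images exist.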
Not bad, right? We can do even better. But first, a few tips and warnings.
You don’t have to use the AS keyword when declaring your build
stage. When copying files from a previous stage, you can simply
indicate the number of that build stage (starting at zero).
In other words, the two lines below are equivalent:
COPY --from=mybuildstage hello .
COPY --from=0 hello .
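One way to picture how a --from value is looked up is the Go sketch below. It is a simplified, hypothetical model, not BuildKit’s code: a purely numeric value is treated as a zero-based index into the stages declared so far, any other value is matched against the names given with AS, and when nothing matches, a real build would instead treat the value as an image reference (as in COPY --from=nginx …). The function name resolveStage is illustrative only.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveStage maps a --from value to a zero-based index into stages,
// where stages[i] is the AS name of the i-th FROM line ("" if unnamed).
// Hypothetical model: ok is false when the value identifies no previous
// stage; a real build would then try to use it as an image reference.
func resolveStage(stages []string, from string) (index int, ok bool) {
	if n, err := strconv.Atoi(from); err == nil {
		if n >= 0 && n < len(stages) {
			return n, true
		}
		return -1, false
	}
	for i, name := range stages {
		if name != "" && name == from {
			return i, true
		}
	}
	return -1, false
}

func main() {
	// Stages declared before the run stage in the example Dockerfile.
	stages := []string{"mybuildstage"}
	byName, _ := resolveStage(stages, "mybuildstage")
	byIndex, _ := resolveStage(stages, "0")
	fmt.Println(byName == byIndex) // true: both point at the gcc build stage
}
```

The model also shows why numbers get fragile in longer files: inserting a new FROM line shifts every index after it, while names stay put.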
Personally, I think it’s fine to use numbers for build stages
in short Dockerfiles (say, 10 lines or less), but as soon as
your Dockerfile gets longer (and possibly more complex, with
multiple build stages), it’s a good idea to name the stages
explicitly. It will make maintenance easier for your teammates
(and also for future you, reviewing that Dockerfile months
later).
Warning: use classic images
I strongly recommend that you stick to classic images for your “run”
stage. By “classic”, I mean something like CentOS, Debian, Fedora,
Ubuntu; something familiar. You might have heard about Alpine and
be tempted to use it. Do not! At least, not yet. We will talk about
Alpine later, and we will explain why we need to be careful with it.
Warning: COPY --from uses absolute paths
When copying files from a previous stage, the source paths are
interpreted as relative to the root of the previous stage, not
relative to its WORKDIR. The problem appears as soon as we use
a builder image with a WORKDIR, for instance the golang image.
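A minimal Go model of this rule is sketched below. It assumes the builder stage’s WORKDIR is /go (the official golang image sets its WORKDIR to its GOPATH, /go); outputPath and sourcePath are hypothetical helpers. A relative output path written by a command in the builder lands under that WORKDIR, while the source path of COPY --from is joined to the stage’s root, so the same relative name points at two different files.

```go
package main

import (
	"fmt"
	"path"
)

// builderWorkdir is an assumption: the WORKDIR of the golang image.
const builderWorkdir = "/go"

// outputPath models where a build command writes a file: relative paths
// are resolved against the builder stage's WORKDIR.
func outputPath(workdir, p string) string {
	if path.IsAbs(p) {
		return path.Clean(p)
	}
	return path.Join(workdir, p)
}

// sourcePath models where COPY --from looks for its source: the path is
// resolved against the root of the previous stage, ignoring its WORKDIR.
func sourcePath(src string) string {
	return path.Join("/", src)
}

func main() {
	built := outputPath(builderWorkdir, "hello")
	wanted := sourcePath("hello")
	fmt.Println(built, wanted, built == wanted) // /go/hello /hello false
}
```

In this model, spelling out the absolute path, sourcePath("/go/hello"), yields /go/hello and matches the file that was actually built, which is why absolute source paths avoid the surprise.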
If we try to build this Dockerfile:
FROM golang
COPY h