With the rise of microservices, Docker has become the de facto standard for containerization. Its ease of use for delivering software in containers has put it ahead of other container technologies such as rkt and LXD.

I’ve lost count of the times I’ve come across popular official Docker images that were bloated and riddled with bugs and known vulnerabilities. This drives many developers to shun them and build their own alternative images, which duplicates effort and wastes developer time, storage space and money. And even though these alternative images are better, they’re not as popular as the official ones, so good-quality images see lower adoption. Meanwhile, I suspect a large share of the developer community still relies on the inferior images.

This blog post focuses on tackling the bloat in Docker images; I’ll cover the security aspect in a future post.

Why should we care about smaller images? Won’t we waste precious development time creating smaller images? There are many valuable benefits to having smaller Docker images, and when the usage is scaled, the benefits outweigh the cost of development time:

  1. they’re usually faster to build and push to a repository, leading to faster Continuous Integration (CI)
  2. they’re faster to deploy, enabling better Continuous Deployment/Delivery (CD)
  3. they have a significantly smaller attack surface since there isn’t much inside the image other than the application being distributed
  4. they save disk space and improve I/O performance, which is important especially for popular images as the total disk space and I/O time saved across all usage is immense
  5. they result in better network performance by conserving bandwidth and improving latency, and these savings scale up with the popularity and usage

So how do we make our Docker images smaller? Let’s containerize a simple “Hello, World!” program written in C. I’m not going to optimize the program or image for performance or security right now, but I will be diving into Docker performance and security in future posts.

/* hello.c */
#include <stdio.h>

int main() {
    printf("Hello, World!");
    return 0;
}

1. Use gcc as the base image

Let’s create a Dockerfile in the same directory as hello.c with the following content:

# Dockerfile
FROM gcc
COPY hello.c .
RUN gcc -o hello hello.c
CMD ["./hello"]

To build the image and tag it as hello-world:gcc, run the command docker build -t hello-world:gcc . from the same directory.

Once the build is finished, run docker run hello-world:gcc and "Hello, World!" shows up in the console output.

To view the image’s size, run docker images hello-world:gcc | awk '{ print $NF }', and it shows 1.19GB. Embarrassingly huge for a “Hello, World!” program. Running docker images gcc | awk '{ print $NF }' shows that the base image’s size is 1.19GB as well. Clearly, blindly using a popular Docker image like gcc as a base image isn’t a good idea here.
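
The awk pipeline above prints the SIZE column header along with the value. As a hedged alternative, docker images accepts a --format Go template, so the size can be read on its own. The helper name image_size is made up for this sketch:

```shell
#!/bin/sh
# image_size is a hypothetical helper name, not part of Docker.
# It prints only the SIZE column for the given image reference.
image_size() {
    docker images --format '{{.Size}}' "$1"
}

# Usage (needs a Docker daemon and the image built earlier):
#   image_size hello-world:gcc
```

Wrapping the call in a function keeps the template in one place, which is handy when comparing several tags later on.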

2. Use a smaller base image

Let’s update the Dockerfile to use a smaller base image, such as ubuntu, which is again very popular. Since ubuntu doesn’t come with a C compiler, we’ll need to install one.

# Dockerfile
FROM ubuntu
COPY hello.c .
RUN apt-get update
RUN apt-get install -y gcc-10
RUN gcc-10 -o hello hello.c
CMD ["./hello"]

Run these commands to build the image and run it as a container:

$ docker build -t hello-world:ubuntu .
...
$ docker run hello-world:ubuntu
Hello, World!
$ docker images hello-world:ubuntu | awk '{ print $NF }'
SIZE
244MB

By using a smaller base image, we’ve brought the size down to 244MB, which is much better than 1.19GB, but still not good enough.

3. Use fewer layers

Docker caches image layers and reuses them to save time when building and pulling images. For example, if I already have the latest ubuntu base image locally, Docker won’t need to fetch it again when it runs the first step, FROM ubuntu, of the hello-world:ubuntu build.

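Reducing the number of layers is one way to ease caching. Chaining apt-get update and apt-get install in a single RUN instruction also keeps them in the same cached layer, so the install never runs against a stale package index: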

# Dockerfile
FROM ubuntu
COPY hello.c .
RUN apt-get update && \
    apt-get install -y gcc-10 && \
    gcc-10 -o hello hello.c
CMD ["./hello"]

Run these commands to build the image and run it as a container:

$ docker build -t hello-world:ubuntu-2 .
...
$ docker run hello-world:ubuntu-2
Hello, World!
$ docker images hello-world:ubuntu-2 | awk '{ print $NF }'
SIZE
244MB

As expected, the size is still 244MB. However, the build now has 4 steps instead of 6, so the image has fewer layers.
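
To check the layer count rather than take it on trust, the image metadata can be queried directly. This is a minimal sketch with made-up helper names. Note that the count reported by docker image inspect includes the layers of the ubuntu base image and only counts filesystem layers, so metadata-only instructions such as CMD do not add to it and the number will not necessarily match the number of build steps:

```shell
#!/bin/sh
# Both helper names are hypothetical; they wrap real Docker subcommands.

# Count the filesystem layers recorded in the image's metadata.
layer_count() {
    docker image inspect --format '{{len .RootFS.Layers}}' "$1"
}

# Show how much each build step added and which instruction created it.
layer_sizes() {
    docker history --format '{{.Size}}: {{.CreatedBy}}' "$1"
}

# Usage (needs a Docker daemon and the images built earlier):
#   layer_count hello-world:ubuntu
#   layer_count hello-world:ubuntu-2
#   layer_sizes hello-world:ubuntu-2
```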

4. Use multi-stage builds

Since we ultimately only want to run the hello binary, shipping a C compiler and all its dependencies in the container isn’t useful. Let’s use Docker’s multi-stage build feature to build the binary in one image and copy it into another. That way, the final image is only the second base image plus the binary.

# Dockerfile
FROM gcc as build
COPY hello.c .
RUN gcc -o hello hello.c

FROM ubuntu
COPY --from=build hello .
CMD ["./hello"]

$ docker build -t hello-world:ubuntu-multi .
...
$ docker run hello-world:ubuntu-multi
Hello, World!
$ docker images hello-world:ubuntu-multi | awk '{ print $NF }'
SIZE
73.9MB

That’s a great improvement, but let’s keep pushing.
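
To see what the multi-stage build leaves behind, one option is to build only the first stage with docker build --target and compare its size with the final image. The tag hello-world:build-stage and the helper name are invented for this sketch:

```shell
#!/bin/sh
# build_stage_only is a hypothetical helper. It stops the build at the
# stage named "build" in the Dockerfile above and tags the result.
# The optional argument is the build context (defaults to ".").
build_stage_only() {
    docker build --target build -t hello-world:build-stage "${1:-.}"
}

# Usage (needs a Docker daemon, run next to the Dockerfile):
#   build_stage_only
#   docker images hello-world
```

The build-stage image should be roughly the size of the gcc base image, which is exactly the weight the final image no longer carries.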

5. Use the alpine base image

Now we’ve shrunk the image down to 73.9MB, but we can shrink it further by using a base image smaller than ubuntu. Alpine is a popular minimal Docker image and, as of writing, its size is just 5.57MB.

# Dockerfile
FROM gcc as build
COPY hello.c .
RUN gcc -o hello hello.c

FROM alpine
COPY --from=build hello .
CMD ["./hello"]

$ docker build -t hello-world:alpine .
...
$ docker run hello-world:alpine
standard_init_linux.go:211: exec user process caused "no such file or directory"
$ docker images hello-world:alpine | awk '{ print $NF }'
SIZE
5.59MB

Uh-oh! Let’s dive into the container and inspect further.

$ docker run -it hello-world:alpine /bin/sh
/ # ls -al hello
-rwxr-xr-x    1 root     root         16376 Aug 17 00:09 hello
/ # ./hello
/bin/sh: ./hello: not found
/ # ldd hello
    /lib64/ld-linux-x86-64.so.2 (0x7f7deccd3000)
    libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f7deccd3000)

So the binary can’t run because the glibc dynamic loader, /lib64/ld-linux-x86-64.so.2, and libc.so.6, both of which it depends on, aren’t available on the alpine image. We can build our binary as a static one with the library embedded in it.

# Dockerfile
FROM gcc as build
COPY hello.c .
RUN gcc -o hello hello.c -static

FROM alpine
COPY --from=build hello .
CMD ["./hello"]

$ docker build -t hello-world:alpine .
...
$ docker run hello-world:alpine
Hello, World!
$ docker images hello-world:alpine | awk '{ print $NF }'
SIZE
6.52MB
$ docker run -it hello-world:alpine /bin/sh
/ # ls -al hello
-rwxr-xr-x    1 root     root        945088 Aug 17 01:08 hello

hello-world:alpine is an order of magnitude smaller than hello-world:ubuntu, and we’ve only increased the size of the alpine base image by approximately 0.95MB, which is the size of the static binary hello.
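
One way to tell a dynamically linked binary from a static one before copying it into a minimal image is to look for a program interpreter entry in its ELF headers. The sketch below assumes readelf from binutils is available wherever it runs (treat its presence in any particular image as an assumption), and the helper name is made up:

```shell
#!/bin/sh
# needs_dynamic_loader is a hypothetical helper. It succeeds when the ELF
# file requests a program interpreter such as /lib64/ld-linux-x86-64.so.2,
# which means the final image must provide that loader and its libraries.
needs_dynamic_loader() {
    readelf -l "$1" 2>/dev/null | grep -q 'program interpreter'
}

# Usage, for example inside the build stage:
#   if needs_dynamic_loader ./hello; then
#       echo "dynamic: the loader and libc must exist in the final image"
#   else
#       echo "static: no loader required"
#   fi
```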

There are some points to keep in mind when using alpine. Most importantly, alpine uses musl libc as the C standard library instead of the more common glibc. The musl wiki documents the functional differences between musl libc and glibc.

What if we used alpine as our build image? We won’t need to build a static binary in that case, since the binary is linked against the same musl libc that the final image ships. We’d have to install a C compiler though. Let’s try it out:

# Dockerfile
FROM alpine as build
COPY hello.c .
RUN apk add --no-cache gcc musl-dev && \
    gcc -o hello hello.c

FROM alpine
COPY --from=build hello .
CMD ["./hello"]

$ docker build -t hello-world:alpine-dynamic .
...
$ docker run hello-world:alpine-dynamic
Hello, World!
$ docker images hello-world:alpine-dynamic | awk '{ print $NF }'
SIZE
5.59MB

6. Use the busybox base images

Another popular tiny base image is busybox but the drawback is that, just like alpine, we’ll have to build a static binary when using it.

# Dockerfile
FROM gcc as build
COPY hello.c .
RUN gcc -o hello hello.c -static

FROM busybox
COPY --from=build hello .
CMD ["./hello"]

$ docker build -t hello-world:busybox .
...
$ docker run hello-world:busybox
Hello, World!
$ docker images hello-world:busybox | awk '{ print $NF }'
SIZE
2.17MB

That’s around a third of the size of hello-world:alpine, but still not good enough for a container that holds a small program. Let’s try something else.

There’s a busybox:glibc image that comes bundled with glibc, and we don’t have to build a static binary if we use it:

# Dockerfile
FROM gcc as build
COPY hello.c .
RUN gcc -o hello hello.c

FROM busybox:glibc
COPY --from=build hello .
CMD ["./hello"]

$ docker build -t hello-world:busybox-glibc .
...
$ docker run hello-world:busybox-glibc
Hello, World!
$ docker images hello-world:busybox-glibc | awk '{ print $NF }'
SIZE
5.22MB

It makes sense that the size is larger because busybox:glibc contains glibc and other tools. It’s smaller than alpine and is beneficial to use as a base image when building dynamic binaries, but we’d have to copy any additional libraries to it from the build stage image.

7. Build from scratch

As we saw above, the additional dependencies, such as glibc and the tools that come with busybox, are overhead if we don’t plan to use them. For example, if we plan to restrict access to containers in a production environment and not allow anyone to log in to them, then we don’t need to have anything other than our application in the image.

The way to do it is to build an image FROM scratch:

# Dockerfile
FROM gcc as build
COPY hello.c .
RUN gcc -o hello hello.c -static

FROM scratch
COPY --from=build hello .
CMD ["./hello"]

The Dockerfile above is for building an image that contains a statically-linked binary hello and nothing else.

$ docker build -t hello-world:scratch-glibc .
...
$ docker run hello-world:scratch-glibc
Hello, World!
$ docker images hello-world:scratch-glibc | awk '{ print $NF }'
SIZE
945kB
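
To check what actually ended up in the image, one option is to export a container’s filesystem and list the archive. This is a sketch with a made-up helper name. Docker may add a few placeholder entries of its own to the export, so expect the listing to show hello plus possibly some empty system paths:

```shell
#!/bin/sh
# list_image_files is a hypothetical helper. It creates a stopped container
# from the image, streams its filesystem as a tar archive, lists the
# entries, and then removes the container again.
list_image_files() {
    container_id=$(docker create "$1") || return 1
    docker export "$container_id" | tar -tf -
    docker rm "$container_id" > /dev/null
}

# Usage (needs a Docker daemon and the image built above):
#   list_image_files hello-world:scratch-glibc
```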

The entire 945kB size of the hello-world:scratch-glibc image is due to the hello binary. We’ve come a long way in reducing the image size, from 1.19GB to 945kB. But there’s one more trick! Notice that we used glibc in the above image. If we use musl libc, then we can reduce our binary’s size even more. Here’s the Dockerfile for it:

# Dockerfile
FROM alpine as build
COPY hello.c .
RUN apk add --no-cache gcc musl-dev && \
    gcc -o hello hello.c -static

FROM scratch
COPY --from=build hello .
CMD ["./hello"]

Let’s build it and see how small it is.

$ docker build -t hello-world:scratch-musl .
...
$ docker run hello-world:scratch-musl
Hello, World!
$ docker images hello-world:scratch-musl | awk '{ print $NF }'
SIZE
96.3kB

That’s about a tenth of the size of the previous one, and roughly 0.008% of the one we built using the gcc base image.
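
The ratios can be checked with a little arithmetic. This sketch assumes the sizes printed by docker images use decimal units (1kB = 1000 bytes, 1GB = 1000^3 bytes); if the units are read as binary instead, the last figure comes out slightly lower. The helper name percent_of is made up:

```shell
#!/bin/sh
# percent_of is a hypothetical helper: prints PART as a percentage of WHOLE.
percent_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.4f\n", part / whole * 100 }'
}

# Sizes in bytes, assuming decimal units.
percent_of 96300 945000        # scratch-musl vs scratch-glibc, prints 10.1905
percent_of 96300 1190000000    # scratch-musl vs gcc base image, prints 0.0081
```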

Comparison

| Image | Size |
| --- | --- |
| gcc base image | 1.19GB |
| ubuntu base image | 244MB |
| ubuntu base image, multi-stage build | 73.9MB |
| alpine base image, multi-stage build, statically linked using glibc | 6.52MB |
| alpine base image, multi-stage build, dynamically linked against musl libc | 5.59MB |
| busybox:glibc base image, multi-stage build | 5.22MB |
| busybox base image, multi-stage build, statically linked using glibc | 2.17MB |
| scratch base image, multi-stage build, statically linked using glibc | 945kB |
| scratch base image, multi-stage build, statically linked using musl libc | 96.3kB |