Smaller Docker images by using a blanket ignore / Marijke Luttekes

You may know and love Docker, but did you know you are probably stuffing too many files into your images? I know I was. There is a remarkably simple solution, though: ignoring all files by default.

I will not explain how Docker works; that is what the official documentation is for. Instead, we will focus on this nifty trick using the .dockerignore file.

If you need it, see the official .dockerignore documentation.

Note: This article will collectively refer to files and directories as "items".

Why do we use .dockerignore?

Docker ignore files are great for excluding specific files and directories from the build context, and therefore from your images.

You can exclude items for several reasons, which typically boil down to their presence not being needed in a production build. They could be files that are environment-specific, dev-only, or whose presence poses a security risk.

Even if a file is harmless in an image, it's better to leave it out unless needed. Sure, you can cram everything into your image, but that only makes it bigger and slower to build, push, and pull.

(I am looking at you, node_modules!)

Of course, I won't stop you if you want to take your snow boots on a beach holiday. You do you.

What to exclude?

The items to exclude differ per Dockerfile. You can have different Dockerfiles per project, each with a separate ignore file. This is convenient when you have different setups for production and development.
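
As a rough sketch of what a per-Dockerfile ignore file can look like: with BuildKit, Docker picks up an ignore file that sits next to the Dockerfile and carries the Dockerfile's name with .dockerignore appended. The file name below follows that convention for a Dockerfile called Dockerfile-dev. The entries are illustrative assumptions for a lenient development setup, not taken from a real project.

```dockerignore
# Dockerfile-dev.dockerignore (hypothetical; lives next to Dockerfile-dev)
# A lenient deny list for development: keep most files, drop the heavy or sensitive ones.

# Version control history
.git

# Local secrets
.env

# Dependencies and caches (assumed to be recreated inside the container)
venv
**/node_modules
**/__pycache__
```

If both a Dockerfile-specific ignore file and a plain .dockerignore exist, the Dockerfile-specific one takes precedence for that build.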

Production setups only need the files required to run your application; they don't need any development tools.

In the past, I would manually exclude files like IDE files, test scripts, local databases, coverage settings, linter configurations, and more. At some point, I found myself ignoring too many files by hand.

There are two ways I fixed this:

1. Only copy files from the src directory to the Docker image, and move as many dev-only files as possible to the project root (the parent of src).
2. Docker-ignore all the things by default.

We will focus on the second point for the rest of this article.

Ignoring all the things by default

I am a lazy developer, and keeping the .dockerignore file up to date with every file addition to a repository is… not going to happen. I needed a better solution.

Blanket ignore-everything to the rescue! That, and Mauricio Sánchez's answer on Stack Overflow.

Instead of explicitly excluding items, we explicitly allow them.

Note: I only use ignore files for production environments, as my development environment mounts the src directory into the web container. Hence, this article is production-centric. You can make a more lenient ignore file for development if needed.

The basic ignore file looks like this:

# Ignore all files and directories by default
*

# Allow these files and directories
# !...

# Ignore specific files in allowed directories
# ...

# Ignore unnecessary files inside allowed directories
# This should go after the allowed directories
**/*~
**/*.log
**/.DS_Store
**/Thumbs.db

Let's break down the four sections of this file:

Section 1: The blanket ignore rule. This rule is our fallback to ensure that no file enters the image without our consent.

Section 2: The list of allowed files and directories. These are the items that will be included in the image. Take note of each line starting with an exclamation mark (!), which tells Docker not to ignore the file.

Section 3: This is a specific ignore list where you can tune the rules from the previous section. If one of the allowed directories contains an item you want to exclude, put it here.

Section 4: Editor backups, log files, and OS-specific metadata files (such as .DS_Store on macOS and Thumbs.db on Windows) that should never make it into your image. Credits go to the aforementioned Stack Overflow answer.
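
The order of these sections matters because the last rule that matches an item decides whether it is ignored. Here is a minimal sketch of that behavior, assuming a hypothetical project with an app directory that contains a docs subdirectory (neither name comes from a real project):

```dockerignore
# Rule 1: ignore everything.
*

# Rule 2: allow app/ and everything below it.
!/app

# Rule 3: ignore app/docs again.
# This works because it comes after rule 2, so it is the last match for app/docs.
/app/docs

# If rule 3 were placed before rule 2, `!/app` would be the last rule
# matching app/docs, and the directory would end up in the build context.
```

That is why the specific ignores and the cleanup patterns go below the allow list rather than above it.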

Example

What could this file look like in reality? Let me show you using my personal website as an example.

My website runs on Django; the code is under src. After years of using this project as a playground for tools and experiments, the repository has acquired several quality configuration files and cache directories.

The layout of the root directory is as follows:

.
├── .DS_Store
├── .dockerignore
├── .editorconfig
├── .env
├── .env.example
├── .git
├── .gitignore
├── .gitlab
├── .gitlab-ci.yml
├── .pre-commit-config.yaml
├── .ruff_cache
├── .tox
├── .vscode
├── Dockerfile-dev
├── Makefile
├── README.md
├── _bin
├── dev-cache
├── docker
├── docker-compose.dev.yml
├── pyproject.toml
├── requirements
├── runtime.txt
├── setup.py
├── src
├── tox.ini
└── venv

And the src directory looks like so:

src
├── .DS_Store
├── .coverage
├── .coveragerc
├── .ipynb_checkpoints
├── .mypy_cache
├── .stylelintrc.json
├── Makefile
├── __pycache__
├── coverage.xml
├── db.sqlite3
├── manage.py
├── marijkeluttekes
├── node_modules
├── notebooks
├── package.json
├── webpack.config.js
└── yarn.lock

Almost none of the items listed here are needed in production. Ironically, even the .dockerignore itself isn't needed in this project (it's unused), but it works for the purpose of this article.

Django only needs the directory src/marijkeluttekes and the file src/manage.py. The hosting needs setup.py and runtime.txt. For static file compilation, we need package.json and yarn.lock.

The requirements directory contains multiple raw requirement files, but builds only need compiled lock files such as requirements.txt.

There is also a local settings override module within src/marijkeluttekes/settings that I don't want to include with the build.

This results in the following Docker ignore file:

# Ignore all files and directories by default
*

# Allow these files and directories
!/requirements/*.txt
!/src/marijkeluttekes
!/src/manage.py
!/src/package.json
!/src/yarn.lock
!/runtime.txt
!/setup.py

# Ignore specific files in allowed directories
/src/marijkeluttekes/settings/local.py

# Ignore unnecessary files inside allowed directories
# This should go after the allowed directories
**/*~
**/*.log
**/.DS_Store
**/Thumbs.db

Wrap-up

You and your teammates are more likely to forget to ignore than to remember to allow items. That's all fun and games until you're baking sensitive data or large directories into your images.

Give yourself an easier time building Docker images by ignoring every file and directory by default.