On docker — running, monitoring, administering, and cleaning images and containers
Published on Sep 10, 2020 by Impaktor.
The greatest thing in the world is to know how to belong to oneself
Michel de Montaigne, The Complete Essays
1. Introduction
These are my notes on Docker, compiled from a tutorial, arch wiki, Jérôme Petazzoni’s PyCon 2016 talk and possibly other places.
See also this post on VMs vs Docker images on QEMU MicroVMs.
1.1. Definition: “Images” & “Containers”
One can have multiple containers from a single image
- images are conceptually similar to classes
- layers are conceptually similar to inheritance
- containers are conceptually similar to instances
2. Getting started
2.1. Installing docker
start it manually for now:
sudo systemctl start docker.service
…or add it for each startup:
sudo systemctl enable docker.service
check your groups
groups
add yourself to docker group
sudo gpasswd -a <user> docker
test that you can run docker as normal user:
docker info
docker version
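A quick sanity check of the group membership (a sketch; docker itself does not require this script, it just saves a failed `docker info`):

```shell
# Check whether the current user is in the "docker" group;
# prints a message either way.
if id -nG | grep -qw docker; then
  echo "in docker group"
else
  echo "not in docker group (log out and back in after gpasswd)"
fi
```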
2.2. Basic docker commands
Images are stored in /var/lib/docker by default, but this can be changed; see further down.
To search available images to download:
docker search <image name>
e.g. install an arch docker:
docker pull archlinux/base
Run interactive (-i) and with a terminal (-t):
docker run -i -t ubuntu
pulls the image (if not already present) and drops you into a shell inside the container (exit to leave)
In general:
docker run [image name] [command to run]
shut down a running container
docker stop [container ID]
or more brütal:
docker kill [container ID]
attach to a running container:
docker attach [container ID]
remove a (stopped) container
docker rm [container ID]
2.3. Setting up a docker image
- Doing it manually / interactive
Start a docker image:
docker run -it ubuntu
now in said docker image:
apt-get update
apt-get install <needed packages>
exit
show diff from base image:
docker diff <container ID>
commit container:
docker commit <container ID>
docker commit <container ID> <desired name of image>
which returns an sha256 hexadecimal string. NOTE: commit only saves state of file system, not running processes.
Launch it, and check that it has the installed files
docker run -ti <returned image ID>
or tag it to alias:
docker tag <returned image ID> <desired name of image>
- Doing it using Dockerfiles
Instead of using commit to create a docker image, we can use the build command, which takes a Dockerfile that automates what we did manually above. Create a file
Dockerfile
(the default name it looks for) in a folder (any name) with the content (note the -y, since the build is non-interactive):
FROM ubuntu
RUN apt-get update
RUN apt-get install -y figlet
execute, creating a new image with tag “figlet” (image tags must be lowercase):
docker build -t figlet <path to folder with dockerfile>
good to have:
docker history figlet
run figlet command from figlet container:
docker run figlet figlet hello
can add the command directly in the container file:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y figlet
CMD figlet hello
and will only then need:
docker run figlet
if I still want to get into the container:
docker run -ti figlet bash
Using
ENTRYPOINT
allows passing arguments to docker, so the container behaves like a binary or script that can take arguments. If using ENTRYPOINT and CMD together, both must use JSON (exec) syntax. (Taken from https://www.youtube.com/watch?v=ZVaRK10HBjo&t=1h45m)
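A minimal sketch of the combination (continuing the figlet example from above): ENTRYPOINT fixes the program, CMD supplies a default argument that `docker run` can override.

```dockerfile
FROM ubuntu
RUN apt-get update && apt-get install -y figlet
# both in JSON (exec) form: ENTRYPOINT is the fixed program,
# CMD is the default argument, overridable at run time
ENTRYPOINT ["figlet"]
CMD ["hello"]
```

With this, `docker run figlet` prints "hello", while `docker run figlet bye` passes "bye" to figlet instead.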
2.4. Naming containers
Docker gives names by default, see docker ps -a
or we can give a name
docker run --name ticktock python
(or docker rename
)
and then use:
docker logs ticktock
docker stop ticktock
docker inspect ticktock
docker inspect --format '{{ .State.Running}}' ticktock
(This was taken from yt: Introduction to Docker and containers - PyCon 2016)
2.5. Monitoring
Show running docker processes:
docker ps
Show the last container (-l), show only IDs (-q, useful for scripting)
docker ps -l -q
Show also (all) non-running:
docker ps -a
Show status (CPU/RAM, etc) on running images
docker stats
docker stats --no-stream
docker stats <list container IDs>
Or also include stopped containers
docker stats --all
Common usage for “logs”, which shows the output of a container, e.g. one running daemonized (run -d). Relatedly, run -P publishes all ports exposed in the image.
docker logs --tail 10 --follow <container ID>
There are three networks by default:
docker network ls
e.g.
docker network inspect bridge
“The bridge network corresponds to the docker0 network, which is present in all Docker installations. The none network doesn’t have any access to the external network, but it can be used for running batch jobs. Finally, the host network adds a container on the host’s network stack without any isolation between the host machine and the container.”
From linuxhint. Note, if you need two docker containers to communicate with each other, or other connectivity issues with docker containers, see the article Connection refused? Docker networking and how it impacts your image
Often one might develop a docker image to run on a VM; then we need to know how much RAM to configure the VM with. To log the RAM usage of a running docker image to a file, for plotting later on:
while true; do docker stats --no-stream --format "{{.Container}}:\t{{.CPUPerc}}\t{{.MemUsage}}" | tee --append stats-training.txt; sleep 1; done
Gives output:
ab5fade5cf45: 0.05% 547.5MiB / 15.67GiB
ab5fade5cf45: 0.01% 547.5MiB / 15.67GiB
ab5fade5cf45: 0.01% 547.5MiB / 15.67GiB
ab5fade5cf45: 264.85% 1.023GiB / 15.67GiB
ab5fade5cf45: 104.34% 3.01GiB / 15.67GiB
ab5fade5cf45: 100.48% 3.075GiB / 15.67GiB
ab5fade5cf45: 111.40% 2.187GiB / 15.67GiB
To extract RAM, e.g. for plotting in gnuplot, I use this:
awk '{m=substr($3, 0, length($3)-3); u=substr($3, length($3)-2); if("MiB"==u) M=m; else if("GiB"==u) M=1024*m; print M, m, u}' stats-training.txt > out
or this to include CPU usage:
awk '{cpu=substr($2,0,length($2)-1); m=substr($3, 0, length($3)-3); u=substr($3, length($3)-2); if("MiB"==u) M=m; else if("GiB"==u) M=1024*m; print cpu, M, m, u}' stats-training.txt > out
which gives easy to plot results:
547. 547. MiB
547. 547. MiB
547. 547. MiB
1044.48 1.02 GiB
3072 3.0 GiB
3143.68 3.07 GiB
2232.32 2.18 GiB
e.g. in gnuplot:
set xlabel 'time (s)'
set ylabel 'docker RAM (MiB)'
plot 'out' u 0:2 title "RAM", '' u 0:1 title "CPU"
PS. To add time-stamps, we could use the ts
command line tool (from the moreutils package) to prepend a timestamp to every line:
while true; do docker stats --no-stream --format "{{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" | /usr/bin/ts "%Y-%m-%dT%H:%M:%S%z" | tee --append docker-stats.txt; sleep 1; done
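If moreutils is not installed, a plain date call works as well (one timestamp per line read; the stats line below is fabricated so the pipeline can be tried without docker):

```shell
# prepend an ISO-ish timestamp to each stats line without moreutils/ts
echo 'ab5fade5cf45 0.05% 547.5MiB / 15.67GiB' |
  while IFS= read -r line; do
    printf '%s\t%s\n' "$(date +%Y-%m-%dT%H:%M:%S%z)" "$line"
  done
```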
3. Example dockerfile
FROM python:3.7.3
# every RUN command creates a new layer, i.e. use "&&" to execute
# multiple commands as a single line/layer
RUN apt-get update && apt-get install -y foobar
# NOTE: this should be done before copying source code, or else any
# change in source code will invalidate the cache, and all subsequent
# steps are re-done, so it would re-install all dependencies after
# each change in code or any other file
COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
# now copy everything in the (build context) folder recursively, to /src
# (not possible to reference paths outside of the passed build folder)...
COPY . /src
# ...thus this is the same as above:
# COPY / /src
# this is where we do stuff, rel. CMD/RUN and other things
WORKDIR /src
# ADD is like COPY, but can get remote files...
ADD http://foo.bar.txt /opt/
# ...and will auto-extract local tar archives (zip files are not extracted)
ADD foo.tar.gz /opt/
CMD python /myfile.py
# setting a variable (don't use "RUN export..." as that will only exist in
# that build container and then be discarded)
ENV MY_VAR=8000
# e.g. common way to change configuration parameters for containers:
# docker run -e MY_VAR=8000
4. Update docker when changes are made
See this stackoverflow post
5. Debug
5.1. Container build fail
Each RUN adds a layer to the image filesystem. The layer ID is visible in the build process below, e.g. 38460fb35297 for the last step:
Step 1/25 : FROM python:3.11-slim-bullseye
 ---> 4a12c61ccc71
Step 2/25 : VOLUME /mnt/input
 ---> Using cache
 ---> 38460fb35297
You can start an image from either of the above two IDs.
docker run --rm 38460fb35297 cat /tmp/foo.txt
Or run as interactive shell:
docker run --rm -it 38460fb35297 sh
For a failed build, use the ID of the preceding layer, and re-run the command that failed.
docker run --rm -it 4a12c61ccc71 bash -il
Above from stackoverflow.
5.2. Running container
List running containers, to get the container id (left-most column):
docker container ls
Enter into the running container
docker exec -it <container id> bash
6. Image size / disk space
We can quickly run out of disk space if we are not aware of what we are doing when working with docker images.
- Consider the size of the base image used. E.g. CentOS: 231 MB, Debian: 124 MB, Ubuntu: 74 MB, Alpine: 6 MB, as noted.
Note, different distributions have different design philosophies: Ubuntu has long-term enterprise support; Alpine is lightweight, security-focused, and based on musl-libc instead of glibc.
- Each COPY, ADD, and RUN command might add files unexpectedly. See dockerignore (below).
- Analyze the image and layers using the tool dive, or save it out as an archive (see below).
- When installing packages, only install the minimal required set, e.g. apt-get -y --no-install-recommends (no optional recommended packages), or npm install --production (don't install development dependencies).
- Similarly, doing package updates (apt-get update) results in old package lists being cached, thus we can add rm -rf /var/lib/apt/lists/*. Note: it has to be in the same layer (the same RUN command) to actually save disk space.
- Each layer is read-only. The following will produce duplicates:

COPY somefile.txt .
RUN chmod 777 somefile.txt

whereas the following will not:

COPY --chmod=777 somefile.txt .

Likewise, a later `RUN rm -rf somefile.txt` will not reclaim space: the file still persists in the previous layers.
- Can do multi-stage builds, using multiple FROM statements, see the last section here.
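A minimal multi-stage sketch (the package layout and wheel build are assumptions for illustration): the first stage has the full toolchain, and only its output is copied into a slim final image, so the build dependencies never end up in a shipped layer.

```dockerfile
# build stage: full toolchain, produces wheels in /app/dist
FROM python:3.11 AS builder
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir build && \
    python -m build --wheel --outdir dist

# final stage: slim base; only the built wheel is copied over
FROM python:3.11-slim
COPY --from=builder /app/dist /tmp/dist
RUN pip install --no-cache-dir /tmp/dist/*.whl && rm -rf /tmp/dist
```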
6.1. Dockerignore
In .dockerignore we put files that should be ignored by docker in commands like COPY. I.e. changes to these files will not require a rebuild, and they will not be included in the image. As a rule of thumb: don't include large files in the docker image. If it needs to process e.g. large data files, mount a local folder on the host machine into the container instead.
Importantly: we can ignore everything we don’t want the docker image to include. Below code snippet does an “inverse”, i.e. ignore everything, then specify exceptions:
# Exclude all files
*
# Include only these files & directories
!README.md
!pyproject.toml
!poetry.lock
!src/
!docs/
!Dockerfile.*
# Exclude documentation
docs/
# Exclude all cache-noise
**/__pycache__/
6.2. Change disk location of built images
I have a separate partition for the OS / path, where /var/lib/docker/ also resides. Thus built images hog space on a partition only intended for the OS files. This becomes especially problematic for large docker images, for instance LLM models.
Solution: an alternative to constantly having to prune & purge the disk is to change the path to a different partition. In the following example, I have a separate partition mounted on /data/.
Create new destination for images:
sudo mkdir /data/docker
Set new destination in /etc/docker/daemon.json
(create it if it does not exist):
{
  "data-root": "/data/docker"
}
Restart:
sudo systemctl daemon-reload
sudo systemctl restart docker
src: stackoverflow
6.3. Clear disk space — Go nuclear!
Running docker eats a lot of disk space, especially when experimenting / starting out.
We could do the manual/hard way when docker images run out of disk space (if the path where docker images are saved has not been changed). Stop the daemon first, then delete:
sudo systemctl stop docker.service
sudo rm -rf /var/lib/docker
sudo systemctl start docker.service
However, it is preferred to use the built in commands below:
docker system prune -a
docker images -a
docker rmi <Image> <Image>
docker image prune
docker volume prune
For more, read: How To Remove Docker Images, Containers, and Volumes
6.4. Save docker image to tar
Useful if we want to manually inspect exactly what is bundled in the image.
docker save <image-digest> -o image.tar
mkdir image && tar -xf image.tar -C image
cd image
tar -xf <layer-digest>/layer.tar
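Note that `tar -xf ... -C image` requires the target directory to exist. The unpack pattern can be tried without docker on a throwaway archive:

```shell
# simulate the unpack step on a scratch tarball (no docker needed)
tmp=$(mktemp -d) && cd "$tmp"
mkdir content && echo hello > content/file.txt
tar -cf image.tar -C content .              # stand-in for `docker save` output
mkdir image && tar -xf image.tar -C image   # -C target must exist first
cat image/file.txt
# → hello
```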
6.5. docker-phobia
This program (written in Go) analyzes docker images, to help the user make them slim. It provides a visual representation of the size of each file in a docker image: https://github.com/remorses/docker-phobia
7. Docker compose
Docker compose excels at orchestrating multiple docker images, e.g. if they need to share a network (say, one frontend image serving a web page and one backend running an API server), share disks, or simply when starting many images, possibly depending on order of start or finish.
Below is an example of a compose.yaml that defines downloading an image serving a Clip model, clip, used by my web app app (as a convention, we have each service consistently use port 8000 internally):
services:
  app:
    env_file:
      - ./.env
    build:
      context: .
      dockerfile: Dockerfile.api
    # Set user id : group id
    user: 1000:1000
    ports:
      - "8501:8000"
    volumes:
      - ${DATA_FOLDER}:/mnt/input
    environment:
      CLIP_URL: "grpc://clip"
      CLIP_PORT: "51000"
      # Need to shadow local DATA_FOLDER defined in .env:
      DATA_FOLDER: "/mnt/input"
    restart: unless-stopped
    depends_on:
      - clip
  clip:
    image: "jinaai/clip-as-service"
    restart: unless-stopped
    ports:
      - "51000:8000"
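Note that depends_on as used above only orders startup; it does not wait for clip to be ready to serve. If readiness matters, a healthcheck can gate it. A sketch (the probe command and URL are assumptions, not taken from the clip-as-service image):

```yaml
services:
  app:
    depends_on:
      clip:
        condition: service_healthy
  clip:
    image: "jinaai/clip-as-service"
    healthcheck:
      # assumes curl exists in the image and the service answers HTTP on 8000
      test: ["CMD-SHELL", "curl -f http://localhost:8000/ || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
```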
The dockerfile for the app using streamlit is:
FROM python:3.8-slim-bullseye

# Define some environment variables
ENV PIP_NO_CACHE_DIR=true \
    DEBIAN_FRONTEND=noninteractive

# Install dependencies needed to download/install packages
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    apt-utils \
    curl

# Upgrade system-wide pip/setuptools
RUN pip install --upgrade pip setuptools

# We want to run things as a non-privileged user
ENV USERNAME=user
ENV PATH="$PATH:/home/$USERNAME/.local/bin:/home/$USERNAME/app/.venv/bin"

# Add user and set up a workdir
RUN useradd -m $USERNAME -u 12345
WORKDIR /home/$USERNAME/app
RUN chown $USERNAME.$USERNAME .

# Everything below here runs as a non-privileged user
USER $USERNAME

# Install poetry
RUN curl -sSL https://install.python-poetry.org | python - --version 1.6.1
RUN poetry config virtualenvs.in-project true

# Install runtime dependencies (will be cached)
COPY pyproject.toml poetry.lock ./
RUN poetry install --without dev --no-root

#
# Application specific configuration below here
#

# Copy project files to container
COPY . .

# Install our own package
RUN poetry install --only-root

# Environment variables that are Docker-specific
ENV DATA_FOLDER=/mnt/input

# Create some files to indicate the source of the build,
# in case a tag is not available (one variable per ARG)
ARG GIT_BRANCH
ARG GIT_COMMIT
RUN echo "$GIT_BRANCH" > BRANCH_SOURCE && \
    echo "$GIT_COMMIT" > COMMIT_SOURCE

# Set up a healthcheck
HEALTHCHECK CMD curl --fail http://localhost:8000/_stcore/health

# For documentation only, we expose the port that Streamlit listens on:
EXPOSE 8000

# Run this command
CMD ["streamlit", "run", "streamlit/Home.py", "--server.headless", "t", "--server.port", "8000"]
8. Links / external resources
- To deal with permission problems when mounting the local file system into the docker image, which then writes to it:
- Run with correct user id
- Running Docker Containers as Current Host User (thorough write up of the problem)
- Running docker on Ubuntu: mounted host volume is not writable from container
Using the :rw flag in the docker-compose file should be supported since version 1.4: “When mounting volumes with the volumes option, you can now pass in any mode supported by the daemon, not just :ro or :rw. For example, SELinux users can pass :z or :Z.”
- Connection refused? Docker networking and how it impacts your image Networking in Docker 101. Having trouble getting Docker to access a server that works when running locally? Or having docker images communicating? Read this.
- https://github.com/wagoodman/dive An excellent open-source tool to visualize and analyze local Docker images.
- Build Docker image using github actions (also this general compilation of GH actions: awesome-actions, and use ssh to debug GH actions)
- Dockerfile best practices (recent)
- Best practices for writing Dockerfiles
- Guidance for Docker Image Authors Good, especially on CMD vs. ENTRYPOINT; and volumes
- How are docker images built? A look into the Linux overlay file-systems and the OCI specification (I’ve not read this)
- Dockerfile Best Practices (2014)
- https://contains.dev/ contains.dev offers many tools to analyze layers, their contents, and their size, including navigating a treemap of your image