Impaktor


Inject Emacs intravenously

On docker — running, monitoring, administering, and cleaning images and containers
Published on Sep 10, 2020 by Impaktor.

The greatest thing in the world is to know how to belong to oneself

Michel de Montaigne, The Complete Essays


Introduction

These are my notes on Docker, compiled from a tutorial, arch wiki, Jérôme Petazzoni’s PyCon 2016 talk and possibly other places.

See also this post on VMs vs Docker images on QEMU MicroVMs.

Definition: “Images” & “Containers”

One can have multiple containers from a single image

  • images are conceptually similar to classes
  • layers are conceptually similar to inheritance
  • containers are conceptually similar to instances

Getting started

Installing docker

start it manually for now:

sudo systemctl start docker.service

…or add it for each startup:

sudo systemctl enable docker.service

check your groups

groups

add yourself to the docker group:

sudo gpasswd -a <user> docker

test that you can run docker as normal user:

docker info
docker version

Basic docker commands

Images are by default stored in /var/lib/docker, but this can be changed (see further down).

To search available images to download:

docker search <image name>

e.g. pull an Arch Linux image:

docker pull archlinux/base

Run interactive (-i) and in terminal (-t):

docker run -i -t ubuntu

pulls the ubuntu image (if not already present) and drops you into a shell in the container (type exit to leave)

In general:

docker run [image name] [command to run]

shut down a running container:

docker stop [container ID]

or more brütal:

docker kill [container ID]

re-attach to a running container:

docker attach [container ID]

remove a stopped container:

docker rm [container ID]

remove an image:

docker rmi [image ID]

Setting up a docker image

  • Doing it manually / interactive

    Start a docker image:

    docker run -it ubuntu
    

    now in said docker image:

    apt-get update
    apt-get install <needed packages>
    exit
    

    show diff from base image:

    docker diff <container ID>
    

    commit container:

    docker commit <container ID>
    docker commit <container ID> <desired name of image>
    

    which returns a sha256 hexadecimal string. NOTE: commit only saves the state of the file system, not running processes.

    Launch it, and check that it has the installed files

    docker run -ti <returned image ID>
    

    or tag it with an alias:

    docker tag <returned image ID> <desired name of image>
    
  • Doing it using Dockerfiles

    Instead of using commit to create a docker image, we can use the build command, which takes a Dockerfile that automates what we did manually above. Create a file Dockerfile (the default name it looks for) in a folder (any name) with the content:

    FROM ubuntu
    RUN apt-get update
    RUN apt-get install -y figlet
    

    execute; this builds a new image with tag “figlet” (tags must be lowercase):

    docker build -t figlet <path to folder with dockerfile>
    

    good to have:

    docker history figlet
    

    run figlet command from figlet container:

    docker run figlet figlet hello
    

    can add the command directly in the container file:

    FROM ubuntu
    RUN apt-get update
    RUN apt-get install -y figlet
    CMD figlet hello
    

    and will only then need:

    docker run figlet
    

    if I still want to get into the container:

    docker run -ti figlet bash
    

    Using ENTRYPOINT allows passing arguments to docker run, so the container behaves like a binary or a script that takes arguments. If using ENTRYPOINT and CMD together, both must use the JSON (exec) syntax.
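    As a sketch, building on the figlet example above: ENTRYPOINT fixes the executable, while CMD supplies default arguments that docker run can override.

```dockerfile
FROM ubuntu
RUN apt-get update && apt-get install -y figlet
# ENTRYPOINT is the fixed executable; CMD holds the default arguments.
# Both in JSON (exec) form, as required when combining them:
ENTRYPOINT ["figlet"]
CMD ["hello"]
```

    Now docker run figlet prints “hello”, while docker run figlet world overrides CMD and prints “world” instead.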

    (Taken from https://www.youtube.com/watch?v=ZVaRK10HBjo&t=1h45m)

Naming containers

Docker assigns names by default (see docker ps -a), or we can give a name ourselves:

docker run --name ticktock python

(or docker rename) and then use:

docker logs ticktock
docker stop ticktock
docker inspect ticktock
docker inspect --format '{{ .State.Running}}' ticktock

(This was taken from yt: Introduction to Docker and containers - PyCon 2016)

Monitoring

Show running docker processes:

docker ps

Show the last started container, printing only its ID (useful for scripting):

docker ps -l -q

Show also (all) non-running:

docker ps -a

Show status (CPU/RAM, etc.) of running containers:

docker stats
docker stats --no-stream
docker stats <list container IDs>

Or also include stopped containers

docker stats --all

Common usage for “logs” (shows the output of a container, e.g. if running daemonized with run -d; to publish all ports exposed in the image, use run -P):

docker logs --tail 10 --follow <container ID>

There are three networks by default:

docker network ls

e.g.

docker network inspect bridge

“The bridge network corresponds to the docker0 network, which is present in all Docker installations. The none network doesn’t have any access to the external network, but it can be used for running batch jobs. Finally, the host network adds a container on the host’s network stack without any isolation between the host machine and the container.”

From linuxhint. Note, if you need two docker containers to communicate with each other, or other connectivity issues with docker containers, see the article Connection refused? Docker networking and how it impacts your image
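As a minimal sketch of two containers talking to each other (service and image names here are hypothetical, except postgres): services in one compose file share a default network, on which each service is reachable by its service name.

```yaml
services:
  api:
    image: my-api            # hypothetical application image
    environment:
      # the database is reachable by its service name, on its container port
      DB_HOST: db
      DB_PORT: "5432"
  db:
    image: postgres:15
```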

Often one might develop a docker image that will run on a VM, and then we need to know how much RAM to configure the VM with. To monitor RAM usage of a running container and log it to file for later plotting:

while true; do docker stats --no-stream --format "{{.Container}}:\t{{.CPUPerc}}\t{{.MemUsage}}" | tee --append stats-training.txt; sleep 1; done

Gives output:

ab5fade5cf45:	0.05%	547.5MiB / 15.67GiB
ab5fade5cf45:	0.01%	547.5MiB / 15.67GiB
ab5fade5cf45:	0.01%	547.5MiB / 15.67GiB
ab5fade5cf45:	264.85%	1.023GiB / 15.67GiB
ab5fade5cf45:	104.34%	3.01GiB / 15.67GiB
ab5fade5cf45:	100.48%	3.075GiB / 15.67GiB
ab5fade5cf45:	111.40%	2.187GiB / 15.67GiB

To extract RAM, e.g. for plotting in gnuplot, I use this:

awk '{m=substr($3, 1, length($3)-3); u=substr($3, length($3)-2); if("MiB"==u) M=m; else if("GiB"==u) M=1024*m; print M, m, u}' stats-training.txt > out

or this to include CPU usage:

awk '{cpu=substr($2, 1, length($2)-1); m=substr($3, 1, length($3)-3); u=substr($3, length($3)-2); if("MiB"==u) M=m; else if("GiB"==u) M=1024*m; print cpu, M, m, u}' stats-training.txt > out

which gives easy-to-plot results (here from the first command):

547.5 547.5 MiB
547.5 547.5 MiB
547.5 547.5 MiB
1047.55 1.023 GiB
3082.24 3.01 GiB
3148.8 3.075 GiB
2239.49 2.187 GiB

e.g. in gnuplot (set the labels before plotting):

set xlabel 'time (s)'
set ylabel 'docker RAM (MiB)'
plot 'out' u 0:2 title "RAM", '' u 0:1 title "CPU"
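To sanity-check the MiB/GiB conversion without collecting real stats, one can feed a fabricated stats line (the values here are made up) through the same awk logic:

```shell
# Convert a GiB reading to MiB; fields are container ID, CPU%, and memory usage
line='ab5fade5cf45:	0.05%	1.5GiB / 15.67GiB'
echo "$line" | awk '{m=substr($3, 1, length($3)-3); u=substr($3, length($3)-2);
                     if("MiB"==u) M=m; else if("GiB"==u) M=1024*m; print M}'
# prints 1536
```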

PS. To add time-stamps, we could use the ts command line tool (from the moreutils package) to prepend a timestamp to every line:

while true; do docker stats --no-stream --format "{{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" | /usr/bin/ts "%Y-%m-%dT%H:%M:%S%z" | tee --append docker-stats.txt; sleep 1; done
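If moreutils is not available, a plain-shell stand-in for ts can prepend timestamps with date (slower, since it forks date once per line, but dependency-free):

```shell
# Prepend an ISO-8601 timestamp to every input line (stand-in for ts)
printf 'line1\nline2\n' | while IFS= read -r l; do
    printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S%z)" "$l"
done
```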

Example dockerfile

FROM python:3.7.3

# every RUN command creates a new layer, i.e. use "&&" to execute multiple commands as a single line/layer
RUN apt-get update && apt-get install -y foobar

# NOTE: this should be done before copying source code, or else, any
# change in source code will invalidate the cache, and all subsequent
# steps are re-done, so it would re-install all dependencies after
# each change in code or any other file
COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt

# now copy everything in (buildcontext) folder recursively, to /src:
# (not possible to reference paths outside of passed build folder)...
COPY . /src
# ...thus this is the same as above:
# COPY / /src

# this is where we do stuff, rel. CMD/RUN and other tings.
WORKDIR /src

# like COPY, but can get remote files,
ADD http://foo.bar.txt /opt/
#  and will auto-unzip zip and tar local files
ADD foo.zip /opt/

# runs relative to WORKDIR (/src)
CMD python myfile.py

# setting a variable (don't use "RUN export ..." as that will only exist in
# that build container and then be discarded); the value here is a placeholder
ENV MY_VAR=my_value

# e.g. common way to change configuration parameters for containers:
# docker run -e MY_VAR=8000

Update docker when changes are made

See this stackoverflow post

Debug

Container build fail

Each RUN adds a layer to the image filesystem. ID is visible in the build process below as 38460fb35297 for the last step:

Step 1/25 : FROM python:3.11-slim-bullseye
---> 4a12c61ccc71
Step 2/25 : VOLUME /mnt/input
---> Using cache
---> 38460fb35297

You can start a container from either of the above two IDs.

docker run --rm 38460fb35297 cat /tmp/foo.txt

Or run as interactive shell:

docker run --rm -it 38460fb35297 sh

For a failed build, use the ID of the preceding layer, and re-run the command that failed.

docker run --rm -it 4a12c61ccc71 bash -il

Above from stackoverflow.

Running container

List running containers, to get container id (left most column):

docker container ls

Enter into the running container

docker exec -it <container id> bash

Image size / disk space

We can quickly run out of disk space if we are not aware of what we are doing when working with docker images.

  • Consider size of the base image used. E.g. CentOS: 231 MB, Debian: 124 MB, Ubuntu 74 MB, Alpine 6 MB, as noted.

Note, different distributions have different design philosophies: Ubuntu has long-term enterprise support; Alpine is lightweight, security-focused, and based on musl libc instead of glibc.

  • Each COPY, ADD, and RUN command might add files unexpectedly; see .dockerignore (below).
  • Analyze the image and layers using the tool dive, or save it out as an archive (see below).
  • When installing packages, only install the minimal required set, e.g.
    • apt-get install -y --no-install-recommends, to skip optional recommended packages.
    • npm install --production, to skip development dependencies.
  • Similarly, apt-get update caches package lists under /var/lib/apt/lists/, so we can add rm -rf /var/lib/apt/lists/*. Note: it has to be in the same layer (same RUN) to actually save disk space.
  • Each layer is read only. The following will produce duplicates:

    COPY somefile.txt .
    RUN chmod 777 somefile.txt
    

    Whereas the following will not:

    COPY --chmod=777 somefile.txt .
    
  • Thus, issuing a `RUN rm -rf somefile.txt` does not reclaim space: the file still persists in the previous layers.
  • Can do multi-stage builds, using multiple FROM statements, see last section here.
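A minimal sketch of a multi-stage build (file and package names are illustrative): only the final FROM stage ends up in the shipped image, so build-time tooling and caches are left behind.

```dockerfile
# Stage 1: build environment; installs dependencies with pip
FROM python:3.11-slim AS builder
COPY requirements.txt .
# install into the user site, so it is easy to copy out as one directory
RUN pip install --user -r requirements.txt

# Stage 2: runtime image; only the installed packages are copied over
FROM python:3.11-slim
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
COPY . /src
WORKDIR /src
CMD ["python", "myfile.py"]
```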

Dockerignore

In .dockerignore we put files that should be ignored by docker commands like COPY, i.e. changes to these files will not require a rebuild, and they will not be included in the image. As a rule of thumb: don’t include large files in the docker image. If it needs to process e.g. large data files, let a local folder on the host machine be mounted into the container as an internal folder.

Importantly: we can ignore everything we don’t want the docker image to include. The code snippet below does an “inverse”, i.e. ignores everything, then specifies exceptions:

# Exclude all files
*

# Include only these files & directories
!README.md
!pyproject.toml
!poetry.lock
!src/
!docs/
!Dockerfile.*

# Exclude documentation
docs/

# Exclude all cache-noise
**/__pycache__/

Change disk location of built images

I have a separate partition for the OS root path (/), where /var/lib/docker/ also resides. Thus built images hog space on a partition intended only for OS files. This becomes especially problematic for large docker images, for instance for LLM models.

Solution: an alternative to constantly having to prune & purge the disk is to change the path to a different partition. In the following example, I have a separate partition mounted on /data/.

Create new destination for images:

sudo mkdir /data/docker

Set new destination in /etc/docker/daemon.json (create it if it does not exist):

{
    "data-root": "/data/docker"
}

Restart:

sudo systemctl daemon-reload
sudo systemctl restart docker

src: stackoverflow

Clear disk space — Go nuclear!

Running docker eats a lot of disk space, especially when experimenting / starting out.

We could do it the manual/hard way when docker images run out of disk space (if we have not changed the path where docker images are saved):

sudo systemctl stop docker.service
sudo rm -rf /var/lib/docker
sudo systemctl start docker.service

However, it is preferred to use the built in commands below:

docker system prune -a
docker images -a
docker rmi <Image> <Image>
docker image prune
docker volume prune

For more, read: How To Remove Docker Images, Containers, and Volumes

Save docker image to tar

Useful if we want to manually inspect exactly what is bundled in the image.

docker save <image-digest> -o image.tar
mkdir image && tar -xf image.tar -C image
cd image
tar -xf <layer-digest>/layer.tar

Optimizing Docker image size and why it matters

Docker compose

Docker compose excels at orchestrating multiple docker images, e.g. if they need to share a network (e.g. one frontend image serving a web page, one backend running an API server), share disks, or simply to start many images, possibly depending on the order in which they start or finish.

Below is an example compose.yaml that pulls an image serving a CLIP model, clip, which is used by my web app app (as a convention, we have each service consistently use port 8000 internally):

services:
  app:
    env_file:
      - ./.env
    build:
      context: .
      dockerfile: Dockerfile.api
    # Set user id : group id
    user: 1000:1000
    ports:
      - "8501:8000"
    volumes:
      - ${DATA_FOLDER}:/mnt/input
    environment:
      CLIP_URL: "grpc://clip"
      CLIP_PORT: "51000"
      # Need to shadow local DATA_FOLDER defined in .env:
      DATA_FOLDER: "/mnt/input"
    restart: unless-stopped
    depends_on:
      - clip
  clip:
    image: "jinaai/clip-as-service"
    restart: unless-stopped
    ports:
      - "51000:8000"

The Dockerfile for the app, which uses Streamlit, is:

FROM python:3.8-slim-bullseye

# Define some environment variables
ENV PIP_NO_CACHE_DIR=true \
    DEBIAN_FRONTEND=noninteractive

# Install dependencies needed to download/install packages
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    apt-utils \
    curl

# Upgrade system-wide pip/setuptools
RUN pip install --upgrade pip setuptools

# We want to run things as a non-privileged user
ENV USERNAME=user
ENV PATH="$PATH:/home/$USERNAME/.local/bin:/home/$USERNAME/app/.venv/bin"

# Add user and set up a workdir
RUN useradd -m $USERNAME -u 12345
WORKDIR /home/$USERNAME/app
RUN chown $USERNAME:$USERNAME .

# Everything below here runs as a non-privileged user
USER $USERNAME

# Install poetry
RUN curl -sSL https://install.python-poetry.org | python - --version 1.6.1
RUN poetry config virtualenvs.in-project true

# Install runtime dependencies (will be cached)
COPY pyproject.toml poetry.lock ./
RUN poetry install --without dev --no-root

#
# Application specific configuration below here
#

# Copy project files to container
COPY . .

# Install our own package
RUN poetry install --only-root

# Environment variables that are Docker-specific
ENV DATA_FOLDER=/mnt/input

# Create some files to indicate the source of the build, in case tag is not available
ARG GIT_BRANCH GIT_COMMIT
RUN echo "$GIT_BRANCH" > BRANCH_SOURCE && \
    echo "$GIT_COMMIT" > COMMIT_SOURCE

# Set up a healthcheck
HEALTHCHECK CMD curl --fail http://localhost:8000/_stcore/health

# For documentation only, we expose the port that Streamlit listens on:
EXPOSE 8000

# Run this command
CMD ["streamlit", "run", "streamlit/Home.py", "--server.headless", "true", "--server.port", "8000"]

Links / external resources