
My Docker Daemon seems to ignore /etc/docker/daemon.json on boot.

Similar to this question, I'm having some trouble telling the Docker daemon that it should not use the default 172.17.* range. That range is already claimed by our VPN, which prevents people connected through that VPN from reaching the server Docker runs on.

The hugely annoying thing is that every time I reboot my server, Docker claims an IP from the VPN's range again, regardless of what I put in /etc/docker/daemon.json. I have to manually issue

# systemctl restart docker

directly after boot before people on the 172.17.* network can reach the server again.

This obviously gets forgotten quite often and leads to many problem tickets.
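For reference, this is roughly how I've been checking the state after a reboot (assuming the default bridge is still named docker0):

# which range did the default bridge come up with?
ip addr show docker0

# is 172.17.* being routed to docker0 instead of the VPN?
ip route | grep 172.17

Right after boot the bridge shows an address in 172.17.*, and only after the manual restart does it move to the configured range.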

My /etc/docker/daemon.json looks like this:

{
  "default-address-pools": [
    {
      "base": "172.20.0.0/16",
      "size": 24
    }
  ]
}

and has the following permissions:

-rw-r--r--   1 root root   123 Dec  8 10:43 daemon.json

I have no idea how to even start diagnosing this problem; any ideas?

For completeness:

  • Ubuntu 18.04.5 LTS
  • Docker version 19.03.6, build 369ce74a3c

EDIT: output of systemctl cat docker:

# /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket
Wants=containerd.service

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

Output of sudo docker info (after systemctl restart docker):

Client:
 Debug Mode: false

Server:
 Containers: 34
  Running: 19
  Paused: 0
  Stopped: 15
 Images: 589
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-140-generic
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 47.16GiB
 Name: linuxsrv
 ID: <redacted>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: <redacted>
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  http://172.16.30.33:6000/
 Live Restore Enabled: false

WARNING: No swap limit support

2 Answers


Although I thought I had resolved the problem using BMitch's answer, I was wrong: the docker0 address was still in the wrong 172.17.*.* range after boot.

After a lot more digging, it turned out that, somehow, I had multiple versions of dockerd installed:

  1. the one you get if you install as per the docs
  2. ...the one installed via Snap

Apparently, the Snap version was the one started at boot, while the correctly configured one only came up when I ran sudo systemctl restart docker.

Uninstalling & purging the one from Snap that escaped (...evaded?) my attention finally solved this pesky problem.
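In case anyone else hits the same thing, these are roughly the checks and the cleanup involved (assuming the duplicates are the docker snap and the docker-ce deb from the official repo):

# is a snap-packaged docker installed alongside the deb one?
snap list | grep -i docker
dpkg -l | grep -i docker

# which dockerd is on the PATH, and which processes are actually running?
which -a dockerd
ps -eo pid,args | grep '[d]ockerd'

# remove the snap version, then restart the deb-packaged daemon
sudo snap remove --purge docker
sudo systemctl restart docker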


There are multiple address pools used by docker. The default-address-pools setting applies to all new user-created bridge networks. It's possible you'll need to delete and recreate those networks after changing this setting.
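For example, to check the subnet of an existing user-created network and recreate it so it gets allocated from the new pool (mynet is just a placeholder name):

# subnet the network was originally created with
docker network inspect mynet --format '{{(index .IPAM.Config 0).Subnet}}'

# recreate it; attached containers need to be disconnected or stopped first
docker network rm mynet
docker network create mynet
docker network inspect mynet --format '{{(index .IPAM.Config 0).Subnet}}'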

There's also bip, set in the daemon.json file with a line like:

"bip": "192.168.63.1/24"

The bip setting applies to the default bridge network (named bridge) and needs to be set to the CIDR of the gateway address on that network (so you can't set it to 192.168.63.0/24; the trailing .1 is important).
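Putting both together, a daemon.json along these lines would cover the default bridge as well as new user-created networks (the ranges are only examples; pick ones that don't collide with your VPN):

{
  "bip": "192.168.63.1/24",
  "default-address-pools": [
    {
      "base": "172.20.0.0/16",
      "size": 24
    }
  ]
}

followed by sudo systemctl restart docker so docker0 picks up the new gateway address.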

And if you are using swarm mode, overlay networks have their own address pools shared across nodes in the overlay network. That needs to be configured during docker swarm init with the --default-addr-pool flag.
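For example (the pool below is only illustrative):

docker swarm init --default-addr-pool 10.20.0.0/16 --default-addr-pool-mask-length 24

Note that this can only be set when the swarm is created; for an existing swarm you'd have to leave it and re-initialize.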

Lastly, if you are running docker via snap, the location of this file is /var/snap/docker/current/etc/docker/daemon.json, and it doesn't appear to be preserved across updates, so you'll need to put the file back after each update.
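If that's your setup, something like this would push the config into the snap's location and restart its daemon (using the path above; snap services are restarted with snap rather than systemctl):

sudo cp /etc/docker/daemon.json /var/snap/docker/current/etc/docker/daemon.json
sudo snap restart docker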

BMitch