Kubernetes cluster with incorrect DNS resolution

Question

Question Description:

I have a Harvester HCI cluster (RKE2) where pods do not resolve the correct IP addresses for internet domains.

kubectl run debug --image=busybox -i --tty --rm -- sh
/ # ping serverfault.com
PING serverfault.com (<redacted IP address>): 56 data bytes
64 bytes from <redacted IP address>: seq=0 ttl=63 time=0.362 ms
64 bytes from <redacted IP address>: seq=1 ttl=63 time=0.312 ms
64 bytes from <redacted IP address>: seq=2 ttl=63 time=0.319 ms
64 bytes from <redacted IP address>: seq=3 ttl=63 time=0.449 ms
64 bytes from <redacted IP address>: seq=4 ttl=63 time=0.317 ms
64 bytes from <redacted IP address>: seq=5 ttl=63 time=0.363 ms
64 bytes from <redacted IP address>: seq=6 ttl=63 time=0.296 ms
64 bytes from <redacted IP address>: seq=7 ttl=63 time=0.361 ms
^C
--- serverfault.com ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 0.296/0.347/0.449 ms

In this case, <redacted IP address> happens to be the public IP address of the network in which the cluster resides (and not one of serverfault.com's IP addresses).

However, within the same container, nslookup does list the correct IP addresses:

/ # nslookup serverfault.com
Server:     10.53.0.10
Address:    10.53.0.10:53
Non-authoritative answer:
Name:   serverfault.com
Address: 104.18.23.101
Name:   serverfault.com
Address: 104.18.22.101
Non-authoritative answer:

This is not reproducible on the host node:

# ping serverfault.com
PING serverfault.com (104.18.23.101) 56(84) bytes of data.
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=1 ttl=57 time=1.27 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=2 ttl=57 time=1.30 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=3 ttl=57 time=1.33 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=4 ttl=57 time=1.29 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=5 ttl=57 time=1.23 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=6 ttl=57 time=1.28 ms
^C
--- serverfault.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5006ms
rtt min/avg/max/mdev = 1.231/1.284/1.333/0.030 ms

The cluster itself is a fresh installation of Harvester HCI v1.2.0 with no additional configuration changes post-installation.

I am looking for further tips on how to troubleshoot this issue and find out why it's resolving the wrong IP address.

Context:

/etc/resolv.conf on host:

### /etc/resolv.conf is a symlink to /var/run/netconfig/resolv.conf
### autogenerated by netconfig!
search harvester.<redacted domain> 1
nameserver 10.10.0.1

/etc/resolv.conf on pod container:

search default.svc.cluster.local svc.cluster.local cluster.local harvester.<redacted domain>
nameserver 10.53.0.10
options ndots:5

/etc/nsswitch.conf on host:

#
# /etc/nsswitch.conf
#
passwd:     compat
group:      compat
shadow:     compat
# Allow initgroups to default to the setting for group.
initgroups:   compat
hosts:      files mdns_minimal [NOTFOUND=return] dns
networks:   files dns
aliases:    files usrfiles
ethers:     files usrfiles
gshadow:    files usrfiles
netgroup:   files nis
protocols:  files usrfiles
publickey:  files
rpc:        files usrfiles
services:   files usrfiles
automount:  files nis
bootparams: files
netmasks:   files

/etc/nsswitch.conf on pod container:

# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd:         files
group:          files
shadow:         files
gshadow:        files
hosts:          files dns
networks:       files
protocols:      db files
services:       db files
ethers:         db files
rpc:            db files
netgroup:       nis

/etc/hosts contains no additional or suspicious entries in either case.

score 0 · Answer 1 · answered Oct 25 '23 at 08:04

I found the issue to be the ndots option in resolv.conf:

options ndots:5

This option means that a hostname with fewer than 5 dots is first tried with each search domain appended; only a hostname with 5 or more dots is tried as an absolute name first.
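To make the lookup order concrete, here is a minimal sketch of how a stub resolver expands a name, assuming the behaviour described for ndots in resolv.conf(5) for glibc (other libc implementations, such as musl, differ in detail). The function name candidates is hypothetical, and harvester.example.org stands in for the redacted local domain.

```shell
#!/bin/sh
# Sketch: print the names a glibc-style stub resolver would try, in order.
# Usage: candidates NAME NDOTS SEARCH_DOMAIN...
candidates() {
    name=$1; ndots=$2; shift 2
    # A trailing dot marks the name as absolute: the search list is skipped.
    case $name in
        *.) printf '%s\n' "$name"; return ;;
    esac
    # Count the dots in the name by deleting every other character.
    dots_only=$(printf '%s' "$name" | tr -cd '.')
    dots=${#dots_only}
    # Enough dots: the name is tried as-is before the search list.
    if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
    # Each search domain is appended in the order given in resolv.conf.
    for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
    # Too few dots: the name as-is is only the last resort.
    if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

# harvester.example.org is a placeholder for the redacted search domain.
candidates serverfault.com 5 default.svc.cluster.local svc.cluster.local \
    cluster.local harvester.example.org
```

With the pod's search list, the plain name serverfault.com. comes last; the first candidate that returns an answer wins, so any record that matches one of the expanded names takes precedence.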

I suspect that this option is needed because Kubernetes internally uses a lot of hostnames with multiple dots.

However, serverfault.com, for example, has only one dot, so the search domains are appended to it, including the local domain harvester.<redacted domain>, which yields serverfault.com.harvester.<redacted domain>. We happened to have a wildcard (*) record on that domain that pointed to the public IP address of the network. As a result, serverfault.com.harvester.<redacted domain> was resolved by the wildcard record, which explains the behaviour.
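One way to confirm this kind of diagnosis from inside the debug pod is to query the expanded name and the absolute name explicitly. The following is only a sketch: the function name diagnose is hypothetical, and it assumes the pod image provides nslookup and ping with the -c option, as busybox does. The question's output suggests that nslookup in that image did not apply the search list while ping did, which is why both tools are used here.

```shell
#!/bin/sh
# Sketch: run inside the debug pod to compare search-list and absolute lookups.
# Usage: diagnose NAME SEARCH_DOMAIN
diagnose() {
    name=$1; domain=$2
    # 1. Query the expanded name explicitly. If this returns the unexpected
    #    address, a record under the search domain (for example a wildcard)
    #    matches it.
    nslookup "$name.$domain."
    # 2. Query the absolute name. The trailing dot disables the search list.
    nslookup "$name."
    # 3. ping resolves through the libc resolver; with a trailing dot the
    #    search list is skipped there as well.
    ping -c 1 "$name."
}
```

For example, diagnose serverfault.com harvester.<redacted domain> would show whether the expanded name resolves to the network's public IP address while the absolute name resolves correctly.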

To fix this, we have temporarily removed the DHCP record for the local domain. As a result, the search configuration in resolv.conf no longer includes it, and the local domain is no longer appended to internet domain names.

In the long run, we plan to remove the wildcard record.
