
Question Description:

I have a Harvester HCI cluster (RKE2) in which pods do not resolve internet domains to the correct IP addresses.

kubectl run debug --image=busybox -i --tty --rm -- sh

/ # ping serverfault.com
PING serverfault.com (<redacted IP address>): 56 data bytes
64 bytes from <redacted IP address>: seq=0 ttl=63 time=0.362 ms
64 bytes from <redacted IP address>: seq=1 ttl=63 time=0.312 ms
64 bytes from <redacted IP address>: seq=2 ttl=63 time=0.319 ms
64 bytes from <redacted IP address>: seq=3 ttl=63 time=0.449 ms
64 bytes from <redacted IP address>: seq=4 ttl=63 time=0.317 ms
64 bytes from <redacted IP address>: seq=5 ttl=63 time=0.363 ms
64 bytes from <redacted IP address>: seq=6 ttl=63 time=0.296 ms
64 bytes from <redacted IP address>: seq=7 ttl=63 time=0.361 ms
^C
--- serverfault.com ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 0.296/0.347/0.449 ms

In this case, <redacted IP address> is the public IP address of the network the cluster resides in, not one of serverfault.com's IP addresses.

However, within the same container, nslookup returns the correct IP addresses:

/ # nslookup serverfault.com
Server:     10.53.0.10
Address:    10.53.0.10:53

Non-authoritative answer:
Name:   serverfault.com
Address: 104.18.23.101
Name:   serverfault.com
Address: 104.18.22.101

Non-authoritative answer:

This is not reproducible on the host node:

# ping serverfault.com
PING serverfault.com (104.18.23.101) 56(84) bytes of data.
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=1 ttl=57 time=1.27 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=2 ttl=57 time=1.30 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=3 ttl=57 time=1.33 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=4 ttl=57 time=1.29 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=5 ttl=57 time=1.23 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=6 ttl=57 time=1.28 ms
^C
--- serverfault.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5006ms
rtt min/avg/max/mdev = 1.231/1.284/1.333/0.030 ms

The cluster itself is a fresh installation of Harvester HCI v1.2.0 with no additional configuration changes post-installation.

I am looking for further tips on how to troubleshoot this issue and find out why it's resolving the wrong IP address.


Context:

/etc/resolv.conf on host:

### /etc/resolv.conf is a symlink to /var/run/netconfig/resolv.conf
### autogenerated by netconfig!

search harvester.<redacted domain>
nameserver 10.10.0.1

/etc/resolv.conf on pod container:

search default.svc.cluster.local svc.cluster.local cluster.local harvester.<redacted domain>
nameserver 10.53.0.10
options ndots:5

/etc/nsswitch.conf on host:

#
# /etc/nsswitch.conf
#

passwd: compat
group: compat
shadow: compat

# Allow initgroups to default to the setting for group.
initgroups: compat

hosts: files mdns_minimal [NOTFOUND=return] dns
networks: files dns

aliases: files usrfiles
ethers: files usrfiles
gshadow: files usrfiles
netgroup: files nis
protocols: files usrfiles
publickey: files
rpc: files usrfiles
services: files usrfiles

automount: files nis
bootparams: files
netmasks: files

/etc/nsswitch.conf on pod container:

# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd: files
group: files
shadow: files
gshadow: files

hosts: files dns
networks: files

protocols: db files
services: db files
ethers: db files
rpc: db files

netgroup: nis

/etc/hosts contains no additional or suspicious entries in either case.

Answer:

I found that the issue was caused by the ndots option in resolv.conf:

options ndots:5

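This option means that if a hostname contains fewer than 5 dots, the resolver first tries it with each search domain appended, and only falls back to the name as given if none of those candidates resolve. Only names with 5 or more dots are tried as-is first.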

I suspect this option is needed because Kubernetes internally uses many hostnames with multiple dots.

However, serverfault.com, for example, contains only one dot, so the resolver appends the search domains to it. The cluster.local variants do not exist, but the next candidate, serverfault.com.harvester.<redacted domain>, matched a wildcard (*) record we happened to have on that domain, which pointed to the public IP address of our network. Because that lookup succeeded, the resolver never fell back to plain serverfault.com, which explains the behaviour.
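The lookup order above can be modelled in a few lines. This is a minimal Python sketch, assuming the glibc-style ordering documented in resolv.conf(5) (other resolver libraries such as musl differ in details). The domain harvester.example.lan, the wildcard target 203.0.113.10 and the helpers candidate_names, lookup and resolve are hypothetical stand-ins for the redacted values, and the in-memory zone only approximates real DNS wildcard matching.

```python
from __future__ import annotations

# Hypothetical stand-ins for the redacted values in the question.
POD_SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "harvester.example.lan",  # hypothetical; inherited from the node's search list
]
ZONE = {
    "serverfault.com": "104.18.23.101",
    "*.harvester.example.lan": "203.0.113.10",  # hypothetical wildcard -> public IP
}


def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Order in which a glibc-style stub resolver tries a name."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: search list is skipped.
        return [name[:-1]]
    with_search = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # At least ndots dots: try the name as given first, then the search list.
        return [name] + with_search
    # Fewer than ndots dots: try every search domain first, the bare name last.
    return with_search + [name]


def lookup(fqdn: str, zone: dict[str, str]) -> str | None:
    """Simplified lookup with wildcard support (not full RFC 4592 semantics)."""
    if fqdn in zone:
        return zone[fqdn]
    labels = fqdn.split(".")
    # Check wildcards from the most specific parent zone upwards.
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in zone:
            return zone[wildcard]
    return None


def resolve(name: str, search: list[str], ndots: int,
            zone: dict[str, str]) -> tuple[str, str] | None:
    for candidate in candidate_names(name, search, ndots):
        address = lookup(candidate, zone)
        if address is not None:
            return candidate, address  # the first candidate with an answer wins
    return None


if __name__ == "__main__":
    print(resolve("serverfault.com", POD_SEARCH, 5, ZONE))
    # -> ('serverfault.com.harvester.example.lan', '203.0.113.10')
    print(resolve("serverfault.com.", POD_SEARCH, 5, ZONE))
    # -> ('serverfault.com', '104.18.23.101')
    print(resolve("serverfault.com", POD_SEARCH[:3], 5, ZONE))
    # -> ('serverfault.com', '104.18.23.101')
```

The third call drops the harvester domain from the search list, which mirrors the DHCP change; the second shows that a trailing dot bypasses the search list for a single lookup.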

To fix this, we temporarily removed the local domain from our DHCP configuration. As a result, the search line in resolv.conf no longer includes it, so internet domains are no longer appended to the local domain.

In the long term, we plan to remove the wildcard record.