I have a local Kubernetes cluster set up using kubeadm. Everything was working fine until a few days ago, but now I've run into a strange problem.
Whenever I create a new resource, be it a Deployment or a plain Pod, it never gets scheduled and stays stuck in the Pending state indefinitely, without any error or event. Please let me know if anyone has an idea how to fix this, or even how to debug it.
I'm pasting the output of some commands I think might be useful; let me know if you need any additional information.
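For context, the nginx pod shown further down was created with nothing more than a plain kubectl run (something like the command below), so there are no node selectors, tolerations, or resource requests involved:
kubectl run nginx --image=nginx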
• kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
• kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
in-dt-dckr-ldr1 Ready control-plane 386d v1.25.2 10.64.1.98 <none> Ubuntu 20.04.5 LTS 5.15.0-58-generic containerd://1.6.8
in-dt-dckr-ldr2 Ready <none> 386d v1.25.2 10.64.1.190 <none> Ubuntu 20.04.5 LTS 5.15.0-58-generic containerd://1.6.8
in-dt-dckr-wrk1 Ready worker 386d v1.25.2 10.64.1.232 <none> Ubuntu 20.04.5 LTS 5.15.0-58-generic containerd://1.6.8
• kubectl describe node in-dt-dckr-wrk1
Name: in-dt-dckr-wrk1
Roles: worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=in-dt-dckr-wrk1
kubernetes.io/os=linux
node-role.kubernetes.io/worker=worker
Annotations: csi.volume.kubernetes.io/nodeid: {"csi.tigera.io":"in-dt-dckr-wrk1"}
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 10.64.1.232/22
projectcalico.org/IPv4VXLANTunnelAddr: 192.168.137.192
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 13 Oct 2022 01:04:07 +0530
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: in-dt-dckr-wrk1
AcquireTime: <unset>
RenewTime: Fri, 03 Nov 2023 08:57:17 +0530
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Sat, 21 Jan 2023 19:29:25 +0530 Sat, 21 Jan 2023 19:29:25 +0530 CalicoIsUp Calico is running on this node
MemoryPressure False Fri, 03 Nov 2023 08:53:02 +0530 Sat, 21 Jan 2023 19:28:44 +0530 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 03 Nov 2023 08:53:02 +0530 Sat, 21 Jan 2023 19:28:44 +0530 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 03 Nov 2023 08:53:02 +0530 Sat, 21 Jan 2023 19:28:44 +0530 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 03 Nov 2023 08:53:02 +0530 Sat, 21 Jan 2023 19:28:44 +0530 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.64.1.232
Hostname: in-dt-dckr-wrk1
Capacity:
cpu: 8
ephemeral-storage: 959786032Ki
hugepages-2Mi: 0
memory: 8023576Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 884538805627
hugepages-2Mi: 0
memory: 7921176Ki
pods: 110
System Info:
Machine ID: 9e7f87693fba4ce5b5cffd58e27c4b5e
System UUID: 4c4c4544-0032-5910-804e-b3c04f595831
Boot ID: 8c6f8a8c-0acc-4ac9-86df-36867df44d8f
Kernel Version: 5.15.0-58-generic
OS Image: Ubuntu 20.04.5 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.8
Kubelet Version: v1.25.2
Kube-Proxy Version: v1.25.2
PodCIDR: 192.168.1.0/24
PodCIDRs: 192.168.1.0/24
Non-terminated Pods: (18 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
buildninja bn-agent1-6f484b8894-qr88p 0 (0%) 0 (0%) 0 (0%) 0 (0%) 52d
buildninja bn-agent2-67b6859f58-zg6kc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 52d
calico-apiserver calico-apiserver-978f5784c-77gss 0 (0%) 0 (0%) 0 (0%) 0 (0%) 346d
calico-system calico-kube-controllers-6fdfb4dfdc-rfbsr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 346d
calico-system calico-node-4gxvn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 386d
calico-system calico-typha-5cd5cbc888-mcqfh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 386d
calico-system csi-node-driver-zr5lc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 386d
db mongodb-cc47dc8b8-wwszq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 178d
db postgres-79f797b96c-vhswc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 285d
docker-registry gci-docker-registry-644799b4b7-q8wbl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 283d
kube-system coredns-5fcc5bdd47-b6w88 100m (1%) 0 (0%) 70Mi (0%) 170Mi (2%) 283d
kube-system kube-proxy-zjp4z 0 (0%) 0 (0%) 0 (0%) 0 (0%) 386d
kube-system metrics-server-5db9b4b966-8gqgk 100m (1%) 0 (0%) 200Mi (2%) 0 (0%) 386d
tigera-operator tigera-operator-6675dc47f4-z9jtx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 386d
utils web-sample-ff4d6596-hmtw6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 170d
utils wiki-server-86f49bd4c4-4drdr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 182d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 200m (2%) 0 (0%)
memory 270Mi (3%) 170Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
• kubectl describe po nginx
Name: nginx
Namespace: default
Priority: 0
Service Account: default
Node: <none>
Labels: run=nginx
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
nginx:
Image: nginx
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lg45r (ro)
Volumes:
kube-api-access-lg45r:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
I found a similar case, but there was no resolution there either: https://stackoverflow.com/questions/55310076/pods-are-in-pending-state
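In case it helps, I can also grab cluster-wide events and the scheduler's own logs; I assume something like the commands below is the right place to look (as far as I know, the component=kube-scheduler label is what kubeadm puts on the static scheduler pod):
kubectl get events -A --sort-by='.lastTimestamp'
kubectl -n kube-system logs -l component=kube-scheduler --tail=100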
Update 1: Adding journalctl output from the control-plane node (in-dt-dckr-ldr1).
• sudo journalctl -u kubelet -f
-- Logs begin at Fri 2022-04-22 10:32:53 IST. --
Nov 01 14:18:45 IN-DT-DCKR-LDR1 kubelet[3279607]: I1101 14:18:45.802594 3279607 request.go:682] Waited for 1.137588349s due to client-side throttling, not priority and fairness, request: POST:https://10.64.1.98:6443/api/v1/namespaces/kube-system/pods
Nov 01 14:18:45 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:45.845718 3279607 kubelet.go:1712] "Failed creating a mirror pod for" err="pods \"etcd-in-dt-dckr-ldr1\" already exists" pod="kube-system/etcd-in-dt-dckr-ldr1"
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.123742 3279607 configmap.go:197] Couldn't get configMap kube-system/kube-proxy: failed to sync configmap cache: timed out waiting for the condition
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.123782 3279607 configmap.go:197] Couldn't get configMap calico-system/tigera-ca-bundle: failed to sync configmap cache: timed out waiting for the condition
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.123870 3279607 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/61a2bf9a-a4cb-49d8-8b63-de7a82841a55-kube-proxy podName:61a2bf9a-a4cb-49d8-8b63-de7a82841a55 nodeName:}" failed. No retries permitted until 2023-11-01 14:18:46.62383261 +0530 IST m=+3.132668643 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-proxy" (UniqueName: "kubernetes.io/configmap/61a2bf9a-a4cb-49d8-8b63-de7a82841a55-kube-proxy") pod "kube-proxy-6ghjb" (UID: "61a2bf9a-a4cb-49d8-8b63-de7a82841a55") : failed to sync configmap cache: timed out waiting for the condition
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.123907 3279607 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/43736256-5241-46c4-b403-85db6d1d4d52-tigera-ca-bundle podName:43736256-5241-46c4-b403-85db6d1d4d52 nodeName:}" failed. No retries permitted until 2023-11-01 14:18:46.623887546 +0530 IST m=+3.132723570 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "tigera-ca-bundle" (UniqueName: "kubernetes.io/configmap/43736256-5241-46c4-b403-85db6d1d4d52-tigera-ca-bundle") pod "calico-node-rb5s8" (UID: "43736256-5241-46c4-b403-85db6d1d4d52") : failed to sync configmap cache: timed out waiting for the condition
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.124885 3279607 configmap.go:197] Couldn't get configMap kube-system/coredns: failed to sync configmap cache: timed out waiting for the condition
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.124929 3279607 secret.go:192] Couldn't get secret calico-system/node-certs: failed to sync secret cache: timed out waiting for the condition
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.125014 3279607 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/08f1a4fd-92c5-48fc-a299-636a4f0a469a-config-volume podName:08f1a4fd-92c5-48fc-a299-636a4f0a469a nodeName:}" failed. No retries permitted until 2023-11-01 14:18:46.624972044 +0530 IST m=+3.133808093 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "config-volume" (UniqueName: "kubernetes.io/configmap/08f1a4fd-92c5-48fc-a299-636a4f0a469a-config-volume") pod "coredns-5fcc5bdd47-8llkc" (UID: "08f1a4fd-92c5-48fc-a299-636a4f0a469a") : failed to sync configmap cache: timed out waiting for the condition
Nov 01 14:18:46 IN-DT-DCKR-LDR1 kubelet[3279607]: E1101 14:18:46.125058 3279607 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/secret/43736256-5241-46c4-b403-85db6d1d4d52-node-certs podName:43736256-5241-46c4-b403-85db6d1d4d52 nodeName:}" failed. No retries permitted until 2023-11-01 14:18:46.625034355 +0530 IST m=+3.133870397 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "node-certs" (UniqueName: "kubernetes.io/secret/43736256-5241-46c4-b403-85db6d1d4d52-node-certs") pod "calico-node-rb5s8" (UID: "43736256-5241-46c4-b403-85db6d1d4d52") : failed to sync secret cache: timed out waiting for the condition
• sudo journalctl -u containerd -f
-- Logs begin at Fri 2022-04-22 10:32:53 IST. --
Oct 13 00:13:20 IN-DT-DCKR-LDR1 containerd[828]: time="2023-10-13T00:13:20.769518216+05:30" level=info msg="RemovePodSandbox for \"ea81a51660843963fda52520e4d5834bb065e8bc6b48d7350fea5ed462066936\""
Oct 13 00:13:20 IN-DT-DCKR-LDR1 containerd[828]: time="2023-10-13T00:13:20.769565086+05:30" level=info msg="Forcibly stopping sandbox \"ea81a51660843963fda52520e4d5834bb065e8bc6b48d7350fea5ed462066936\""
Oct 13 00:13:20 IN-DT-DCKR-LDR1 containerd[828]: time="2023-10-13T00:13:20.769696525+05:30" level=info msg="TearDown network for sandbox \"ea81a51660843963fda52520e4d5834bb065e8bc6b48d7350fea5ed462066936\" successfully"
Oct 13 00:13:20 IN-DT-DCKR-LDR1 containerd[828]: time="2023-10-13T00:13:20.852087515+05:30" level=info msg="RemovePodSandbox \"ea81a51660843963fda52520e4d5834bb065e8bc6b48d7350fea5ed462066936\" returns successfully"
Nov 01 12:16:31 IN-DT-DCKR-LDR1 containerd[828]: time="2023-11-01T12:16:31.730407989+05:30" level=info msg="No cni config template is specified, wait for other system components to drop the config."
Nov 01 14:18:43 IN-DT-DCKR-LDR1 containerd[828]: time="2023-11-01T14:18:43.710560884+05:30" level=info msg="No cni config template is specified, wait for other system components to drop the config."
Nov 01 14:43:42 IN-DT-DCKR-LDR1 containerd[828]: E1101 14:43:42.533234 828 exec.go:87] error executing command in container: failed to exec in container: failed to start exec "845f8446d19b7636bb87b001b881ce7b1f07dc372e61b542dd316605313e5bff": OCI runtime exec failed: exec failed: unable to start container process: exec: "bash": executable file not found in $PATH: unknown
Nov 01 14:44:30 IN-DT-DCKR-LDR1 containerd[828]: time="2023-11-01T14:44:30.136995113+05:30" level=info msg="Container exec \"b7fb9daf0322fbdbe4facb31c3d18fdd44d4c6402dc7733dbf5b2c4d659dcb72\" stdin closed"
Nov 01 14:44:30 IN-DT-DCKR-LDR1 containerd[828]: E1101 14:44:30.288136 828 exec.go:87] error executing command in container: failed to exec in container: failed to start exec "89067c1007468b006bc9b8d86b446ebbd04438025dd42cd90828123dc1a688b7": OCI runtime exec failed: exec failed: unable to start container process: exec: "powershell": executable file not found in $PATH: unknown
Nov 01 14:44:30 IN-DT-DCKR-LDR1 containerd[828]: E1101 14:44:30.419019 828 exec.go:87] error executing command in container: failed to exec in container: failed to start exec "370d225ba5a9bc40bdc43040b1eda910fe278bd0bf5174682a94bc63a3be18ce": OCI runtime exec failed: exec failed: unable to start container process: exec: "cmd": executable file not found in $PATH: unknown
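Not sure whether it's related to the scheduling issue, but since containerd keeps logging that it is waiting for a CNI config, I can also check whether the Calico CNI config is actually present on this node (assuming containerd's default CNI paths):
ls -l /etc/cni/net.d/
ls -l /opt/cni/bin/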