
Introduction

I have an issue with a pod in a StatefulSet: it was terminated, stays in the Completed state, and is not restarted.

I will describe the situation with a concrete example that provides some status and log data for analyzing the issue.

Background: Our Installation

We run MongoDB 7.0.7 on an AKS Kubernetes cluster using a StatefulSet.

Analysis

In the status section of the pod's YAML (see below) I can see that the pod was terminated with exit code 0, but I can't see a reason why the pod isn't restarted. Of course, we have restartPolicy: Always set.

In the kubelet logs (also below) I can see that the pod was evicted (because of NodeHasInsufficientMemory).

My understanding of Kubernetes is that the pod

  • should always be rescheduled, so that the StatefulSet has the desired count of healthy pods
  • remains in the Pending state as long as not enough resources are available
  • is created as soon as enough resources are available

Question

  • Is my understanding correct?
  • If so, why isn't the pod rescheduled?
  • Which Kubernetes component is responsible for the rescheduling?
  • Can I find more logs about rescheduling?
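
To collect more data for these questions, here is a minimal sketch of the checks I would run from a machine with kubectl access. The pod name mongodb-0 comes from the kubelet log below; the StatefulSet name mongodb is an assumption and may differ in your setup. The helper function names are hypothetical; the kubectl subcommands and flags are standard ones.

```shell
# Print the pod phase (Pending, Running, Succeeded, Failed or Unknown).
pod_phase() {
  kubectl get pod "$1" -o jsonpath='{.status.phase}'
}

# List the events that reference the given object name, oldest first.
# Note: events are only retained for a limited time by the API server.
pod_events() {
  kubectl get events --field-selector "involvedObject.name=$1" --sort-by=.lastTimestamp
}

# Show desired vs. ready replicas of the StatefulSet.
# readyReplicas may print empty if the field is absent from the status.
sts_replicas() {
  kubectl get statefulset "$1" -o jsonpath='{.spec.replicas}{" desired, "}{.status.readyReplicas}{" ready"}'
}
```

For example, `pod_phase mongodb-0`, `pod_events mongodb-0` and `sts_replicas mongodb` (the last name being the assumed StatefulSet name). If the phase is Succeeded or Failed, the pod is in a terminal phase, which would at least be consistent with the PodCompleted reason shown in the status below.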

Status YAML

status:
  conditions:
  ...
  - lastProbeTime: null
    lastTransitionTime: "2024-04-30T23:48:10Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-04-30T23:48:10Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  ...
  containerStatuses:
  - containerID: containerd://12345
    image: mongodb:7.0.7
    imageID: mongodb@sha256:12345
    lastState: {}
    name: mongodb
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://123345
        exitCode: 0
        finishedAt: "2024-04-30T23:48:09Z"
        reason: Completed
        startedAt: "2024-04-24T17:55:38Z"

Kubelet Log

kubectl debug node/mynode -it --image=busybox

chroot /host journalctl -u kubelet -o cat|grep mongodb

... ... kuberuntime_container.go ... : "Killing container with a grace period" pod="mongodb-0" ...
... eviction_manager.go:427] "Eviction manager: pods successfully cleaned up" pods=[mongodb-0]
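
Since grepping for mongodb alone returns a lot of lines, a small filter can narrow the kubelet output down to the eviction-related entries. This is only a sketch: the function name filter_eviction_lines is hypothetical, and the two patterns are simply taken from the log lines quoted above, so other relevant kubelet messages may be missed.

```shell
# Read kubelet log lines on stdin and keep only those that mention
# the eviction manager or a container being killed.
filter_eviction_lines() {
  grep -E 'eviction_manager|Killing container'
}
```

Inside the node debug pod it could be used as `chroot /host journalctl -u kubelet -o cat | filter_eviction_lines | grep mongodb`.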
