Problem Statement:

I have a GCP Global External Load Balancer with Kubernetes pods behind it, connected via an HTTPRoute and a Service. When a pod is deleted, it correctly follows the termination process:

  • A preStop hook is configured.
  • terminationGracePeriodSeconds is set.
  • The application correctly handles SIGTERM and continues serving requests during the grace period.
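
For context, the SIGTERM behavior described above is roughly the following (a minimal Python sketch; the port, drain window, and handler are placeholders, not the actual application):

```python
import http.server
import signal
import threading
import time

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Keep answering 200 even while the pod is terminating.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep output quiet

def run(port: int, drain_seconds: float) -> None:
    terminating = threading.Event()
    # On SIGTERM, do NOT exit: just remember that we are draining.
    signal.signal(signal.SIGTERM, lambda signum, frame: terminating.set())

    server = http.server.ThreadingHTTPServer(("", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    terminating.wait()          # block until SIGTERM arrives
    time.sleep(drain_seconds)   # keep serving during the drain window
    server.shutdown()

# In the pod this would be e.g. run(8080, 10); the numbers are placeholders.
```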

However, the GCP Load Balancer starts returning 503 errors a few seconds after the pod gets SIGTERM, even though the pod is still alive and serving traffic (verified via kubectl port-forward on the pod).

Error message from the load balancer logs:

failed with status 503 and body upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused

Observed Timeline:

  1. Pod receives SIGTERM → The pod is still alive and continues serving requests.
  2. Pod is removed from the Kubernetes Service → The Kubernetes Service itself does not have an outage.
  3. A few seconds after SIGTERM, the GCP Load Balancer starts returning exactly 10 errors (503) before requests stabilize again.
  4. Several seconds after the burst of 503s, the pod finally terminates when terminationGracePeriodSeconds expires.

What I Have Tried:

  1. Enabled Connection Draining in the GCP Backend Service with a timeout of 30 seconds.
  2. Ensured Readiness Probe is Healthy Before Termination:

Code:

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
  3. Used a preStop Hook to Delay Termination:

Code:

lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - "sleep 10"
  4. Checked NEG (Network Endpoint Group) Behavior → The pod is removed from the NEG immediately after entering Terminating.
  5. Set terminationGracePeriodSeconds to 30 seconds.
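
One commonly suggested mitigation (not something tried above) is to have the /healthz endpoint start reporting failure as soon as SIGTERM arrives, while normal requests keep getting 200: the load balancer's health check then marks the backend unhealthy and stops routing new requests to it during the drain window. A minimal Python sketch, assuming the /healthz path and port 8080 from the readiness probe above; the handler is illustrative, not the actual application:

```python
import http.server
import signal
import threading

draining = threading.Event()
# Flip to "unhealthy" the moment SIGTERM arrives, without exiting.
signal.signal(signal.SIGTERM, lambda signum, frame: draining.set())

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Health check: fail fast once we are draining, so the LB
            # stops sending new traffic even though we still serve it.
            status = 503 if draining.is_set() else 200
        else:
            status = 200  # regular traffic keeps working during drain
        body = b"draining" if status == 503 else b"ok"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep output quiet

def serve(port: int) -> http.server.ThreadingHTTPServer:
    server = http.server.ThreadingHTTPServer(("", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For this to help, the health check's interval times its unhealthy threshold must fit inside the preStop sleep; otherwise the endpoint may be detached from the NEG before the health check reacts.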

Question: How can I make sure that during a pod termination the LB does NOT fire 503 errors, even though the pod is still healthy and could serve traffic?

Environment Details:

  • Kubernetes: GKE
  • Ingress: HTTPRoute + GCP Load Balancer
  • Service Type: NEG-backed
  • GCP Backend Service: Connection draining enabled
