Problem Statement:
I have a GCP Global External Load Balancer with Kubernetes pods behind it, connected via an HTTPRoute and a Service. When a pod is deleted, it correctly follows the termination process:
- A preStop hook is configured.
- terminationGracePeriodSeconds is set.
- The application correctly handles SIGTERM and continues serving requests for some time during termination.
However, the GCP Load Balancer starts returning 503 errors a few seconds after the pod gets SIGTERM, even though the pod is still alive and serving traffic (verified via kubectl port-forward on the pod).
Error message from the load balancer logs:
failed with status 503 and body upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused
Observed Timeline:
- Pod receives SIGTERM → The pod is still alive and continues serving requests.
- Pod is removed from the Kubernetes Service → The Kubernetes Service itself does not have an outage.
- A few seconds after SIGTERM, the GCP Load Balancer returns exactly 10 errors (503) before requests stabilize again.
- Several seconds after the burst of 503s, the pod finally terminates once terminationGracePeriodSeconds expires.
What I Have Tried:
- Enabled Connection Draining in the GCP Backend Service with a timeout of 30 seconds.
- Ensured Readiness Probe is Healthy Before Termination:
Code:
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
- Used a preStop Hook to Delay Termination:
Code:
lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - "sleep 10"
- Checked NEG (Network Endpoint Group) Behavior → The pod is removed from the NEG immediately after entering Terminating.
- Set terminationGracePeriodSeconds to 30 seconds.
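For reference, the following sketch shows how the pod-level settings listed above would fit together in a single Deployment. The names `my-app` and `example.com/my-app:latest` and the replica count are hypothetical placeholders. Only the probe, preStop, and grace-period values come from the list above. The image is assumed to include `sh` and `sleep`. Kubernetes starts the grace-period countdown when the pod is marked for deletion and runs the preStop hook before it sends SIGTERM. With these values, SIGTERM therefore arrives about 10 s after deletion and SIGKILL about 30 s after deletion.

```yaml
# Hypothetical Deployment combining the settings listed above.
# my-app, the image name, and replicas are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # The 30 s countdown starts when the pod is marked for deletion;
      # the preStop sleep below is counted against it.
      terminationGracePeriodSeconds: 30
      containers:
        - name: my-app
          image: example.com/my-app:latest
          ports:
            - containerPort: 8080
          # Same readiness probe as in the question.
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                # Runs before the kubelet sends SIGTERM, so SIGTERM arrives
                # roughly 10 s after deletion (assumes sh and sleep exist in the image).
                command: ["sh", "-c", "sleep 10"]
```

With these numbers, the application has roughly 20 s between SIGTERM and SIGKILL (the 30 s grace period minus the 10 s preStop sleep).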
Question: How can I make sure the load balancer does not return 503 errors during pod termination while the pod is still healthy and able to serve traffic?
Environment Details:
- Kubernetes: GKE
- Ingress: HTTPRoute + GCP Load Balancer
- Service Type: NEG-backed
- GCP Backend Service: Connection draining enabled
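The following is a sketch of how this environment is commonly expressed with the GKE Gateway API. It assumes a GKE-managed Gateway (hypothetically named `my-gateway`). It also assumes the 30-second connection draining was configured through a `GCPBackendPolicy` rather than directly on the backend service with gcloud. The names `my-app-route`, `my-app`, and `my-app-draining`, along with Service port 80, are hypothetical.

```yaml
# HTTPRoute attaching the Service to a (hypothetical) GKE-managed Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
spec:
  parentRefs:
    - name: my-gateway
  rules:
    - backendRefs:
        - name: my-app
          port: 80
---
# Backend Service; the GKE Gateway controller serves it through NEGs
# (container-native load balancing), so no NEG annotation is shown here.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
---
# Connection draining (30 s) on the backend service generated for my-app.
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: my-app-draining
spec:
  default:
    connectionDraining:
      drainingTimeoutSec: 30
  targetRef:
    group: ""
    kind: Service
    name: my-app
```

Connection draining only lets in-flight requests finish on an endpoint that has already been removed from the NEG. It does not keep new requests flowing to that endpoint.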