
Problem statement: Make a deployment with minimum disruption to clients connected via websockets.

Stack: GKE (Regional cluster - 1.22.8-gke.200), helm, Node.js/websockets, HPA, Rolling Update.

Our Kubernetes cluster is composed of microservices, separated into processing and connectivity. Our connectivity microservice, let's call it ws-gateway, is responsible for maintaining websocket connections with our clients. We have a requirement to keep connections alive on our end for as long as possible, but we can afford to close a connection ourselves when we release a new version of ws-gateway. Clients can take up to 5 minutes to establish a new connection, and we have no influence over that reconnection back-off behavior.

The motivation of this post is to optimize our release strategy so that clients disconnect only once.

Context

Currently we are deploying with a rolling update strategy.
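For reference, the relevant part of the Deployment spec looks roughly like this (the maxSurge/maxUnavailable values are illustrative, not our exact configuration):

# Illustrative rolling update settings; the exact values are assumptions.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0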

Important: we cannot afford to drop all connections at once; the load of re-establishing connections is expensive and out of scope for this post. During a rolling update, clients are still able to connect to pods running the old version. I didn't find anything in the rolling update documentation that says otherwise:

if a Deployment is exposed publicly, the Service will load-balance the traffic only to available Pods during the update.

My take from this is that both versions receive traffic as long as they are available. Pods running the old version will eventually be terminated by Kubernetes, but until then nothing stops clients from connecting to an old pod again.

Clients are connecting to our pods via a service (protocol: TCP, type: LoadBalancer).

Desired Outcome

During the release, clients connected to the old version stay connected until their pod is terminated, while new or reconnecting clients can only reach pods of the new version.

Bonus: handing off a websocket connection to a new pod would be quite interesting for us (Node.js).

Options

1. Using selectors in the service.

Label the pods with their version and update the service selector to match only the new version. This sounds like exactly what I need in the pipeline: the service then routes traffic only to v2, as sketched below.
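A minimal sketch of what the Service would look like with the selector pinned to v2 (the label names, ports and version value are assumptions, not our actual manifest):

# Hypothetical Service pinned to the v2 pods; new connections land only on v2.
apiVersion: v1
kind: Service
metadata:
  name: ws-gateway
spec:
  type: LoadBalancer
  ports:
    - port: 443
      targetPort: 8080
      protocol: TCP
  selector:
    app: ws-gateway
    version: v2   # bump this during a release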

Unknowns

  1. What happens when we update the service?

When you create a Service of type LoadBalancer, a Google Cloud controller wakes up and configures a network load balancer. Wait a minute for the controller to configure the network load balancer and generate a stable IP address.

https://cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps#creating_a_service_of_type_loadbalancer

The LB is pass-through.

Network load balancers are not proxies. Load-balanced packets are received by backend VMs with the packet's source and destination IP addresses, protocol, and, if the protocol is port-based, the source and destination ports unchanged. Load-balanced connections are terminated by the backend VMs. Responses from the backend VMs go directly to the clients, not back through the load balancer. The industry term for this is direct server return.

https://cloud.google.com/load-balancing/docs/network

My understanding here is that the LB, in our case, is only involved while the connection is being set up; once it has been upgraded to websockets, there is effectively a direct connection between the client and the pod. So changes to the LB should not affect established connections. But I ran a test and updated the selector so it no longer matched the v1 pods, and I lost 70% of the connections. The other 30% stayed intact, which was confusing. Why were those connections lost, and why did the rest remain intact?

Test environment: 3 nodes running in different zones and only one pod in the cluster.

2. Ingress and weighted canary

Currently we don't have an ingress for those pods; it's all done in the Service. The Nginx ingress controller offers nginx.ingress.kubernetes.io/canary-weight, which might be helpful.

Unknowns

  1. How can the switch be gradual? Can we switch the traffic in a matter of minutes?

3. Blue/Green Deployment

Similar to 2, it seems that all traffic is switched immediately.

4. Anthos (GCP Service Mesh)

From a quick read, it seems a bit overkill as those deployments can be achieved with nginx ingress.

5. Fail readinessProbe

Intentionally make the readinessProbe fail on old pods so they are removed from the service. Seems a bit hacky and not what the readinessProbe is intended for.

How does the HPA respond to failing readinessProbes?

We should also take into account the case of a rollback.


There is some reading I would like to do regarding options 2, 3 and 4, but maybe you can enlighten me.

Other options?


1 Answer


You’re in a nuanced situation that many teams working with persistent WebSocket connections face. Let’s break this down into viable options, why your test behaved unexpectedly, and some recommendations based on trade-offs.

Clarifying the Challenge

Requirement: • Don’t drop all clients at once. • Allow connected clients to stay on the old pods. • Prevent new connections from hitting old pods. • Ensure graceful draining, even under HPA. • Keep a rollback path available.

You're right that GCP's Network Load Balancer is pass-through: once the TCP connection (upgraded to WS) is established, it stays pinned to the backend it landed on, and responses go straight from the backend to the client (direct server return) rather than back through the load balancer. So changes to the Kubernetes Service shouldn't, by themselves, affect existing connections as long as the pods aren't terminated.

Root Cause of Unexpected Disconnects in Your Test

You updated the Service selector to point only to v2, but some connections still dropped. Why?

1. Kube-proxy iptables rules are updated fairly quickly, but there's still a race.
2. GKE's LB health checks (node-level) might prematurely terminate traffic to pods during that selector update.
3. Node eviction, HPA, or readiness/liveness probes might have marked the pod as unavailable.
4. Connection timeouts or rebalancing behaviors in GCP's infrastructure could unintentionally drop some of those long-lived TCP sessions.

With only one pod, there’s also no real HA, so any churn or update to kube-proxy or iptables might have disconnected clients.

Deployment Strategies You Can Use

  1. Blue-Green + Versioned Services (Best Mix)

This solves most of your problems.
  • Run two Deployments: ws-gateway-v1 and ws-gateway-v2.
  • Create two Services:
    • ws-gateway-v1-svc → v1 pods
    • ws-gateway-v2-svc → v2 pods
  • The public LoadBalancer points to a proxy layer or intermediate service (optional), or is switched by updating an alias Service selector (ws-gateway-svc).
  • Update the alias service selector to point only to v2 when you're ready.

Why it works:
  • Old pods stay alive with their connections.
  • New clients connect only to new pods.
  • You can gracefully scale down v1 after ~5 minutes, when most reconnects are done.
  • Supports rollback.

Tradeoff: You’ll need to handle cleanup and lifecycle policies manually. Add a TTL for old pods.

  2. Use Readiness Probes to Drain Pods Gracefully (with Precautions)

You can temporarily fail the readiness probe to pull old pods out of the service without killing them immediately.

But:
  • Make sure terminationGracePeriodSeconds is long enough.
  • Don't let HPA scale down too aggressively; it may fight the probe failure.
  • If using HPA, add a custom metric based on connections or CPU and buffer the scaling logic.

Risk: Some infra (especially ingress controllers or NLB health checks) can misinterpret the pod as failed.
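A minimal sketch of what a drain-style readiness probe could look like, assuming the pod is told to drain by dropping a flag file (the /tmp/draining path and the probe timings are hypothetical):

# Hypothetical probe: it starts failing once /tmp/draining exists, so the pod is
# removed from Service endpoints while its existing connections keep running.
readinessProbe:
  exec:
    command: ["sh", "-c", "test ! -f /tmp/draining"]
  periodSeconds: 5
  failureThreshold: 2

Your release job would then create that file on the old pods (e.g. via kubectl exec) to take them out of rotation without killing them.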

  3. Ingress with Canary or Weighted Routing

Introduce an Nginx Ingress (or GCP Ingress) with:
  • nginx.ingress.kubernetes.io/canary: "true"
  • nginx.ingress.kubernetes.io/canary-weight: "0" → then slowly increase

Pros:
  • Fine control of traffic shifting
  • Gradual rollout
  • Clients connect based on weighted logic

Cons:
  • Websockets are sensitive; you need to tune Nginx with proxy-read-timeout, connection upgrade, etc.
  • Setup overhead if you're not using ingress already.
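A rough sketch of such a canary Ingress, assuming the new pods sit behind a ws-gateway-v2-svc Service; the host, port and timeout values are placeholders:

# Hypothetical canary Ingress for the v2 Service (names and values illustrative).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ws-gateway-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"        # raise gradually
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600" # keep websockets open
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  rules:
    - host: ws.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ws-gateway-v2-svc
                port:
                  number: 80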

  4. Stick With Rolling Updates, Add Graceful Shutdown Logic in Pods

In your Node.js WebSocket server:
  • Catch SIGTERM
  • Stop accepting new connections
  • Wait for active connections to close, or force close after a timeout (5 min max)
  • Delay process exit until then

process.on('SIGTERM', async () => {
  console.log("SIGTERM received. Cleaning up...");
  await gracefulShutdownWebsockets();
  process.exit(0);
});

Set:

terminationGracePeriodSeconds: 300

This ensures pods don’t exit abruptly and helps maintain connection stability.

  5. Anthos / Istio / GCP Service Mesh

These are heavyweight but powerful.
  • You get fine-grained traffic shifting (e.g. 20%, 50%, 100%)
  • Good observability and policy enforcement
  • Works well for gradual rollouts and rollback

Only go this route if you’re already leaning toward mesh adoption.
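If you do go the mesh route, weighted shifting looks roughly like this with an Istio VirtualService (a sketch; it assumes a DestinationRule that defines v1/v2 subsets on the version label, and all names are illustrative):

# Hypothetical Istio VirtualService sending 20% of traffic to v2.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ws-gateway
spec:
  hosts:
    - ws-gateway
  http:
    - route:
        - destination:
            host: ws-gateway
            subset: v1
          weight: 80
        - destination:
            host: ws-gateway
            subset: v2
          weight: 20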

Suggested Plan

Here's a balanced rollout strategy:

1.  Use two Deployments (blue/green style): ws-gateway-v1, ws-gateway-v2
2.  Maintain both for at least 5–10 minutes after switching traffic
3.  Use a single alias service that you update in CI/CD:

kubectl patch svc ws-gateway-svc -p '{"spec": {"selector": {"version": "v2"}}}'

4.  In each pod:
•   Handle SIGTERM
•   Gracefully close connections
•   Add terminationGracePeriodSeconds: 300
5.  (Optional) Implement a health check endpoint that reflects connection status (helps during readiness probing).
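For step 5, the readiness probe could point at that endpoint; the /readyz path and port 8080 are assumptions about your app:

# Hypothetical HTTP readiness probe; the app returns non-200 from /readyz while
# it is draining or over its connection budget.
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10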

Below is a Helm-friendly setup for a blue/green deployment strategy with minimal WebSocket disruption, using versioned Deployments and Services.

Helm Values File (values.yaml)

app:
  name: ws-gateway
  version: v2        # v1, v2, etc.

image:
  repository: your-repo/ws-gateway
  tag: v2

replicaCount: 3

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

Deployment Template (templates/deployment.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.app.name }}-{{ .Values.app.version }}
  labels:
    app: {{ .Values.app.name }}
    version: {{ .Values.app.version }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.app.name }}
      version: {{ .Values.app.version }}
  template:
    metadata:
      labels:
        app: {{ .Values.app.name }}
        version: {{ .Values.app.version }}
    spec:
      containers:
        - name: {{ .Values.app.name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: {{ .Values.service.targetPort }}
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]
          readinessProbe:
            tcpSocket:
              port: {{ .Values.service.targetPort }}
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
      terminationGracePeriodSeconds: 300

Versioned Service Template (templates/service-versioned.yaml)

apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.app.name }}-{{ .Values.app.version }}
  labels:
    app: {{ .Values.app.name }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.targetPort }}
      protocol: TCP
  selector:
    app: {{ .Values.app.name }}
    version: {{ .Values.app.version }}

Stable Alias Service (templates/service-stable.yaml)

This is the only service exposed to the clients via LoadBalancer or an external IP. If you deploy each version as its own Helm release (as in the upgrade flow below), keep this template behind a flag or in its own small release so the two versioned releases don't fight over ownership of the same Service.

apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.app.name }}
  labels:
    app: {{ .Values.app.name }}
spec:
  # Unlike the per-version Services (ClusterIP), this one is client-facing;
  # it falls back to LoadBalancer if service.externalType is not set.
  type: {{ .Values.service.externalType | default "LoadBalancer" }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.targetPort }}
      protocol: TCP
  selector:
    app: {{ .Values.app.name }}
    version: {{ .Values.app.version }}

Upgrade Flow (CI/CD or Manual Steps)

1.  Deploy the new version as its own release with Helm, so the v1 release and its pods keep running:

helm upgrade --install ws-gateway-v2 ./charts/ws-gateway -f values.yaml

2.  Switch the stable alias Service to v2, e.g. with the kubectl patch shown earlier; from then on it routes new connections only to v2.
3.  Old pods (v1) remain running, holding open WebSocket connections.
4.  After ~5 minutes, safely delete the old release:

helm uninstall ws-gateway-v1

5.  For rollback, just re-run Helm with previous values.

Optional enhancements you can implement as per your requirements:
  • Add annotations for metrics or health dashboards.
  • Use Helm hooks to automate cleanup of old versions after a TTL.
  • Integrate with ArgoCD or Flux for GitOps-controlled rollout.