
Introduction

I recently got a simple web app working on a three-node Ubuntu Server cluster running MicroK8s. I then decided to try rebuilding the cluster and reinstalling everything from YAML manifests, to ensure the process was repeatable. However, the app is no longer reachable from outside the cluster. I am looking for debugging techniques to drill into why the NodePort is apparently not creating a TCP listener on all nodes.

Here are my nodes:

name       IP               colour   role
arran      192.168.50.251   yellow   leader
nikka      192.168.50.74    blue     worker
yamazaki   192.168.50.135   green    worker

The cluster has again elected to run the workload on the third node, Yamazaki. I expect any web traffic hitting Arran or Nikka to be internally re-routed to Yamazaki to be serviced, as was happening previously.

What I did

From the previously working cluster/app, here is what I did to reset everything:

  1. Do microk8s leave on all follower nodes

  2. Do microk8s kubectl delete node <nodename> on the leader for each follower node (they were not removed automatically when they left)

  3. Do microk8s reset on all nodes

  4. Enable addons (dns, ingress); I don't know whether either is necessary

  5. Create a join command on the leader, running microk8s add-node once per follower

  6. Run a fresh join command microk8s join <ip>/<token> on each follower

  7. Run microk8s status on any node to ensure cluster is in HA mode

  8. Sideload an app image tarball from the leader, using microk8s images import workload.tar

  9. Launch the app via microk8s kubectl apply -f k8s-manifests/production/pod.yaml -f k8s-manifests/production/nodeport.yaml

    Here is the Pod:

     apiVersion: v1
     kind: Pod
     metadata:
       name: k8s-workload
       annotations:
         kubectl.kubernetes.io/last-applied-configuration: |
           {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
     spec:
       containers:
       - image: k8s-workload
         imagePullPolicy: Never
         name: k8s-workload
         ports:
         - containerPort: 9090
           protocol: TCP
    

    Here is the NodePort:

     apiVersion: v1
     kind: Service
     metadata:
       name: np-service
     spec:
       type: NodePort
       ports:
         - port: 9090
           targetPort: 9090
           nodePort: 30090
       selector:
         run: k8s-workload
       # This should not be needed, but it didn't help
       # this time anyway
       externalIPs: [192.168.50.251]
    
  10. Check the app is running via an internal container call, microk8s kubectl exec -ti k8s-workload -- curl http://localhost:9090 - this is fine

  11. Check the app is running via a port forwarder on any node, microk8s kubectl port-forward pod/k8s-workload 9090 --address='0.0.0.0' - this is fine

  12. Check external access: the nodes are not listening (curl http://localhost:30090 gets a refused connection, and the same happens for any node IP address from a non-cluster machine on the LAN) - the checks I ran are sketched just after this list
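For reference, the failing check in step 12 amounted to running something like this on each node (a sketch; I am assuming ss is available, and I gather kube-proxy in iptables mode may not always hold an ordinary listening socket for a NodePort, so an empty ss result is not conclusive by itself):

sudo ss -tlnp | grep 30090          # is anything holding the NodePort?
curl -v http://localhost:30090      # local attempt - currently "connection refused"
# and from a non-cluster machine on the LAN:
curl -v http://192.168.50.251:30090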

System state

Here is what is running from microk8s kubectl get all -o wide:

NAME               READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
pod/k8s-workload   1/1     Running   0          20h   10.1.134.193   yamazaki   <none>           <none>

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE     SELECTOR
service/kubernetes   ClusterIP   10.152.183.1     <none>           443/TCP          35d     <none>
service/np-service   NodePort    10.152.183.175   192.168.50.251   9090:30090/TCP   3d21h   run=k8s-workload

I don't know what service/kubernetes is; I assume it is just part of the standard Kubernetes infrastructure.
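
One additional check that occurs to me (a sketch; I believe kubectl get endpoints lists the pod addresses a Service has actually selected):

microk8s kubectl get endpoints np-service
# if the ENDPOINTS column shows <none>, the Service has not matched any pods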

Observations

I think this article is saying that my web app needs to be a service, but I only have a pod. I think that when this was working previously, I only had a pod, but the cluster had gotten into a bit of a mess, so it is possible that a service version of the app was running at the same time as the pod version.

The article also suggests that I ought to be using an ingress system. However, given that NodePorts are my present learning focus, I don't want to give up on them just yet. Ingress can come later.

I think I can be sure that there are no firewall issues, since any connections to port 30090 are rejected even in a console session on a node in the cluster.
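
To be doubly sure, one way to separate a firewall problem from a Service problem (a sketch; I am assuming MicroK8s runs kube-proxy in its default iptables mode) would be to look for the NodePort rules directly:

# run on any node; 30090 is the nodePort from my manifest
sudo iptables-save | grep 30090
# no matching rules would point at the Service configuration
# rather than at a firewall dropping packets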

I would like to run something like microk8s kubectl logs service np-service, to see what the NodePort is doing, but the logs subcommand only works on pods.
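
The closest substitutes I have found (a sketch; describe does work on Services, even though logs does not):

microk8s kubectl describe service np-service   # shows the selector, ports and Endpoints
microk8s kubectl get events --sort-by=.metadata.creationTimestamp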

What can I try next?

halfer

2 Answers


When Pods are started with kubectl run, kubectl automatically gives them a run label set to the name used when deploying.

For example, take a look at the YAML generated by kubectl run nginx --image=nginx -o yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx  # <- automatically assigned
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    ...
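
You can confirm the automatic label on a live cluster (a quick sketch, assuming the nginx Pod from above is running):

kubectl get pod nginx --show-labels
# NAME    READY   STATUS    RESTARTS   AGE   LABELS
# nginx   1/1     Running   0          1m    run=nginx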

Now, assuming the YAML of the Pod k8s-workload you have provided is complete, this label is currently missing. This is important because of the selector you used in the NodePort Service's spec.

apiVersion: v1
kind: Service
metadata:
  name: np-service
spec:
  ...
  selector:
    run: k8s-workload  # <- tells Kubernetes which Pods the Service is for

I'm guessing that at the moment, Kubernetes simply cannot find the Pod that the Service is for. You can test this theory by running kubectl get pods -l run=k8s-workload. You should get an error message that looks something like No resources found in default namespace.
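
Something like:

kubectl get pods -l run=k8s-workload
# expected: No resources found in default namespace.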

Fixing this is as easy as (re-)assigning the Label. This can be done by using the kubectl label command like kubectl label pod k8s-workload run=k8s-workload.
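
A quick way to verify the fix end to end (a sketch, using the service and nodePort from your question):

kubectl label pod k8s-workload run=k8s-workload
kubectl get endpoints np-service    # should now list the Pod's IP on port 9090
curl http://localhost:30090         # run on a node - should now reach the app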

A detailed guide on how to debug Services, as well as more information on how Labels and Selectors work can be found in the official documentation.

Update

In relation to whether this situation would be logged: a Service without Endpoints is not an error and (to my knowledge) is not logged anywhere. Imagine a Deployment that is only needed for a few hours a week. The Deployment being inactive, and thus the Service having no Endpoints, for 90% of the week is expected, and does not mean something is misconfigured or broken.

Eleasar

As I suspected, the solution was simple. Eleasar has kindly supplied a label command to fix the problem, but I preferred to fix it in the YAML, as I would regard that as more repeatable. Here is my new pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: k8s-workload
  labels:
    run: k8s-workload
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
spec:
  containers:
  - image: k8s-workload
    imagePullPolicy: Never
    name: k8s-workload
    ports:
    - containerPort: 9090
      protocol: TCP

There are just two new lines, to add a unique label to this object.
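
Re-applying the manifest and re-checking (a sketch of the verification; the pod IP comes from the get all output earlier):

microk8s kubectl apply -f k8s-manifests/production/pod.yaml
microk8s kubectl get endpoints np-service   # should now show 10.1.134.193:9090
curl http://192.168.50.251:30090            # reachable from outside the cluster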

halfer