
Introduction

I recently got a simple web app working on a three-node Ubuntu Server cluster running MicroK8s. I then decided to try rebuilding the cluster and reinstalling everything from YAML manifests, to ensure the process was repeatable. However, the app is no longer reachable from outside the cluster. I am looking for debugging techniques to drill into why the NodePort is apparently not creating a TCP listener on all nodes.

Here are my nodes:

name       IP               colour   role
arran      192.168.50.251   yellow   leader
nikka      192.168.50.74    blue     worker
yamazaki   192.168.50.135   green    worker

The cluster has again elected to run the workload on the third node, Yamazaki. I expect any web traffic hitting Arran or Nikka to be internally re-routed to Yamazaki to be serviced, as was happening previously.

What I did

From the previously working cluster/app, here is what I did to reset everything:

  1. Do microk8s leave on all follower nodes

  2. Do microk8s kubectl delete node <nodename> on the leader for each follower node (they were not removed automatically when they left)

  3. Do microk8s reset on all nodes

  4. Enable addons (dns, ingress); I don't know whether either is necessary

  5. Create a join command on the leader, running microk8s add-node once per follower

  6. Run a fresh join command microk8s join <ip>/<token> on each follower

  7. Run microk8s status on any node to ensure cluster is in HA mode

  8. Sideload an app image tarball from the leader, using microk8s images import workload.tar

  9. Launch the app via microk8s kubectl apply -f k8s-manifests/production/pod.yaml -f k8s-manifests/production/nodeport.yaml

    Here is the Pod:

     apiVersion: v1
     kind: Pod
     metadata:
       name: k8s-workload
       annotations:
         kubectl.kubernetes.io/last-applied-configuration: |
           {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
     spec:
       containers:
       - image: k8s-workload
         imagePullPolicy: Never
         name: k8s-workload
         ports:
         - containerPort: 9090
           protocol: TCP
    

    Here is the NodePort:

     apiVersion: v1
     kind: Service
     metadata:
       name: np-service
     spec:
       type: NodePort
       ports:
         - port: 9090
           targetPort: 9090
           nodePort: 30090
       selector:
         run: k8s-workload
       # This should not be needed, but it didn't help
       # this time anyway
       externalIPs: [192.168.50.251]
    
  10. Check the app is running via an internal container call, microk8s kubectl exec -ti k8s-workload -- curl http://localhost:9090 - this is fine

  11. Check the app is running via a port forwarder on any node, microk8s kubectl port-forward pod/k8s-workload 9090 --address='0.0.0.0' - this is fine

  12. Check external access: the nodes are not listening (curl http://localhost:30090 gets a refused connection, and the same happens for any node IP address from a non-cluster machine on the LAN) - the checks I ran are sketched just after this list
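For reference, the failing check in step 12 amounted to running something like this on each node (a sketch; I am assuming ss is available, and I gather kube-proxy in iptables mode may not always hold an ordinary listening socket for a NodePort, so an empty ss result is not conclusive by itself):

sudo ss -tlnp | grep 30090          # is anything holding the NodePort?
curl -v http://localhost:30090      # local attempt - currently "connection refused"
# and from a non-cluster machine on the LAN:
curl -v http://192.168.50.251:30090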

System state

Here is what is running from microk8s kubectl get all -o wide:

NAME               READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
pod/k8s-workload   1/1     Running   0          20h   10.1.134.193   yamazaki   <none>           <none>

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE     SELECTOR
service/kubernetes   ClusterIP   10.152.183.1     <none>           443/TCP          35d     <none>
service/np-service   NodePort    10.152.183.175   192.168.50.251   9090:30090/TCP   3d21h   run=k8s-workload

I don't know what service/kubernetes is; I assume it is just part of the standard Kubernetes infrastructure.
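
One additional check that occurs to me (a sketch; I believe kubectl get endpoints lists the pod addresses a Service has actually selected):

microk8s kubectl get endpoints np-service
# if the ENDPOINTS column shows <none>, the Service has not matched any pods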

Observations

I think this article is saying that my web app needs to be a service, but I only have a pod. I think that when this was working previously, I only had a pod, but the cluster had gotten into a bit of a mess, so it is possible that a service version of the app was running at the same time as the pod version.

The article also suggests that I ought to be using an ingress system. However, given that NodePorts are my present learning focus, I don't want to give up on them just yet. Ingress can come later.

I think I can be sure that there are no firewall issues, since any connections to port 30090 are rejected even in a console session on a node in the cluster.
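
To be doubly sure, one way to separate a firewall problem from a Service problem (a sketch; I am assuming MicroK8s runs kube-proxy in its default iptables mode) would be to look for the NodePort rules directly:

# run on any node; 30090 is the nodePort from my manifest
sudo iptables-save | grep 30090
# no matching rules would point at the Service configuration
# rather than at a firewall dropping packets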

I would like to run something like microk8s kubectl logs service np-service, to see what the NodePort is doing, but the logs subcommand only works on pods.
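
The closest substitutes I have found (a sketch; describe does work on Services, even though logs does not):

microk8s kubectl describe service np-service   # shows the selector, ports and Endpoints
microk8s kubectl get events --sort-by=.metadata.creationTimestamp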

What can I try next?

halfer

2 Answers


When Pods are started with kubectl run, kubectl automatically gives them a run label set to the name used when deploying.

For example, take a look at the YAML generated by kubectl run nginx --image=nginx -o yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx  # <- automatically assigned
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    ...
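
You can confirm the automatic label on a live cluster (a quick sketch, assuming the nginx Pod from above is running):

kubectl get pod nginx --show-labels
# NAME    READY   STATUS    RESTARTS   AGE   LABELS
# nginx   1/1     Running   0          1m    run=nginx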

Now, assuming the YAML of the Pod k8s-workload you have provided is complete, this label is currently missing. This is important because of the selector you used in the NodePort Service's spec.

apiVersion: v1
kind: Service
metadata:
  name: np-service
spec:
  ...
  selector:
    run: k8s-workload  # <- tells Kubernetes which Pods the Service is for

I'm guessing that at the moment, Kubernetes simply cannot find the Pod that the Service is for. You can test this theory by running kubectl get pods -l run=k8s-workload. You should get an error message that looks something like No resources found in default namespace.
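
Something like:

kubectl get pods -l run=k8s-workload
# expected: No resources found in default namespace.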

Fixing this is as easy as (re-)assigning the Label. This can be done by using the kubectl label command like kubectl label pod k8s-workload run=k8s-workload.
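
A quick way to verify the fix end to end (a sketch, using the service and nodePort from your question):

kubectl label pod k8s-workload run=k8s-workload
kubectl get endpoints np-service    # should now list the Pod's IP on port 9090
curl http://localhost:30090         # run on a node - should now reach the app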

A detailed guide on how to debug Services, as well as more information on how Labels and Selectors work can be found in the official documentation.

Update

In relation to whether this situation would be logged: a Service without Endpoints is not an error and (to my knowledge) is not logged anywhere. Imagine a Deployment that is only needed for a few hours a week. The Deployment being inactive, and thus the Service having no Endpoints, for 90% of the week is expected, and does not mean something is misconfigured or broken.

Eleasar

As I suspected, the solution was simple. Eleasar has kindly supplied a label command to fix the problem, but I preferred to fix it in the YAML, as I would regard that as more repeatable. Here is my new pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: k8s-workload
  labels:
    run: k8s-workload
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
spec:
  containers:
  - image: k8s-workload
    imagePullPolicy: Never
    name: k8s-workload
    ports:
    - containerPort: 9090
      protocol: TCP

There are just two new lines, to add a unique label to this object.
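
Re-applying the manifest and re-checking (a sketch of the verification; the pod IP comes from the get all output earlier):

microk8s kubectl apply -f k8s-manifests/production/pod.yaml
microk8s kubectl get endpoints np-service   # should now show 10.1.134.193:9090
curl http://192.168.50.251:30090            # reachable from outside the cluster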

halfer