
I've set up a k8s cluster to host several services for a project. The whole thing runs on a "one-box-wonder": a Rocky Linux 9 server acting as a KVM hypervisor (diagram below).

Each k8s VM has two network interfaces: eth0, a private NAT'ed vnet from the hypervisor configured to use 192.168.123.0/24, and eth1, plumbed to the hypervisor's bridge interface with no IP configured (I only have a /28 for this project). The server is in a colo facility, so I don't have access to the network equipment.

I have configured two IPAddressPools, public-pool and private-pool, and IP assignment is working as expected. I'm using L2 advertisement.

I believe I can see the announcer responding as expected when I arping the LoadBalancer's external IP:

[root@prod01 ~]# arping <ipaddr>  
ARPING <ipaddr> from <ipaddr> bridge0  
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9]  0.723ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9]  0.694ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9]  0.685ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9]  0.675ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9]  0.688ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9]  0.664ms
Sent 6 probes (1 broadcast(s))  
Received 6 response(s)

Where [52:54:00:CE:9B:A9] is the MAC of the worker that is advertising.

Similarly, I can see the requests come in when I run tcpdump on the announcing node:

13:14:51.001486 ARP, Request who-has <lb_IP> (Broadcast) tell <hypervisor IP>, length 28
13:14:52.001485 ARP, Request who-has <lb_IP> (52:54:00:ce:9b:a9) tell <hypervisor IP>, length 28
13:14:53.001486 ARP, Request who-has <lb_IP> (52:54:00:ce:9b:a9) tell <hypervisor IP>, length 28
13:14:54.001495 ARP, Request who-has <lb_IP> (52:54:00:ce:9b:a9) tell <hypervisor IP>, length 28
13:14:55.001492 ARP, Request who-has <lb_IP> (52:54:00:ce:9b:a9) tell <hypervisor IP>, length 28

However, when I try to access my test deployment, the connection just times out, and looking at a tcpdump it would appear that no response is coming back:

[root@kube-wrkr-2 ~]# tcpdump -n -i eth1 src host <ipaddr>
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:55:23.140415 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395428550 ecr 0,nop,wscale 7], length 0
13:55:24.202968 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395429613 ecr 0,nop,wscale 7], length 0
13:55:26.251991 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395431662 ecr 0,nop,wscale 7], length 0
13:55:30.282973 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395435693 ecr 0,nop,wscale 7], length 0
13:55:38.666971 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395444077 ecr 0,nop,wscale 7], length 0
13:55:55.050972 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395460461 ecr 0,nop,wscale 7], length 0

(irisa is just the service name tcpdump resolves from /etc/services for port 11000.)
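For completeness, the capture above filters on src host, which only shows the client-to-LB direction; a bidirectional capture on the same interface (placeholder address below) would confirm whether any reply leaves eth1 at all:

```shell
# Capture both directions for the LB IP on the bridged interface.
# 203.0.113.10 is a placeholder for the LoadBalancer external IP.
tcpdump -n -i eth1 host 203.0.113.10 and tcp port 11000
```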

I'm unsure if this is a routing issue. I added the gateway for my /28 to the bridged network interface, but that had no impact (and I don't see why it would prevent the announcing node from responding to traffic anyway).
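A couple of quick checks on the announcing worker along those lines (a sketch; interface names are from my setup, and strict reverse-path filtering combined with no usable return route on eth1 is one way replies could get silently dropped):

```shell
# Reverse-path filter settings (0 = off, 1 = strict, 2 = loose).
# Strict mode can drop packets arriving on an interface that has no
# usable route back to the source -- plausible here since eth1 has no IP.
cat /proc/sys/net/ipv4/conf/all/rp_filter
cat /proc/sys/net/ipv4/conf/eth1/rp_filter 2>/dev/null || true

# Does the worker have any route back to the client? 203.0.113.1 is a
# placeholder for the hypervisor/client IP on the /28.
ip route get 203.0.113.1 2>/dev/null || echo "no route to client"
```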

I don't see anything out of sorts with the MetalLB deployment, and when I deploy against the private pool (192.168.123.0/24), things work as expected from within the virtual network.
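For what it's worth, this is how I've been confirming which node is doing the announcing (a sketch; the speaker label below matches the official MetalLB manifests, so adjust if your install labels things differently):

```shell
# The service events include a "nodeAssigned" event naming the
# announcing node ("announcing from node ...").
kubectl describe svc loadbalancer-service | grep -i announc

# The speaker pods also log their L2 announcements.
kubectl -n metallb-system logs -l component=speaker --tail=50
```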

I'm open to any and all suggestions around what I may have done to bork this up. I'm wondering if I need to go the BGP announce route rather than L2.

Thanks in advance to the community for looking at this long post, and for any suggestions or troubleshooting recommendations.

Deployment diagram

metallb configuration:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-pool
  namespace: metallb-system
spec:
  addresses:
  - <starting IP>-<ending IP>
  autoAssign: false
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: private-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.123.101-192.168.123.224
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: svcl2
  namespace: metallb-system
spec:
  ipAddressPools:
  - public-pool
  interfaces:
  - eth1

public-pool test:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: simple-http-server
  template:
    metadata:
      labels:
        app: simple-http-server
    spec:
      containers:
      - name: http-server
        imagePullPolicy: Always
        image: httpd:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: loadbalancer-service
  annotations:
    metallb.universe.tf/address-pool: public-pool
spec:
  selector:
    app: simple-http-server
  ports:
    - port: 11000
      targetPort: 80
  type: LoadBalancer

The expected behavior is that an HTTP request to the public IP on port 11000 gets a page served up; instead, nothing comes back from the announcing host and the request times out.

This same test on the private-pool works as expected.
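To further isolate MetalLB from the rest of the service path, I also plan to hit the NodePort directly from the hypervisor over the private network (a sketch; the NodePort value comes from kubectl get svc, and the node IP and port below are placeholders):

```shell
# 1. Find the NodePort that kube-proxy opened for the service.
kubectl get svc loadbalancer-service -o jsonpath='{.spec.ports[0].nodePort}'

# 2. From the hypervisor, curl a worker's private (192.168.123.x) address on
#    that NodePort. If this works but the LB IP doesn't, the problem is in
#    the bridged/public path rather than in the service itself.
curl -m 5 http://192.168.123.11:30080/   # placeholder node IP and NodePort
```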

Arthur