
I've got a small k3s cluster at home hosting a few websites and local applications. For the most part, I've been able to wrangle it into hosting a variety of services, but the Let's Encrypt functionality has never worked very well for me. When it has worked, I've been unsure as to why, and afraid to tinker with it lest it break again.

At the moment, there are two sites on the cluster with TLS support working fine... for years even, but now that I'm adding a third site, I'm running into the same errors I was getting in the past and I have no idea why. I'm hoping someone here can explain what I've missed.

The error appears to be a complaint that when the challenge URL is requested (how do I determine which URL is being attempted?), cert-manager gets HTML rather than the expected response. It's not clear why that happens, or even which service is constructing the response.

I0820 17:28:23.398812       1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="REDACTED.tld" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-9qp5z" "related_resource_namespace"="my-namespace" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="tls-REDACTED" "resource_namespace"="my-namespace" "resource_version"="v1" "type"="HTTP-01"
E0820 17:28:23.585197       1 sync.go:190] cert-manager/challenges "msg"="propagation check failed" "error"="did not get expected response when querying endpoint, expected \"REDACTED.REDACTED\" but got: <!DOCTYPE html PUBLIC \"-... (truncated)" "dnsName"="REDACTED.tld" "resource_kind"="Challenge" "resource_name"="tls-REDACTED" "resource_namespace"="my-namespace" "resource_version"="v1" "type"="HTTP-01"
I0820 17:28:33.585741       1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="REDACTED.tld" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-5gnhm" "related_resource_namespace"="my-namespace" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="tls-REDACTED" "resource_namespace"="my-namespace" "resource_version"="v1" "type"="HTTP-01"
I0820 17:28:33.585898       1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="REDACTED.tld" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-9qp5z" "related_resource_namespace"="my-namespace" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="tls-REDACTED" "resource_namespace"="my-namespace" "resource_version"="v1" "type"="HTTP-01"
E0820 17:28:33.600402       1 sync.go:190] cert-manager/challenges "msg"="propagation check failed" "error"="did not get expected response when querying endpoint, expected \"REDACTED.REDACTED\" but got: <!DOCTYPE html PUBLIC \"-... (truncated)" "dnsName"="REDACTED.tld" "resource_kind"="Challenge" "resource_name"="tls-REDACTED" "resource_namespace"="my-namespace" "resource_version"="v1" "type"="HTTP-01"

The entire application consists of a Python web app, a Python worker, Redis, and Nginx, which adds up to a lot of YAML. I'm including the markup I assume is relevant for this case:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: my-namespace
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              memory: 32Mi
              cpu: 125m
            limits:
              memory: 32Mi
              cpu: 125m
          ports:
            - containerPort: 80
          volumeMounts:
            - name: public
              mountPath: "/usr/share/nginx/html"
      volumes:
        - name: public
          persistentVolumeClaim:
            claimName: public

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: my-namespace
  annotations:
    kubernetes.io/ingress.class: traefik
    cert-manager.io/cluster-issuer: letsencrypt
    acme.cert-manager.io/http01-edit-in-place: 'true'
spec:
  rules:
    - host: REDACTED.tld
      http:
        paths:
          - path: /static
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 8000
          - path: /media
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 8000
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 8000
  tls:
    - secretName: tls
      hosts:
        - REDACTED.tld


apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: my-namespace
spec:
  selector:
    app: nginx
  ports:
    - port: 8000
      targetPort: 80
      name: tcp


apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: my-namespace
spec:
  selector:
    app: web
  ports:
    - port: 8000
      targetPort: 8000
      name: gunicorn

I'm quite comfortable with "bare metal" Linux administration, but Kubernetes still feels like too much "magic" to me. The above was built by following the K3s Rocks tutorials and, as mentioned, somehow works for two other domains, though they didn't work at first and then somehow started working.

I'd appreciate any suggestions for this problem, as well as any recommendations for how I could/should do the above better.

1 Answer


Thanks to @DavidW's comment, I was able to track the problem down. CertManager was indeed creating an ingress pointing to:

http://my-domain.tld/.well-known/acme-challenge/<some-random-key>

And when I requested that URL from a remote network, it responded with the appropriate value. So it was working, but this process of testing that response before sending the certificate request to LetsEncrypt was still failing.
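For anyone wanting to repeat that check by hand, here is a rough Python sketch of the same comparison the self-check appears to make. It assumes the standard ACME HTTP-01 layout, where the token is the last path segment and the response body is the key authorization (the `REDACTED.REDACTED` value in the log above). The function names and sample values are made up for illustration; as far as I can tell, the real token and key can be read from the Challenge resource (for example with `kubectl describe challenge -n my-namespace`).

```python
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the HTTP-01 URL that the self-check requests."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def matches_expected(body: str, expected_key: str) -> bool:
    """True if the body equals the expected key (surrounding whitespace ignored)."""
    return body.strip() == expected_key


def self_check(domain: str, token: str, expected_key: str, timeout: float = 10.0) -> bool:
    """Fetch the challenge URL from this machine's network and compare the body."""
    with urllib.request.urlopen(challenge_url(domain, token), timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return matches_expected(body, expected_key)


if __name__ == "__main__":
    # Hypothetical values; substitute the token from the Challenge resource.
    print(challenge_url("my-domain.tld", "some-random-key"))
```

The important part is where you run `self_check` from: a remote network and a machine inside the LAN can get different answers for the same URL, which is exactly what happened here.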

As it happens, since this service runs in my home, I have a local DNS server that allows local access to remote services. This way, when I query my-domain.tld from my home office, I get 192.168.x.x rather than the public IP. My router can't handle requests for its own external IP coming from inside the LAN, so this is my workaround.

Anyway, this DNS server didn't have a record for my-domain.tld, so while a request for my-domain.tld coming from outside my home resolved the test URL correctly, requests from CertManager, which runs inside the LAN, were probably getting some sort of default page that my router returns when its external IP is accessed from the internal network.

The fix in my case was to add a CNAME record for this new domain to my local DNS that points to the k8s main node.
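To confirm this kind of mismatch (or to verify the fix), a small Python sketch like the one below can show what a name resolves to from inside the LAN and whether the answer is a private address. This is only an illustration: `resolve` and `describe` are names I made up, and the sample addresses are placeholders. In my setup, a working local record should produce a 192.168.x.x address, while a missing one falls through to the public IP.

```python
import ipaddress
import socket


def resolve(hostname: str) -> list:
    """Return the sorted, de-duplicated addresses the system resolver gives."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def describe(address: str) -> str:
    """Label an address as being in a private range or not."""
    return "private" if ipaddress.ip_address(address).is_private else "not private"


if __name__ == "__main__":
    # Placeholder addresses; on a LAN machine you would instead loop over
    # resolve("my-domain.tld") and print describe(addr) for each result.
    for addr in ("192.168.1.10", "8.8.8.8"):
        print(addr, describe(addr))
```

Run it on a machine that uses the same resolver as the cluster, since that is the view of DNS that the self-check sees.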