Summary

Quite suddenly, last week (after months without any failure), our ArgoCD deployment in our AKS cluster broke.

Specifically: all ArgoCD applications went into the "Unknown" state and could no longer compute any diffs against the cluster (error messages below).

We haven't seen this issue before.

The cluster is seemingly healthy: the usual kubectl commands (get, describe, apply, etc.) run from our deploy server still work fine. Ditto for Helm.

Usual remedies did not work:

  • refresh the root app
  • hard refresh the root app
  • restart the ArgoCD deployments (server, repo server, cache)
  • re-install ArgoCD from scratch

Information

  • ArgoCD version: v2.14.4+3d901f2
  • Kubernetes cluster version: 1.30 (AKS)

cluster connection state

Error messages

Failed to load live state: failed to get cluster info for "https://kubernetes.default.svc": error synchronizing cache state : failed to load open api schema while syncing cluster cache: error getting openapi resources: the server is currently unable to handle the request
error synchronizing cache state : failed to load open api schema while syncing cluster cache: error getting openapi resources: the server is currently unable to handle the request

Related tickets

application condition errors

Question

Any ideas on how to debug / troubleshoot this issue?

thiagowfx

1 Answer

Resolved: In our case, the issue was caused by Kyverno: https://github.com/kubernetes/kubernetes/issues/122668.

Two distinct Helm charts were installing duplicate CRDs, which somehow crashed the cluster's OpenAPI endpoint.

Deleting the duplicate CRDs resolved the issue.
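
For anyone hitting the same symptom, here is one way to look for such CRDs. It is a minimal sketch, not necessarily the procedure used here. It assumes that "duplicate" means several CRDs claiming the same kind within one API group, and it also lists CRDs whose NamesAccepted or Established condition is not "True". The function names load_crds and suspicious_crds are made up for this example.

```python
import json
import subprocess
from collections import defaultdict


def load_crds():
    """Fetch all CRDs as parsed JSON (needs kubectl access to the cluster)."""
    out = subprocess.run(
        ["kubectl", "get", "crd", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def suspicious_crds(crd_list):
    """Inspect a `kubectl get crd -o json` document.

    Returns (shared_kinds, unhealthy):
      shared_kinds: {(group, kind): [CRD names]} where more than one CRD
                    claims the same kind in the same group (assumed meaning
                    of "duplicate" in this sketch).
      unhealthy:    names of CRDs whose NamesAccepted or Established
                    condition is present and not "True".
    """
    by_kind = defaultdict(list)
    unhealthy = []
    for crd in crd_list.get("items", []):
        name = crd["metadata"]["name"]
        spec = crd["spec"]
        by_kind[(spec["group"], spec["names"]["kind"])].append(name)
        for cond in crd.get("status", {}).get("conditions", []):
            if cond["type"] in ("NamesAccepted", "Established") and cond["status"] != "True":
                unhealthy.append(name)
                break
    shared_kinds = {k: sorted(v) for k, v in by_kind.items() if len(v) > 1}
    return shared_kinds, unhealthy
```

Calling suspicious_crds(load_crds()) from a machine with cluster access prints nothing by itself; inspect the two return values. Separately, running kubectl get --raw /openapi/v2 should show whether the OpenAPI endpoint fails outside of ArgoCD as well.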

thiagowfx