I'm deploying an application using ArgoCD. The deployment manifests include a Job that performs some one-time initialization for the application. The Job resource looks like this:

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app.kubernetes.io/instance: house
    app.kubernetes.io/name: step-certificates
  name: create-acme-provisioner
  namespace: step-certificates
spec:
  backoffLimit: 100
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: house
        app.kubernetes.io/name: step-certificates
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - |
          while ! step ca health; do
            echo "waiting for ca"
            sleep 1
          done

          if ! step ca provisioner list | grep -q '"name": "acme"'; then
            step ca provisioner add acme --type ACME \
              --admin-subject step \
              --password-file /home/step/secrets/passwords/password \
              --admin-provisioner "Admin JWK"
          fi
        image: cr.step.sm/smallstep/step-ca:0.22.1
        name: create-acme-provisioner
        volumeMounts:
        - mountPath: /home/step/certs
          name: certs
          readOnly: true
        - mountPath: /home/step/config
          name: config
          readOnly: true
        - mountPath: /home/step/secrets
          name: secrets
          readOnly: true
        - mountPath: /home/step/secrets/passwords
          name: ca-password
          readOnly: true
      restartPolicy: Never
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      volumes:
      - configMap:
          name: step-certificates-certs
        name: certs
      - configMap:
          name: step-certificates-config
        name: config
      - name: secrets
        secret:
          secretName: step-certificates-secrets
      - name: ca-password
        secret:
          secretName: step-certificates-ca-password
  ttlSecondsAfterFinished: 60

It works as intended -- it will fail a couple of times while the main application is starting up, but then it runs, and everything looks great:

$ kubectl get pods
NAME                            READY   STATUS      RESTARTS   AGE
create-acme-provisioner-7zhp2   0/1     Completed   0          12s
step-certificates-0             2/2     Running     0          54m
$ kubectl get jobs
NAME                      COMPLETIONS   DURATION   AGE
create-acme-provisioner   1/1           3s         20s

The problem is that ArgoCD keeps re-syncing the Job resource every minute, so the job runs again... and again... and so forth. The logs from the argocd-application-controller pod look like this:

time="2022-09-30T16:20:42Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:20:42Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00259-Dpgma tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00259-Dpgma
time="2022-09-30T16:21:45Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:21:45Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00260-KsLXq tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00260-KsLXq
time="2022-09-30T16:22:49Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:22:49Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00261-itFqU tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00261-itFqU

Why is ArgoCD re-syncing this resource, and how do I get it to stop?

larsks

1 Answer

I figured out what was going on.

The Job was configured with ttlSecondsAfterFinished, which is described in the Kubernetes documentation on automatic cleanup of finished Jobs. I had misread the documentation and thought this would clean up only the Pods created by the Job, but in fact it causes the Job itself to be removed.

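Because the Job was managed by ArgoCD, each time it was deleted due to the ttlSecondsAfterFinished setting, ArgoCD saw the resource as missing and promptly re-created it. That matches the nil->obj tasks in the controller log. The direct fix is to remove ttlSecondsAfterFinished from the Job manifest.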
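
A minimal sketch of the Job without the TTL, assuming nothing else in the manifest needs to change. The pod template is elided here and is meant to be the same one shown in the question. With the field gone, the completed Job stays in the cluster, so ArgoCD should keep seeing the resource as present and have nothing to re-create:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app.kubernetes.io/instance: house
    app.kubernetes.io/name: step-certificates
  name: create-acme-provisioner
  namespace: step-certificates
spec:
  backoffLimit: 100
  # ttlSecondsAfterFinished is intentionally omitted: with it set, Kubernetes
  # deletes the finished Job, and ArgoCD then re-creates the missing resource.
  template:
    # ... pod template unchanged from the question (elided in this sketch) ...
```

The trade-off is that the finished Job and its Pod stay around until something removes them.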

As @SYN suggested in a comment, an alternative solution is to configure the Job as an ArgoCD PostSync hook with a hook-delete-policy:

apiVersion: batch/v1
kind: Job
metadata:
  name: create-acme-provisioner
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:

When ArgoCD successfully syncs the application, it will create this job, and when the job is successful, ArgoCD will delete it.

This means the job runs once on every sync, but that's fine. It's no longer running every 60 seconds.
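
For reference, here is a sketch of what the complete hook Job might look like, assuming the pod template from the question is reused unchanged. Leaving out ttlSecondsAfterFinished is my assumption: the hook delete policy already handles cleanup, so the TTL should not be needed.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-acme-provisioner
  namespace: step-certificates
  labels:
    app.kubernetes.io/instance: house
    app.kubernetes.io/name: step-certificates
  annotations:
    # Run after the application's resources have synced successfully.
    argocd.argoproj.io/hook: PostSync
    # Let ArgoCD delete the Job once it has succeeded.
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 100
  # No ttlSecondsAfterFinished here (assumption: ArgoCD handles cleanup).
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: house
        app.kubernetes.io/name: step-certificates
    spec:
      containers:
      - name: create-acme-provisioner
        image: cr.step.sm/smallstep/step-ca:0.22.1
        command:
        - /bin/bash
        - -c
        - |
          # Wait until the CA responds before touching provisioners.
          while ! step ca health; do
            echo "waiting for ca"
            sleep 1
          done

          # Only add the ACME provisioner if it does not exist yet,
          # so re-running the Job on every sync is harmless.
          if ! step ca provisioner list | grep -q '"name": "acme"'; then
            step ca provisioner add acme --type ACME \
              --admin-subject step \
              --password-file /home/step/secrets/passwords/password \
              --admin-provisioner "Admin JWK"
          fi
        volumeMounts:
        - mountPath: /home/step/certs
          name: certs
          readOnly: true
        - mountPath: /home/step/config
          name: config
          readOnly: true
        - mountPath: /home/step/secrets
          name: secrets
          readOnly: true
        - mountPath: /home/step/secrets/passwords
          name: ca-password
          readOnly: true
      restartPolicy: Never
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      volumes:
      - name: certs
        configMap:
          name: step-certificates-certs
      - name: config
        configMap:
          name: step-certificates-config
      - name: secrets
        secret:
          secretName: step-certificates-secrets
      - name: ca-password
        secret:
          secretName: step-certificates-ca-password
```

Because the script checks for an existing "acme" provisioner before adding one, running it again on each sync should be a no-op after the first successful run.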

larsks