Sync Waves and Resource Hooks

You've learned how to manage sync policies in Lesson 8. Now you face a critical challenge: deployment order matters. Your PostgreSQL database migration must run before your FastAPI application deploys. Your health checks must pass before marking the deployment as healthy. Your rollback must run cleanup jobs when deployment fails.

This is where sync waves and resource hooks solve real problems. Waves order resource creation sequentially. Hooks define what happens at critical sync points: before, during, and after deployment.

The Order Problem

Consider this scenario. You're deploying version 2 of your agent with a database schema change:

1. Your CI pipeline builds the new container
2. ArgoCD syncs the manifests
3. ArgoCD creates: ConfigMap, Secret, Deployment, Service all at once
4. The Deployment starts immediately without migrations
5. The container crashes (no database columns it expects)

Without hooks, your application fails because migrations never ran. With a PreSync hook and sync waves:

1. CI pipeline builds the new container
2. ArgoCD syncs the manifests
3. ArgoCD runs the migration Job as a PreSync hook (before any regular resource is applied)
4. ArgoCD creates Secret, ConfigMap (wave 0)
5. ArgoCD creates Deployment, Service (wave 1)
6. ArgoCD runs a smoke-test Job as a PostSync hook once the application is healthy

The difference is simple: waves group resources by creation order. Hooks define executable steps around those waves.

Sync Waves Explained

A sync wave is an annotation that tells ArgoCD: "Create this resource in this order, then move to the next wave."

Wave Numbers and Ordering

Waves execute in ascending numerical order:

Wave -1  → Created first (infrastructure preparation)
Wave 0 → Created second (main infrastructure: config, secrets, volumes)
Wave 1 → Created third (applications: deployments, services)
Wave 2 → Created fourth (post-deployment: monitoring, cleanup)

Key principle: ArgoCD waits for each wave to fully sync and stabilize before moving to the next wave. A resource in wave 1 cannot start until all wave 0 resources are healthy.
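
The ordering rule can be sketched in a few lines of Python. This is a simplified model for intuition, not ArgoCD's implementation: `wave_of` and `group_by_wave` are hypothetical helpers, and manifests are plain dictionaries. It reads the sync-wave annotation, treats a missing annotation as wave 0, and returns the waves in ascending order.

```python
from collections import defaultdict

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def wave_of(manifest: dict) -> int:
    """Return the manifest's sync wave; a missing annotation means wave 0."""
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def group_by_wave(manifests: list[dict]) -> list[tuple[int, list[str]]]:
    """Group resource names by wave, lowest wave first."""
    waves = defaultdict(list)
    for manifest in manifests:
        name = f'{manifest["kind"]}/{manifest["metadata"]["name"]}'
        waves[wave_of(manifest)].append(name)
    return sorted(waves.items())


manifests = [
    {"kind": "Deployment", "metadata": {"name": "agent-api", "annotations": {WAVE_ANNOTATION: "1"}}},
    {"kind": "ConfigMap", "metadata": {"name": "agent-config", "annotations": {WAVE_ANNOTATION: "0"}}},
    {"kind": "Secret", "metadata": {"name": "agent-secrets"}},  # no annotation: wave 0
]

for wave, names in group_by_wave(manifests):
    print(wave, names)
```

The Deployment is listed first in the input but lands in the last group, which mirrors how the annotation, not the file order, decides when a resource is applied.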

Annotating Resources for Waves

The annotation syntax is simple:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "0"

Example: ConfigMap and Secret in Wave 0

# Config must exist before app starts
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-config
  annotations:
    argocd.argoproj.io/sync-wave: "0"
data:
  agent_model: gpt-4
  max_tokens: "4096"
---
# Secrets in same wave as config
apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
  annotations:
    argocd.argoproj.io/sync-wave: "0"
type: Opaque
data:
  api_key: YWJjMTIz # base64 encoded

Output:

Wave 0 resources created and stabilized
- ConfigMap agent-config: Ready
- Secret agent-secrets: Ready

Example: Deployment in Wave 1

# App deployment waits for wave 0
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-api
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: agent-api
  template:
    metadata:
      labels:
        app: agent-api
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/agent-api:v2.1.0
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: agent-config
            - secretRef:
                name: agent-secrets

Output:

Wave 0 complete. Proceeding to Wave 1.
- Deployment agent-api: Creating replicas
- Service agent-api: Created
Pod: agent-api-xxxxx ready

Resource Hooks: PreSync, PostSync, SyncFail

A resource hook is a Kubernetes resource, most often a Job or Pod, that ArgoCD runs at a specific point in the sync lifecycle. Unlike regular resources, hooks don't stay running: they execute and exit.

Hook Types and When to Use Them

Hook Type  | When                                                   | Example
PreSync    | Before any regular resources are applied               | Database migrations, data preparation
Sync       | During normal resource creation                        | Custom initialization (rarely used)
PostSync   | After sync completes successfully                      | Notifications, smoke tests, warmup
SyncFail   | When sync fails                                        | Alerting, rollback notifications
PostDelete | After ArgoCD deletes the Application's resources       | Cleanup, deregistration
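
To see how hook phases interact with waves, here is a small Python model. It is a sketch under stated assumptions, not ArgoCD's code: `sort_key` and `sync_order` are hypothetical helpers, resources without a hook annotation are treated as part of the Sync phase, and only the three phases that run in a successful sync are modelled (SyncFail and PostDelete are triggered by events, so they are left out). ArgoCD's documentation describes ordering by phase, then wave, then kind, then name; the kind step is omitted here for brevity.

```python
HOOK_ANNOTATION = "argocd.argoproj.io/hook"
WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"
PHASE_ORDER = {"PreSync": 0, "Sync": 1, "PostSync": 2}


def sort_key(manifest: dict) -> tuple[int, int, str]:
    """Order by phase first, then wave, then name."""
    annotations = manifest["metadata"].get("annotations", {})
    phase = annotations.get(HOOK_ANNOTATION, "Sync")  # plain resources belong to Sync
    wave = int(annotations.get(WAVE_ANNOTATION, "0"))
    return PHASE_ORDER[phase], wave, manifest["metadata"]["name"]


def sync_order(manifests: list[dict]) -> list[str]:
    """Return resource names in the order this model would apply them."""
    return [m["metadata"]["name"] for m in sorted(manifests, key=sort_key)]


manifests = [
    {"metadata": {"name": "smoke-tests", "annotations": {HOOK_ANNOTATION: "PostSync"}}},
    {"metadata": {"name": "agent-api", "annotations": {WAVE_ANNOTATION: "1"}}},
    {"metadata": {"name": "agent-config", "annotations": {WAVE_ANNOTATION: "0"}}},
    # A high wave number does not push a PreSync hook behind Sync resources
    {"metadata": {"name": "db-migration", "annotations": {HOOK_ANNOTATION: "PreSync", WAVE_ANNOTATION: "5"}}},
]

print(sync_order(manifests))
```

The point of the wave "5" on the migration is that the phase wins: a wave number only orders resources against others in the same phase.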

Hook Deletion Policy

After a hook finishes, what happens to the Job/Pod?

argocd.argoproj.io/hook-deletion-policy: [HookSucceeded|HookFailed|BeforeHookCreation]

Policy             | Deletes When                                                  | Use Case
HookSucceeded      | After the hook succeeds                                       | One-time jobs (migrations); failed runs stay for debugging
HookFailed         | After the hook fails                                          | Clean up failed runs; keep successful runs for audit
BeforeHookCreation | Just before the next run creates the hook (default if unset)  | Prevent multiple job instances

PreSync Hooks: Running Database Migrations

A PreSync hook runs before any resources sync. This is where you run database migrations.

Complete Database Migration Example

# Service account for migration job.
# Annotated as a PreSync hook in wave -2 so it exists before the Job runs;
# regular (non-hook) resources are not applied until the Sync phase.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: db-migrate
  namespace: agents
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "-2"
---
# Role for migration job (if needed)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: db-migrate
  namespace: agents
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "-2"
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
# Bind role to service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: db-migrate
  namespace: agents
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "-2"
subjects:
  - kind: ServiceAccount
    name: db-migrate
    namespace: agents
roleRef:
  kind: Role
  name: db-migrate
  apiGroup: rbac.authorization.k8s.io
---
# PreSync hook: Database migration job
# (the db-secrets Secret must already exist in the namespace)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: agents
  annotations:
    # This is a hook, not a regular resource
    argocd.argoproj.io/hook: PreSync
    # Run before sync, then delete on success
    argocd.argoproj.io/hook-deletion-policy: HookSucceeded
    # The PreSync phase already precedes the Sync phase;
    # the wave only orders this Job among other PreSync hooks
    argocd.argoproj.io/sync-wave: "-1"
spec:
  backoffLimit: 1
  template:
    spec:
      serviceAccountName: db-migrate
      restartPolicy: Never
      containers:
        - name: migrate
          image: myregistry.azurecr.io/agent-api:v2.1.0
          command:
            - sh
            - -c
            - |
              # Stop at the first failing step so the hook is marked failed
              set -e

              # Install tooling before it is used (better: bake it into the image)
              pip install -q alembic psycopg2-binary

              # Wait for database to be ready (60-second timeout).
              # Credentials are read from the environment inside Python to avoid shell-quoting bugs.
              timeout 60 sh -c 'until python -c "import os, psycopg2; psycopg2.connect(host=os.environ[\"DB_HOST\"], user=os.environ[\"DB_USER\"], password=os.environ[\"DB_PASSWORD\"], dbname=os.environ[\"DB_NAME\"])" 2>/dev/null; do sleep 2; done'

              # Run migrations with alembic
              cd /app
              alembic upgrade head

              # Verify migration succeeded
              python - <<'PY'
              import os
              import psycopg2

              conn = psycopg2.connect(
                  host=os.environ["DB_HOST"],
                  user=os.environ["DB_USER"],
                  password=os.environ["DB_PASSWORD"],
                  dbname=os.environ["DB_NAME"],
              )
              cursor = conn.cursor()
              # Alembic stores the current revision in alembic_version.version_num
              cursor.execute("SELECT version_num FROM alembic_version;")
              version = cursor.fetchone()
              print(f"✓ Migration completed. Version: {version[0]}")
              cursor.close()
              conn.close()
              PY
          env:
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: host
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: password
            - name: DB_NAME
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: database
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Output (on successful migration):

PreSync: Running database migration
Job: db-migration created
- Waiting for migration to complete...
- Migration command: alembic upgrade head
- ✓ Migration completed. Version: 5

Job deleted (HookSucceeded policy)
PreSync complete. Proceeding to sync.

Output (on migration failure):

PreSync: Running database migration
Job: db-migration created
- Waiting for migration to complete...
- ERROR: Cannot find column 'agent_id' in agents table
Job NOT deleted (job failed, waiting for manual investigation)
Sync FAILED - PreSync hook failed
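
Many PreSync failures come down to the database not being reachable yet, so a bounded wait is worth having. The following is a minimal Python sketch of a wait-with-timeout loop. `wait_for_port` is a hypothetical helper, and it only checks that a TCP port accepts connections, which is weaker than a real authenticated database connection.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection succeeds, False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example use inside a migration container (5432 is PostgreSQL's default port,
# assumed here; DB_HOST is the variable name from the Job above):
#   if not wait_for_port(os.environ["DB_HOST"], 5432):
#       sys.exit("database not reachable, failing the PreSync hook")
```

Returning False instead of looping forever matters here: the hook exits non-zero, the Job fails, and ArgoCD stops the sync rather than hanging.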

PostSync Hooks: Notifications and Smoke Tests

A PostSync hook runs after sync completes successfully. Use this for:

  • Notifications (Slack, email, webhook)
  • Smoke tests (basic health checks)
  • Warming up caches
  • Database reseeding

Example: Slack Notification on Deployment

# Secret containing webhook URL
apiVersion: v1
kind: Secret
metadata:
  name: slack-webhook
  namespace: agents
type: Opaque
stringData:
  webhook-url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
---
# PostSync hook: Send Slack notification
apiVersion: batch/v1
kind: Job
metadata:
  name: notify-deployment
  namespace: agents
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-deletion-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: "2"
spec:
  backoffLimit: 1
  template:
    spec:
      # Assumption: a hypothetical service account that is allowed to "get"
      # Deployments in this namespace (required by the kubectl calls below)
      serviceAccountName: deploy-notifier
      restartPolicy: Never
      containers:
        - name: notify
          # Assumption: a hypothetical image that ships both kubectl and curl
          # (curlimages/curl does not include kubectl)
          image: myregistry.azurecr.io/deploy-tools:latest
          command:
            - sh
            - -c
            - |
              WEBHOOK_URL=$(cat /var/run/secrets/slack/webhook-url)
              DEPLOYMENT_IMAGE=$(kubectl get deployment agent-api -n agents -o jsonpath='{.spec.template.spec.containers[0].image}')
              REPLICA_READY=$(kubectl get deployment agent-api -n agents -o jsonpath='{.status.readyReplicas}')

              # --fail makes curl exit non-zero on an HTTP error so the Job fails visibly
              curl --fail -X POST "$WEBHOOK_URL" \
                -H 'Content-Type: application/json' \
                -d "{
                  \"text\": \"Agent API Deployment Complete\",
                  \"blocks\": [
                    {
                      \"type\": \"section\",
                      \"text\": {
                        \"type\": \"mrkdwn\",
                        \"text\": \"*Agent API Deployed Successfully*\n*Image:* \`$DEPLOYMENT_IMAGE\`\n*Ready Replicas:* $REPLICA_READY\"
                      }
                    }
                  ]
                }"
          volumeMounts:
            - name: slack-secret
              mountPath: /var/run/secrets/slack
              readOnly: true
          resources:
            requests:
              memory: "64Mi"
              cpu: "10m"
            limits:
              memory: "128Mi"
              cpu: "100m"
      volumes:
        - name: slack-secret
          secret:
            secretName: slack-webhook
            items:
              - key: webhook-url
                path: webhook-url

Output:

PostSync: Running notifications
Job: notify-deployment created
- Sending Slack notification...
- curl: Slack API responded with 200 OK
✓ Deployment notification sent

Job deleted (HookSucceeded policy)
Sync complete.
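
Hand-escaping JSON inside a shell string is easy to get wrong. If the notification container has Python available (an assumption; the Job above uses a shell and curl), the same payload can be built with the standard library. `build_slack_payload` is a hypothetical helper name, and the message mirrors the one in the Job.

```python
import json


def build_slack_payload(image: str, ready_replicas: int) -> str:
    """Serialize the Slack message; json.dumps handles quoting and newlines."""
    message = (
        "*Agent API Deployed Successfully*\n"
        f"*Image:* `{image}`\n"
        f"*Ready Replicas:* {ready_replicas}"
    )
    payload = {
        "text": "Agent API Deployment Complete",
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": message}},
        ],
    }
    return json.dumps(payload)


body = build_slack_payload("myregistry.azurecr.io/agent-api:v2.1.0", 2)
print(body)
```

The resulting string can be sent as the request body of the POST to the webhook URL, with the same Content-Type header the curl command uses.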

SyncFail Hooks: Alerting on Failure

A SyncFail hook runs when the sync process fails. Use this for alerting and rollback notifications.

apiVersion: batch/v1
kind: Job
metadata:
  name: sync-failed-alert
  namespace: agents
  annotations:
    argocd.argoproj.io/hook: SyncFail
    # Delete the previous alert Job before creating a new one, so a Job left over
    # from an earlier failure does not block this fixed-name hook from running again
    argocd.argoproj.io/hook-deletion-policy: BeforeHookCreation
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: alert
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - |
              WEBHOOK_URL=$(cat /var/run/secrets/slack/webhook-url)

              curl -X POST "$WEBHOOK_URL" \
                -H 'Content-Type: application/json' \
                -d "{
                  \"text\": \"ALERT: Agent Deployment Failed\",
                  \"blocks\": [
                    {
                      \"type\": \"section\",
                      \"text\": {
                        \"type\": \"mrkdwn\",
                        \"text\": \":warning: *Deployment Failed*\nCheck ArgoCD UI for details. Cluster state may be inconsistent.\"
                      }
                    }
                  ]
                }"
          volumeMounts:
            - name: slack-secret
              mountPath: /var/run/secrets/slack
              readOnly: true
      volumes:
        - name: slack-secret
          secret:
            secretName: slack-webhook

Output (when sync fails):

Sync FAILED: Invalid image reference
SyncFail: Running alert hook
Job: sync-failed-alert created
- Sending failure alert to Slack...
- curl: Slack API responded with 200 OK
✓ Failure alert sent

Cluster is in UNKNOWN health state (check manually)

Hook Deletion Policies in Detail

HookSucceeded (Typical for One-Time Jobs)

argocd.argoproj.io/hook-deletion-policy: HookSucceeded

Deletes the Job immediately after it succeeds. Use this for:

  • Database migrations
  • Data initialization
  • One-time setup

Example: Migration job should not persist after running.

HookFailed (Clean Up Failed Runs, Keep Successful Ones)

argocd.argoproj.io/hook-deletion-policy: HookFailed

Deletes the Job only if it fails. If it succeeds, the Job persists. To keep failed Jobs for debugging, use HookSucceeded instead. Use HookFailed when:

  • You want to inspect the logs of successful runs
  • You want audit logs of hook execution

Example: Keep successful notification jobs for audit trail.

BeforeHookCreation (Cleanup Old Instances)

argocd.argoproj.io/hook-deletion-policy: BeforeHookCreation

Deletes the previous hook instance before creating a new one. This is the policy ArgoCD assumes when no deletion policy is set. Use this when:

  • Hook might run multiple times
  • You want only the latest instance around

Example: Notification hooks might run on each sync; delete the old one before creating a new one.
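
The three policies can be summarized as a small decision function. This is an illustrative Python sketch of the rules described in this section, not ArgoCD code; `deleted_after_run` is a hypothetical name.

```python
def deleted_after_run(policy: str, succeeded: bool) -> bool:
    """Is the hook resource deleted right after this run finishes?"""
    if policy == "HookSucceeded":
        return succeeded
    if policy == "HookFailed":
        return not succeeded
    if policy == "BeforeHookCreation":
        # Nothing is deleted after the run; the old resource is removed
        # only when the next sync is about to create the hook again.
        return False
    raise ValueError(f"unknown hook deletion policy: {policy}")


for policy in ("HookSucceeded", "HookFailed", "BeforeHookCreation"):
    print(policy, deleted_after_run(policy, True), deleted_after_run(policy, False))
```

Reading the HookSucceeded row is the quickest way to remember why it suits migrations: a successful run cleans itself up, and a failed run stays in the cluster for inspection.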

Complete Example: Multi-Wave Deployment with Hooks

Here's a realistic example combining waves and hooks:

# Wave -1: Prepare (migrations)
---
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: agents
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-deletion-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: "-1"
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myregistry.azurecr.io/agent-api:v2.1.0
          command: ["alembic", "upgrade", "head"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: url
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"

# Wave 0: Infrastructure (config, secrets, volumes)
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-config
  annotations:
    argocd.argoproj.io/sync-wave: "0"
data:
  log_level: INFO
  max_workers: "4"
---
apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
  annotations:
    argocd.argoproj.io/sync-wave: "0"
type: Opaque
data:
  api_key: YWJjMTIz

# Wave 1: Application
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-api
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent-api
  template:
    metadata:
      labels:
        app: agent-api
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/agent-api:v2.1.0
          ports:
            - containerPort: 8000
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: agent-api
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 8000
  selector:
    app: agent-api

# Wave 2: Post-deployment (checks, notifications)
---
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-tests
  namespace: agents
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-deletion-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: "2"
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: test
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - |
              # Wait for service to be ready
              sleep 5

              # Test health endpoint
              curl -f http://agent-api:8000/health || exit 1

              # Test readiness endpoint
              curl -f http://agent-api:8000/ready || exit 1

              echo "✓ All smoke tests passed"
          resources:
            requests:
              memory: "64Mi"
              cpu: "10m"
            limits:
              memory: "128Mi"
              cpu: "100m"

Output (complete sync with waves):

ArgoCD Sync Started

Wave -1: PreSync Preparation
Job: db-migration running
- Database migration in progress...
- ✓ Migration completed
Job: db-migration deleted (HookSucceeded)
Wave -1 complete ✓

Wave 0: Infrastructure
ConfigMap: agent-config created
Secret: agent-secrets created
Waiting for ConfigMap/Secret to stabilize...
Wave 0 stable ✓

Wave 1: Application
Deployment: agent-api created (requesting 2 replicas)
Service: agent-api created
Waiting for Deployment to stabilize...
Pod 1: agent-api-5f8c9b... Ready
Pod 2: agent-api-5f8c9b... Ready
Deployment stable (2/2 ready) ✓

Wave 2: Post-Sync Verification
Job: smoke-tests running
- Testing http://agent-api:8000/health
- Testing http://agent-api:8000/ready
- ✓ All smoke tests passed
Job: smoke-tests deleted (HookSucceeded)
Wave 2 complete ✓

Sync SUCCESSFUL
Application Status: Healthy

Troubleshooting Failed Hooks

When hooks fail, ArgoCD stops the sync. Here's how to diagnose:

Check Hook Job Logs

# List jobs in the namespace
kubectl get jobs -n agents

# View logs from failed migration
kubectl logs -n agents job/db-migration

# Describe the job to see event messages
kubectl describe job -n agents db-migration

Output:

NAME           COMPLETIONS   DURATION   AGE
db-migration   0/1           5s         5s

---

LOGS:
ERROR: Cannot connect to database at 192.168.1.10:5432
Connection refused - check DATABASE_URL environment variable

---

EVENTS:
Type     Reason   Age  From   Message
----     ------   ---  ----   -------
Normal   Created  5s   batch  Created pod: db-migration-xxxxx
Normal   Started  3s   batch  Started pod: db-migration-xxxxx
Warning  Failed   1s   batch  Pod failed (exit code 1)

Common Issues

Issue                                      | Cause                                                                | Fix
Hook never runs                            | Hook annotation missing or misspelled, or an earlier step never became healthy | Check the argocd.argoproj.io/hook value and the health of earlier hooks and waves
Hook runs but fails                        | Missing environment variables                                        | Verify valueFrom references exist (secrets, configmaps)
Job persists when it shouldn't             | Wrong deletion policy                                                | Change to HookSucceeded for one-time jobs
Multiple jobs running                      | BeforeHookCreation not set                                           | Add deletion policy to clean up old instances
Application rolls out despite hook failure | Hook is PostSync, so it runs after resources are already applied     | Use argocd.argoproj.io/hook: PreSync for jobs that must gate the rollout
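
Several of these issues are annotation mistakes that can be caught before a sync. The following Python sketch checks a manifest (already parsed into a dictionary) for two of them. `lint_hook` is a hypothetical helper, not part of ArgoCD or kubectl, and the checks are deliberately minimal.

```python
HOOK = "argocd.argoproj.io/hook"
WAVE = "argocd.argoproj.io/sync-wave"
# The five hook types covered in this lesson, plus Skip, which ArgoCD also accepts
HOOK_TYPES = {"PreSync", "Sync", "PostSync", "SyncFail", "PostDelete", "Skip"}


def lint_hook(manifest: dict) -> list[str]:
    """Return human-readable problems found in a hook manifest's annotations."""
    problems = []
    annotations = manifest.get("metadata", {}).get("annotations") or {}

    hook = annotations.get(HOOK)
    if hook is None:
        problems.append("no hook annotation: applied as a regular resource")
    else:
        # The annotation may hold a comma-separated list of hook types
        unknown = [h.strip() for h in str(hook).split(",") if h.strip() not in HOOK_TYPES]
        if unknown:
            problems.append(f"unknown hook type: {', '.join(unknown)}")

    wave = annotations.get(WAVE)
    if wave is not None:
        if not isinstance(wave, str):
            # Annotation values must be strings; an unquoted -1 in YAML parses as an integer
            problems.append("sync-wave must be a quoted string")
        else:
            try:
                int(wave)
            except ValueError:
                problems.append(f"sync-wave is not an integer: {wave!r}")
    return problems


job = {"kind": "Job", "metadata": {"name": "db-migration", "annotations": {HOOK: "PreSynch", WAVE: -1}}}
for problem in lint_hook(job):
    print(problem)
```

A check like this could run in CI next to the manifests; it complements, but does not replace, reading the hook Job's logs when a sync actually fails.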

Key Concepts Review

Sync Waves:

  • Annotate resources with argocd.argoproj.io/sync-wave: "N"
  • ArgoCD creates waves in ascending order (-1, 0, 1, 2, ...)
  • Waits for each wave to stabilize before proceeding

Resource Hooks:

  • Execute at sync lifecycle points: PreSync, Sync, PostSync, SyncFail, PostDelete
  • Not regular resources—they run and exit
  • Use argocd.argoproj.io/hook: [HookType] annotation

Hook Deletion Policies:

  • HookSucceeded: Delete on success (one-time jobs like migrations)
  • HookFailed: Delete on failure (keep successful runs for audit)
  • BeforeHookCreation: Delete previous instance before creating new one

Wave -1 PreSync Hooks:

  • Run before any resources sync
  • Perfect for database migrations
  • Ensure prerequisite data exists before application starts

Wave 2+ PostSync Hooks:

  • Run after all resources are healthy
  • Use for notifications, smoke tests, warmup
  • Verify deployment success before marking complete

Try With AI

Ask Claude: "I need my PostgreSQL database migration to run before my FastAPI agent deploys. Create a PreSync hook Job that runs alembic upgrade head. The database connection details are in a secret named 'db-secrets' with keys 'host', 'username', 'password', and 'database'."

Review Claude's response for these key aspects:

  • Does the Job use the argocd.argoproj.io/hook: PreSync annotation?
  • Does it set argocd.argoproj.io/sync-wave: "-1" to run before everything?
  • Does it use argocd.argoproj.io/hook-deletion-policy: HookSucceeded?
  • Does it reference the secret using valueFrom?
  • Does it include error handling (timeout for database readiness)?
  • Does it have proper resource requests and limits?

Iterate with Claude: "Now add a PostSync hook that sends a Slack notification when the deployment succeeds. The hook should include the container image tag that was deployed. Reference the Slack webhook URL from a secret named 'slack-webhook' with key 'url'."

Verify the PostSync hook:

  • Uses argocd.argoproj.io/hook: PostSync annotation?
  • Has argocd.argoproj.io/sync-wave: "2" to run after the app starts?
  • Retrieves the deployed image using kubectl get deployment?
  • Formats a proper Slack message payload?
  • Handles the webhook URL securely from the secret?

One more refinement: "If the migration fails, I want the hook job to stay in the cluster so I can debug why. Update the deletion policy appropriately."

Check if Claude:

  • Changed deletion policy to NOT delete failed jobs?
  • OR left migration as HookSucceeded but added a separate debug/audit hook?
  • Explained the tradeoff between cleanup and debugging?