Health Checks: Liveness, Readiness, Startup Probes

Your agent is now running in a Deployment (Lesson 4). Kubernetes monitors it and restarts it if the container crashes. But there's a critical problem: a container that's running might not be healthy.

Imagine this: Your agent's model loads asynchronously. For the first 30 seconds after startup, the container is running but can't handle requests. Kubernetes sees it as "ready" and sends traffic to it immediately. Users get errors. The container didn't crash, so Kubernetes doesn't restart it. Your service is degraded but technically "up."

This lesson teaches health checks—the mechanism that lets Kubernetes know whether your container is actually ready to serve traffic, whether it's still alive, and how long to wait during startup before expecting it to respond.


The Problem: Running ≠ Ready

Consider your AI agent startup sequence:

0s    - Container starts, main process begins
0.5s  - Python initializes, imports libraries
5s    - Model weights load into memory
10s   - Embedding vector cache builds
30s   - Application ready to serve requests

Without health checks:

  • Kubernetes sends traffic at 1s (while model is still loading)
  • Requests fail, users see errors
  • Container doesn't crash (the process is still running, so Kubernetes has nothing to react to)
  • Kubernetes doesn't restart it
  • You have a running but broken service

With health checks:

  • Readiness probe fails for first 30 seconds (container is running but not ready)
  • Kubernetes removes pod from service endpoints
  • Traffic doesn't reach it until model is loaded
  • Once ready, traffic flows
  • If it becomes unhealthy, Kubernetes routes around it or restarts it
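
In Kubernetes terms, that gate is a readiness probe on the container spec. A minimal sketch, assuming the agent exposes a /health/ready endpoint on port 8000 (the full configuration is developed below):

readinessProbe:          # gate traffic on actual readiness, not just "running"
  httpGet:
    path: /health/ready  # assumed endpoint; returns 503 until the model is loaded
    port: 8000
  periodSeconds: 5       # re-check every 5 seconds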

Three Types of Probes

Kubernetes provides three health check mechanisms:

1. Readiness Probe (Is this pod ready to serve traffic?)

Purpose: Determines if a pod should receive traffic from Services.

When to use:

  • Application startup is slow (models loading, caches building)
  • Container is running but dependencies aren't ready
  • Need to temporarily remove pod from load balancer during updates

Behavior:

  • Fails → Pod removed from Service endpoints (traffic stops)
  • Recovers → Pod re-added to Service endpoints (traffic resumes)
  • Success → Pod receives traffic normally

Key insight: Readiness probes are about external traffic. Even a healthy, running pod might not be ready to serve external requests.
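
You can watch this gating directly. Assuming a Service named agent-service selects these pods (the name is illustrative), the pod's IP joins the endpoints list only once its readiness probe passes:

kubectl get endpoints agent-service

# While the probe fails:
# NAME            ENDPOINTS         AGE
# agent-service                     10s
#
# Once the probe passes, the pod IP appears:
# NAME            ENDPOINTS         AGE
# agent-service   10.244.0.5:8000   45s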

2. Liveness Probe (Is this pod still alive?)

Purpose: Detects if container is in a broken/stuck state and needs restart.

When to use:

  • Detect deadlocks or infinite loops
  • Identify containers that are running but unresponsive
  • Force a restart of unhealthy containers (a Kubernetes-initiated restart, not a crash)

Behavior:

  • Fails failureThreshold times in a row → Kubernetes restarts the container
  • Success → No action, pod continues running

Key insight: Liveness probes are about pod lifecycle. A pod can be running but logically dead (stuck waiting, infinite loop, memory leak).
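
A common way to surface these logically dead states is a heartbeat: the worker loop refreshes a timestamp, and the liveness check fails once it goes stale. A minimal sketch (the 60-second threshold and worker structure are illustrative; the lesson's full implementation appears later):

import time

heartbeat = time.time()

def worker_iteration():
    """Called by the main work loop; a hung loop stops refreshing the heartbeat."""
    global heartbeat
    # ... do real work ...
    heartbeat = time.time()

def is_alive() -> bool:
    # A stale heartbeat means the worker is stuck, even though the process runs
    return time.time() - heartbeat < 60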

3. Startup Probe (Has this pod finished initializing?)

Purpose: Prevents liveness/readiness probes from triggering during slow startup.

When to use:

  • Container has long initialization time (30+ seconds)
  • Need different probe behavior during startup vs steady state
  • AI agents loading models, warming caches

Behavior:

  • Fails during startup → No restart yet; liveness and readiness checks stay disabled while it keeps trying
  • Succeeds → Startup probe stops running; liveness/readiness probes take over
  • Never succeeds within failureThreshold × periodSeconds → Kubernetes kills and restarts the container

Key insight: Startup probes buy time for initialization. Once startup succeeds, other probes begin their work.
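
The startup budget is simply periodSeconds × failureThreshold. A sketch, assuming the same /health/ready endpoint used elsewhere in this lesson, giving the container up to 60 seconds to initialize before Kubernetes restarts it:

startupProbe:
  httpGet:
    path: /health/ready
    port: 8000
  periodSeconds: 5       # check every 5s during startup
  failureThreshold: 12   # 12 × 5s = 60s allowed before restart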


HTTP GET Probes (Most Common)

HTTP probes call a health endpoint and check the response code.

Basic HTTP Probe Structure

spec:
  containers:
  - name: agent
    image: my-agent:v1
    ports:
    - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 1
      failureThreshold: 3

Explanation:

  • httpGet.path: Endpoint to call
  • httpGet.port: Container port (can be number or name)
  • initialDelaySeconds: Wait 10s before first probe (let container start)
  • periodSeconds: Check every 5 seconds
  • timeoutSeconds: If response takes >1s, count as failure
  • failureThreshold: After 3 failures, take action (remove from service or restart)
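
Because port accepts a name as well as a number, you can name the container port once and reference it everywhere, which keeps probes correct if the port number ever changes. A sketch of the same probe using a named port:

ports:
- containerPort: 8000
  name: http             # name the port once...
readinessProbe:
  httpGet:
    path: /health/ready
    port: http           # ...and reference it by name here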

HTTP Readiness Probe Example

Your FastAPI agent needs a readiness endpoint that returns 200 only when model is loaded:

# main.py - FastAPI agent with readiness endpoint
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import asyncio
import time

app = FastAPI()

# Simulate model loading time
model_loaded = False
load_start = None

@app.on_event("startup")
async def load_model():
    global model_loaded, load_start
    load_start = time.time()
    print("Starting model load...")
    # Simulate 15s model loading
    await asyncio.sleep(15)
    model_loaded = True
    load_time = time.time() - load_start
    print(f"Model loaded in {load_time:.1f}s")

@app.get("/health/ready")
async def readiness():
    """Returns 200 only when model is fully loaded"""
    if not model_loaded:
        return JSONResponse(
            {"status": "loading", "ready": False},
            status_code=503
        )
    return JSONResponse({"status": "ready", "ready": True})

@app.get("/health/live")
async def liveness():
    """Always returns 200 if main process is running"""
    return JSONResponse({"status": "alive"})

@app.post("/predict")
async def predict(data: dict):
    """Agent inference endpoint"""
    if not model_loaded:
        return JSONResponse(
            {"error": "Model still loading, try again soon"},
            status_code=503
        )
    # Do actual inference
    return {"prediction": "example output"}

Output (when you test these endpoints):

# Server log on startup:
Starting model load...

# First request during loading:
curl http://localhost:8000/health/ready
{"status": "loading", "ready": false}

# After 15 seconds, try again:
curl http://localhost:8000/health/ready
{"status": "ready", "ready": true}

# Liveness endpoint always responds:
curl http://localhost:8000/health/live
{"status": "alive"}

Deployment with HTTP Readiness and Liveness Probes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
      - name: agent
        image: my-agent:v1
        ports:
        - containerPort: 8000
          name: http

        # Wait for model to load before sending traffic
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 5   # Check after 5s (model loads in 15s)
          periodSeconds: 5         # Check every 5s
          timeoutSeconds: 2        # Request must complete in 2s
          failureThreshold: 2      # 2 failures = not ready (10s total)

        # Detect if container is stuck or crashed
        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 20  # Wait for startup before checking
          periodSeconds: 10        # Check every 10s (less frequent)
          timeoutSeconds: 2
          failureThreshold: 3      # 3 failures = restart pod (30s total)

        # Buy time for initialization before liveness checks
        startupProbe:
          httpGet:
            path: /health/ready
            port: http
          periodSeconds: 5         # Check every 5s during startup
          failureThreshold: 6      # Allow 30s for startup (6 * 5s)

What happens when you deploy this:

kubectl apply -f deployment.yaml

Output:

deployment.apps/agent-deployment created

# Watch pod startup
kubectl get pods -w

NAME                                 READY   STATUS    RESTARTS
agent-deployment-7d4f5c6b9f-abc123   0/1     Running   0
agent-deployment-7d4f5c6b9f-xyz789   0/1     Running   0

# Startup probe running (checking every 5s)
# At 5s: startup probe fails (model still loading)
# At 10s: startup probe fails
# At 15s: startup probe succeeds (model loaded)
#
# Now readiness and liveness probes take over

# After ~20s:
agent-deployment-7d4f5c6b9f-abc123   1/1     Running   0
agent-deployment-7d4f5c6b9f-xyz789   1/1     Running   0

# View probe events
kubectl describe pod agent-deployment-7d4f5c6b9f-abc123

# Output excerpt:
# Events:
#   Type     Reason     Age  From     Message
#   ----     ------     ---  ----     -------
#   Normal   Created    45s  kubelet  Created container agent
#   Normal   Started    45s  kubelet  Started container agent
#   Warning  Unhealthy  40s  kubelet  Startup probe failed: HTTP probe failed
#   Warning  Unhealthy  35s  kubelet  Startup probe failed: HTTP probe failed
#   Normal   Ready      30s  kubelet  Container is ready

TCP Socket Probes (Port Availability)

Use TCP probes when you don't have an HTTP endpoint, or for protocols like gRPC/database connections.

TCP Probe Configuration

spec:
  containers:
  - name: cache
    image: redis:7
    ports:
    - containerPort: 6379

    livenessProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3

How it works:

  • Kubernetes attempts TCP connection to port
  • Success → Port is open and accepting connections
  • Timeout/failure → Port not responding

Example: Redis cache in your cluster

# Redis container starts but is slow to open port
kubectl logs deployment/redis-deployment

# Output:
Ready to accept connections

kubectl describe shows the pod passing its checks (note that only probe failures generate events; successful probes are silent):

kubectl describe pod redis-deployment-xxxxx

# Output excerpt:
# Conditions:
#   Ready             True
#   ContainersReady   True
# Events:
#   Type    Reason   Age  From     Message
#   ----    ------   ---  ----     -------
#   Normal  Created  1m   kubelet  Created container cache
#   Normal  Started  1m   kubelet  Started container cache
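
You can approximate the probe by hand. For Redis specifically, redis-cli ships in the image, so a quick check (assuming the Deployment name above) verifies slightly more than the TCP probe does: a full protocol round-trip, not just an open port.

kubectl exec deployment/redis-deployment -- redis-cli ping

# Output:
PONG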

Exec Probes (Custom Commands)

Run arbitrary shell commands for custom health checks.

Exec Probe Example

Your agent needs a complex health check:

spec:
  containers:
  - name: agent
    image: my-agent:v1

    readinessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - |
          # Check if critical files exist and are recent
          [ -f /app/model.pkl ] && \
          [ -n "$(find /app/model.pkl -mmin -2)" ] && \
          curl -sf http://localhost:8000/health/ready > /dev/null
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 2

Explanation:

  • exec.command: Run shell commands
  • First part: Verify model file exists and was modified in last 2 minutes
  • Second part: Verify HTTP endpoint responds
  • Exit code 0 = healthy, non-zero = unhealthy
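
When an exec probe misbehaves, run the same command by hand inside the container, since the exit code is exactly what the kubelet evaluates. A sketch, reusing the pod name from earlier examples:

kubectl exec agent-deployment-7d4f5c6b9f-abc123 -- /bin/sh -c \
  '[ -f /app/model.pkl ] && curl -sf http://localhost:8000/health/ready > /dev/null'
echo $?   # 0 = the probe would pass, non-zero = it would fail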

Example with Database Check

livenessProbe:
  exec:
    command:
    - python
    - -c
    - |
      import sqlite3
      conn = sqlite3.connect('/data/cache.db')
      cursor = conn.cursor()
      cursor.execute('SELECT 1')
      cursor.close()
      conn.close()
      print('Database healthy')

Note that exec probe output doesn't appear in kubectl logs: the kubelet runs the command separately from the container's main process and captures its output itself. Successful probes are silent; when a probe fails, the command's output shows up in the pod's events:

kubectl describe pod agent-deployment-xxxxx

# Output excerpt:
# Events:
#   Type     Reason     Age  Message
#   ----     ------     ---  -------
#   Warning  Unhealthy  30s  Exec probe failed: Database connection failed

Timing Parameters (Critical Decisions)

Probe timing parameters dramatically affect pod lifecycle. Choosing wrong values causes cascading failures.

initialDelaySeconds: Wait Before First Check

Purpose: Give container time to start before first health check.

Decision framework:

  • Fast startup (web server): 5-10s
  • Moderate startup (cache warming): 15-30s
  • Slow startup (model loading): 30-60s
  • Very slow startup (fine-tuning, large models): Use startupProbe instead

Too low (5s for a 15s load time):

0s     Container starts
5s     First probe → FAILS (model still loading)
10s    Second probe → FAILS
15s    Model finishes loading
15-20s Next probe succeeds

For a readiness probe this is just failure noise in the events, but a liveness probe with a low failureThreshold would restart the container before the model ever finished loading: a crash loop.

Better (for loads this long, use a startupProbe):

0s     Container starts; startup probe checks every 5s, liveness/readiness held off
15s    Model loaded
15-20s Startup probe succeeds; readiness probe takes over and passes
→ Clean transition: no spurious failures, no premature restarts

periodSeconds: How Often to Check

Purpose: Frequency of health checks after first check.

Decision framework:

  • Readiness probe: 5-10s (frequent, pod removal is gentle)
  • Liveness probe: 10-30s (less frequent, restart is drastic)
  • Startup probe: 5s (frequent during init, then liveness takes over)

Why different frequencies?

  • Readiness: Removing pod from service is non-destructive
  • Liveness: Restarting pod loses in-flight requests

timeoutSeconds: Wait for Response

Purpose: How long before probe response is considered failure.

Decision framework:

  • Healthy endpoint: 1-2s
  • Slow endpoint: 5s
  • Database query: 5-10s

Watch out: Keep timeoutSeconds below periodSeconds.

Wrong: periodSeconds: 5, timeoutSeconds: 10
→ A hanging probe can still be waiting when the next check is due, so checks drift and failure detection slows down

Right: periodSeconds: 5, timeoutSeconds: 2
→ Each probe finishes (or fails) well before the next one starts

failureThreshold: How Many Failures Until Action

Purpose: Prevent flaky endpoints from triggering cascading restarts.

Decision framework:

  • Readiness probe: 2-3 (quicker reaction to real issues)
  • Liveness probe: 3-5 (more tolerant, restart is drastic)
  • Startup probe: 6-10 (allow slow initialization)

Math example:

readiness: periodSeconds=5, failureThreshold=3
→ 3 failed checks * 5s = 15s total before pod removed

liveness: periodSeconds=10, failureThreshold=3
→ 3 failed checks * 10s = 30s total before restart

startup: periodSeconds=5, failureThreshold=6
→ 6 failed checks * 5s = 30s allowed for initialization

Debugging Probe Failures

When pods aren't becoming ready or are restarting, probes are usually the culprit.

Step 1: Check Pod Status

kubectl get pods

Output:

NAME                                 READY   STATUS    RESTARTS
agent-deployment-7d4f5c6b9f-abc123   0/1     Running   0

Interpretation:

  • READY 0/1 = Pod running but not ready (readiness probe failing)
  • RESTARTS increasing = Liveness probe killing pod
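
To see just the probe events without the full describe output, you can filter events by pod name (the pod name here is illustrative):

kubectl get events --field-selector involvedObject.name=agent-deployment-7d4f5c6b9f-abc123 --sort-by=.lastTimestamp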

Step 2: Describe Pod for Events

kubectl describe pod agent-deployment-7d4f5c6b9f-abc123

Output (showing readiness probe failure):

Name:           agent-deployment-7d4f5c6b9f-abc123
Namespace:      default
Status:         Running
Containers:
  agent:
    Image:          my-agent:v1
    State:          Running
      Started:      Sat, 22 Dec 2024 15:42:30 UTC
    Ready:          False
    Restart Count:  0
    Readiness:      http-get http://:8000/health/ready delay=5s timeout=2s period=5s #success=1 #failure=2

Events:
  Type     Reason     Age    From     Message
  ----     ------     ---    ----     -------
  Normal   Created    2m18s  kubelet  Created container agent
  Normal   Started    2m18s  kubelet  Started container agent
  Warning  Unhealthy  2m13s  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  2m08s  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  2m03s  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503

Key info:

  • Ready: False = Readiness probe failing
  • The Readiness: line shows which probe is configured, with its timing parameters
  • Events show the exact failure messages

Step 3: Check Container Logs

kubectl logs agent-deployment-7d4f5c6b9f-abc123

# Output:
Starting model load...
Model loaded in 15.2s
WARNING: POST /predict called before startup complete

Diagnosis: Model takes 15s to load, but readiness probe starts checking at 5s.

Step 4: Test Probe Manually

Port forward to test endpoint directly:

kubectl port-forward pod/agent-deployment-7d4f5c6b9f-abc123 8000:8000

Output:

Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000

In another terminal, test the readiness endpoint:

curl http://localhost:8000/health/ready

Output (if model is still loading):

{"status": "loading", "ready": false}

Output (if model is ready):

{"status": "ready", "ready": true}

Step 5: Fix and Retest

Common fixes:

# Problem: initialDelaySeconds too low
# Fix: Increase delay
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 30   # Was 5, now 30

# Problem: Endpoint too slow
# Fix: Increase timeout
  timeoutSeconds: 5         # Was 1

# Problem: Flaky endpoint
# Fix: Increase failureThreshold
  failureThreshold: 3       # Was 1

Apply updated deployment:

kubectl apply -f deployment.yaml

# Watch pods restart with new config
kubectl get pods -w

NAME                                 READY   STATUS    RESTARTS
agent-deployment-7d4f5c6b9f-abc123   0/1     Pending   0
agent-deployment-7d4f5c6b9f-abc123   0/1     Running   0
agent-deployment-7d4f5c6b9f-abc123   1/1     Running   0

# Now ready!

AI-Native Health Checks

For AI agents with variable startup times and complex dependencies, design probes thoughtfully.

Pattern: Startup Probe + Readiness Probe + Liveness Probe

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
      - name: agent
        image: my-agent:latest

        # Buy time for model initialization
        startupProbe:
          httpGet:
            path: /health/startup
            port: 8000
          periodSeconds: 5
          failureThreshold: 30   # 150s total (5 * 30)

        # Check if model is loaded and ready
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
          failureThreshold: 2

        # Detect if model inference is broken
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 2
          failureThreshold: 3

Probe Implementation Strategy

# health.py - Health check endpoints for AI agent
import asyncio
from datetime import datetime

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# Global state
model = None
model_loaded_at = None
last_inference_at = None
last_inference_error = None

def check_cache_healthy() -> bool:
    """Placeholder dependency check - replace with a real cache ping."""
    return True

def run_inference(data: dict):
    """Placeholder inference call - replace with your model's predict()."""
    return {"prediction": "example output"}

@app.on_event("startup")
async def initialize_agent():
    global model, model_loaded_at, last_inference_at
    print("Loading model...")
    # Simulate loading process
    await asyncio.sleep(30)  # Real: model.load(), cache.warm()
    model = True
    model_loaded_at = datetime.now()
    last_inference_at = datetime.now()

@app.get("/health/startup")
async def startup():
    """Startup probe: Is initialization complete?"""
    if model is None:
        return JSONResponse({"status": "initializing"}, status_code=503)
    return JSONResponse({"status": "started"})

@app.get("/health/ready")
async def readiness():
    """Readiness probe: Is this pod ready for traffic?"""
    if model is None:
        return JSONResponse({"status": "not_ready"}, status_code=503)

    # Optional: Check dependencies
    if not check_cache_healthy():
        return JSONResponse({"status": "cache_unhealthy"}, status_code=503)

    return JSONResponse(
        {"status": "ready", "model_loaded_at": model_loaded_at.isoformat()}
    )

@app.get("/health/live")
async def liveness():
    """Liveness probe: Is inference still working?"""
    if model is None:
        return JSONResponse({"status": "not_initialized"}, status_code=503)

    # Check for stuck state (no inference in 5 minutes)
    # Note: a genuinely idle agent will also trip this; tune for your traffic
    age = (datetime.now() - last_inference_at).total_seconds()
    if age > 300:
        return JSONResponse(
            {"status": "stale", "age_seconds": age},
            status_code=503
        )

    # Check for recent errors
    if last_inference_error:
        return JSONResponse(
            {"status": "error", "last_error": last_inference_error},
            status_code=503
        )

    return JSONResponse({"status": "healthy"})

@app.post("/predict")
async def predict(data: dict):
    global last_inference_at, last_inference_error

    try:
        if model is None:
            raise RuntimeError("Model not loaded")

        result = run_inference(data)
        last_inference_at = datetime.now()
        last_inference_error = None
        return result
    except Exception as e:
        last_inference_error = str(e)
        return JSONResponse({"error": str(e)}, status_code=500)

Try With AI

Now you've learned to configure probes manually. In the next lesson, you'll use kubectl-ai to generate probe configurations automatically and debug real cluster issues with AI guidance.

Setup: You have a containerized agent with a health endpoint.

Challenge: Configure probes for an agent with these characteristics:

  • Model loads in 20 seconds
  • Health endpoint responds in under 500ms when healthy
  • You want aggressive failure detection for readiness (pod removed quickly if unhealthy)
  • You want conservative failure detection for liveness (don't restart on transient errors)

Action Prompt: Write a Deployment manifest with startup, readiness, and liveness probes. Consider:

  1. How long should initialDelaySeconds be?
  2. How frequent should readiness checks be (periodSeconds)?
  3. How many failures before removing pod from service (failureThreshold)?
  4. What should failureThreshold be for liveness (more conservative)?

Test your configuration by deploying to Minikube and simulating failures (make your health endpoint return 500 temporarily to test probe behavior).
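
One way to simulate failures without rebuilding the image is a debug toggle on the health endpoint. A hypothetical sketch (the /debug/fail route and fail_mode flag are illustrative additions, not part of the lesson's agent):

fail_mode = False  # illustrative global toggle

@app.post("/debug/fail")
async def toggle_fail():
    """Flip the readiness endpoint into a failing state for probe testing."""
    global fail_mode
    fail_mode = not fail_mode
    return {"fail_mode": fail_mode}

# In the readiness handler, return 500 while the toggle is on:
#   if fail_mode:
#       return JSONResponse({"status": "forced_failure"}, status_code=500)
# Then watch kubectl get pods -w to see READY flip to 0/1 after failureThreshold checks.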