Chapter 50: Kubernetes for AI Services

Docker gives you portable containers. Kubernetes orchestrates them—handling deployment, scaling, networking, and self-healing automatically. When your agent container crashes, Kubernetes restarts it. When traffic spikes, Kubernetes scales it. When you push a new version, Kubernetes rolls it out safely.

This chapter covers Kubernetes from core primitives to production-ready patterns for AI services. You'll learn the declarative model that makes Kubernetes powerful, understand how the control plane continuously reconciles the cluster toward the desired state you declare, and master the kubectl commands that make operations automated and repeatable.
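
To make the declarative model concrete, here is a minimal sketch of the reconcile idea in Python. It is a toy illustration, not Kubernetes code: `ClusterState` and `reconcile` are hypothetical names invented for this example, and real controllers watch the API server rather than mutating a local object.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Toy stand-in for what a controller observes (not a real Kubernetes object)."""
    running_replicas: int


def reconcile(desired_replicas: int, observed: ClusterState) -> list[str]:
    """Compare desired and observed state and return the actions that close the gap."""
    actions: list[str] = []
    diff = desired_replicas - observed.running_replicas
    if diff > 0:
        # Too few replicas: start the missing ones.
        actions = [f"start pod {observed.running_replicas + i + 1}" for i in range(diff)]
    elif diff < 0:
        # Too many replicas: stop the surplus, highest-numbered first.
        actions = [f"stop pod {observed.running_replicas - i}" for i in range(-diff)]
    # After acting, the observed state matches the desired state.
    observed.running_replicas = desired_replicas
    return actions


state = ClusterState(running_replicas=3)
state.running_replicas -= 1      # simulate one pod crashing
print(reconcile(3, state))       # a replacement pod is started
print(reconcile(3, state))       # nothing to do: observed matches desired
print(reconcile(5, state))       # desired count raised: two more pods are started
```

The point of the sketch is that you only ever state the target (three replicas, then five). The loop works out the steps, which is why a crash and a scale-up are handled by the same logic.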

By the end, your containerized FastAPI agent from Chapter 49 will be running on a Kubernetes cluster with proper health checks, resource management, security policies, and automated scaling—all without manual intervention.

What You'll Learn

By the end of this chapter, you'll be able to:

  • Understand Kubernetes architecture: Control plane, worker nodes, and the declarative model
  • Deploy containers on Kubernetes: Pods, Deployments, ReplicaSets, Jobs, CronJobs, and lifecycle management
  • Use advanced Pod patterns: Init containers for setup, sidecars for logging and monitoring
  • Organize workloads: Namespaces for isolation, resource quotas for multi-tenancy
  • Expose services: ClusterIP, NodePort, LoadBalancer, Ingress for HTTP routing
  • Configure applications: ConfigMaps for configuration, Secrets for sensitive data
  • Persist data: Persistent Volumes, Claims, StorageClasses, StatefulSets for stateful workloads
  • Manage resources: CPU/memory requests and limits, Horizontal Pod Autoscaler
  • Secure deployments: RBAC, SecurityContext, NetworkPolicy, Pod Security Standards
  • Monitor health: Liveness, readiness, and startup probes
  • Use kubectl-ai: AI-assisted manifest generation and cluster operations
  • Package with Helm: Charts, templates, and release management
  • Deploy agents at scale: Rolling updates, horizontal scaling, self-healing patterns
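
As a preview of how several of these topics meet in one object, the sketch below builds a Deployment manifest as a Python dictionary and prints it as JSON. The field names (`apiVersion`, `selector.matchLabels`, `resources`, `livenessProbe`, `readinessProbe`) follow the Kubernetes `apps/v1` Deployment schema. Everything else is an assumption made for illustration: the `fastapi-agent` name, the image URL, port 8000, the `/health` path, and the specific CPU and memory figures are placeholders, not values from this chapter.

```python
import json

# Hypothetical labels; the Deployment selector and the Pod template must agree on them.
APP_LABELS = {"app": "fastapi-agent"}


def http_probe(path: str, port: int, period_seconds: int) -> dict:
    """Build an HTTP GET probe definition (helper invented for this sketch)."""
    return {"httpGet": {"path": path, "port": port}, "periodSeconds": period_seconds}


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "fastapi-agent", "labels": APP_LABELS},
    "spec": {
        "replicas": 3,  # desired state: three identical Pods
        "selector": {"matchLabels": APP_LABELS},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {
                "containers": [
                    {
                        "name": "agent",
                        # Placeholder image reference, not a real registry path.
                        "image": "registry.example.com/fastapi-agent:1.0.0",
                        "ports": [{"containerPort": 8000}],
                        "resources": {
                            "requests": {"cpu": "250m", "memory": "256Mi"},
                            "limits": {"cpu": "500m", "memory": "512Mi"},
                        },
                        # Liveness failures restart the container; readiness
                        # failures remove the Pod from Service endpoints.
                        "livenessProbe": http_probe("/health", 8000, 10),
                        "readinessProbe": http_probe("/health", 8000, 5),
                    }
                ]
            },
        },
    },
}

manifest_json = json.dumps(deployment, indent=2)
print(manifest_json)
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and passed to `kubectl apply -f`. The lessons themselves work with YAML manifests, so treat this only as a map of where each concept lives in the object.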

Chapter Structure

| Lesson | Title | Layer | Focus |
|---|---|---|---|
| 1 | Kubernetes Architecture & the Declarative Model | L1 | Control plane, workers, desired vs observed state |
| 2 | Setting Up Minikube | L1 | Installation, cluster creation, kubectl context |
| 3 | Pods: The Atomic Unit | L1 | Pod anatomy, YAML, lifecycle, multi-container |
| 4 | Deployments: Self-Healing at Scale | L1 | ReplicaSets, rolling updates, rollbacks |
| 5 | Services & Networking | L1 | ClusterIP, NodePort, LoadBalancer, DNS |
| 6 | Init Containers: Preparing Your Environment | L1 | Initialization patterns, dependency setup |
| 7 | Sidecar Containers: Your Agent's Best Friend | L1 | Native sidecars (K8s 1.28+), logging, metrics |
| 8 | Namespaces: Virtual Clusters | L1 | Isolation, ResourceQuotas, LimitRanges |
| 9 | Ingress: Exposing Your Agent to the World | L1 | Path/host routing, TLS, annotations |
| 10 | Service Discovery Deep Dive | L1 | CoreDNS, FQDN, headless services |
| 11 | ConfigMaps & Secrets | L1 | Configuration injection, security notes |
| 12 | Persistent Storage: PV and PVC | L1 | Storage lifecycle, access modes, StorageClass |
| 13 | StatefulSets: When Your Agent Needs Identity | L1 | Stable identity, volumeClaimTemplates |
| 14 | Resource Management & Debugging | L1 | Requests/limits, QoS, kubectl debug |
| 15 | Horizontal Pod Autoscaler | L1 | Metrics-server, CPU/memory scaling, behavior |
| 16 | RBAC: Securing Agent Deployments | L1 | ServiceAccount, Role, RoleBinding |
| 17 | Kubernetes Security for AI Services | L1 | SecurityContext, NetworkPolicy, Pod Security |
| 18 | Health Checks & Probes | L1 | Liveness, readiness, startup probes |
| 19 | Jobs & CronJobs: Batch Workloads | L1 | One-time tasks, scheduled jobs, parallelism |
| 20 | AI-Assisted K8s with kubectl-ai | L2 | Natural language to manifests, debugging |
| 21 | Helm Charts for AI Agent Packaging | L1 | Charts, templates, releases |
| 22 | Capstone: Deploy Your Agent to Kubernetes | L4 | Spec-driven deployment, full validation |
| 23 | Building the Kubernetes Deployment Skill | L3 | Persona + Questions + Principles |

4-Layer Teaching Progression

This chapter follows the 4-Layer Teaching Method:

  • Lessons 1-19, 21 (Layer 1): Build mental models of Kubernetes concepts manually before AI assistance. This includes core primitives (Pods, Deployments, Services), advanced patterns (init containers, sidecars, StatefulSets), networking (Ingress, service discovery), storage, security (RBAC, NetworkPolicy), autoscaling, batch workloads (Jobs/CronJobs), and Helm packaging.

  • Lesson 20 (Layer 2): Collaborate with kubectl-ai using Three Roles to translate natural language into cluster operations. By this point, you know Kubernetes well enough to evaluate AI-generated manifests critically.

  • Lesson 22 (Layer 4): Apply all lessons in a spec-driven capstone project. Deploy your Part 6 FastAPI agent with proper configuration, security, health checks, and resource management.

  • Lesson 23 (Layer 3): Create reusable intelligence—a Kubernetes deployment skill that compounds across cloud-native projects.

Prerequisites

  • Chapter 49 completion: A containerized FastAPI agent pushed to a registry—this is the container you'll deploy to Kubernetes
  • Docker familiarity: You should understand images, containers, and how to run containers locally
  • Basic command-line comfort: You should be comfortable running commands in a terminal and navigating file systems
  • No Kubernetes experience required: Lesson 1 explains cluster architecture and the declarative model from scratch

Your Part 6 Agent: The Thread Through This Chapter

Throughout this chapter, we deploy your Part 6 FastAPI agent to Kubernetes:

  • Lessons 1-19: Learn Kubernetes concepts using your agent as the running example—from basic Pod deployment to production-ready configurations with security, autoscaling, and health checks
  • Lesson 20: Use kubectl-ai to generate deployment manifests and troubleshoot cluster issues collaboratively
  • Lesson 21: Package your agent as a Helm chart for repeatable, environment-specific deployments
  • Lesson 22 (Capstone): Deploy your containerized Part 6 agent to a Kubernetes cluster with all production patterns applied
  • Lesson 23: Create a reusable skill for future Kubernetes deployment work

By the end, your agent runs on an orchestrated cluster with automatic scaling, self-healing, proper security, and rolling updates.
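
The automatic scaling mentioned above is the job of the Horizontal Pod Autoscaler, whose documented core rule is `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`. The sketch below applies that rule in Python under simplifying assumptions: a single metric, a default tolerance of 0.1, and none of the stabilization windows or scaling policies a real cluster applies. The function name and the sample numbers are invented for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified version of the HPA replica calculation for one metric."""
    ratio = current_metric / target_metric
    # Within the tolerance band, leave the replica count alone to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# 3 replicas at 90% average CPU against a 60% target: scale up.
print(desired_replicas(3, 90, 60, min_replicas=2, max_replicas=10))
# 4 replicas at 30% against a 60% target: scale down.
print(desired_replicas(4, 30, 60, min_replicas=2, max_replicas=10))
# 63% against 60% is inside the tolerance band: no change.
print(desired_replicas(3, 63, 60, min_replicas=2, max_replicas=10))
```

Rounding up with `ceil` biases the autoscaler toward having slightly too much capacity rather than too little, which is usually the safer error for a service handling live traffic.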

Looking Ahead

This chapter produces a deployed agent and a reusable Kubernetes deployment skill. Chapter 51 (Kafka for AI Events) builds on this foundation with event-driven architectures for AI agent communication.