Skip to main content

AI-Assisted Chart Development

Throughout this chapter, you've built sophisticated Helm charts through manual construction—writing templates, structuring values, composing dependencies, and testing validation. You've gained the mental models and syntax mastery that enable production-grade design. You understand WHY certain patterns matter and HOW to implement them correctly.

This lesson introduces a powerful collaboration technique: using AI to accelerate chart development while maintaining your judgment as the quality gatekeeper.

The key insight: AI generates initial structures quickly and suggests patterns you might not have considered. Your role is to evaluate these suggestions critically, provide domain constraints that AI lacks, and iterate toward solutions that balance sophistication with maintainability. Neither of you alone produces the optimal result—together, through structured dialogue, you discover approaches superior to either starting point.


The Collaboration Pattern: Three-Phase Development

Chart development with AI follows a predictable pattern repeated across five real scenarios below:

  1. Initial Request: You describe what you're building
  2. Critical Evaluation: You review AI's response against your actual requirements
  3. Constraint Teaching: You teach AI about your domain (security policies, resource constraints, team practices)
  4. Refinement: AI adapts based on your constraints
  5. Validation: You verify the result matches your intent

This pattern eliminates the "perfect first try" myth. Production charts evolve through dialogue, not magic prompts.


Scenario 9.1: Complex Ingress Configuration with AI Assistance

Challenge: Multi-Path Routing

You're deploying a language model API that needs:

  • /api/inference → inference service
  • /api/admin → admin service (with authentication)
  • /health → health check endpoint
  • TLS termination
  • Rate limiting on inference endpoint

Multiple valid approaches exist. Should you use Ingress rules or separate services? How do you handle different authentication per path? Rate limiting in Ingress or in-cluster?

Initial Request

Ask AI:

I'm building a Helm chart for a language model API with these requirements:
- Routes /api/inference to my inference service
- Routes /api/admin to my admin service (needs auth)
- Routes /health to a health check endpoint
- TLS termination
- Rate limiting on the inference endpoint (100 requests/minute)

What Ingress configuration would you recommend? Should I use multiple Ingress objects or one with multiple paths?

Evaluation: What Matches vs What's Missing

Review AI's response. Ask yourself:

  • Does the routing structure match your services? (If AI suggested a path you don't have, that's a red flag)
  • How does AI handle authentication? (Look for ServiceAccount assumptions, NetworkPolicy mentions, or basic auth)
  • Where does it place rate limiting? (In Ingress? Via middleware? Depends on your ingress-controller)
  • What assumptions did AI make about your cluster? (Nginx-ingress? HAProxy? Cloud load balancer?)
  • What operational complexity does this introduce? (Extra RBAC roles? Init scripts? Webhook configuration?)

Constraint Teaching

Based on your evaluation, teach AI your specific constraints:

I'm using Nginx-ingress controller on Kubernetes 1.24+. The admin service
requires OIDC authentication that's already configured at the cluster level.
For rate limiting, I prefer ingress-level (not in-app) to protect the service.

My team wants to minimize complexity—we don't want webhook configurations or
custom middleware. How do I implement this within Nginx's native capabilities?
Also, how should I structure the values.yaml so operators can adjust these
settings without editing the Ingress template?

Refinement: AI Adapts

AI now understands:

  • Your ingress controller (Nginx, not generic)
  • Your authentication infrastructure (OIDC already configured)
  • Your operational preference (native ingress features, not webhooks)
  • Your templating expectation (values-driven configuration)

The updated response addresses these constraints—using Nginx-specific annotations for rate limiting, removing webhook assumptions, and showing clear values hierarchy.

Final Validation

Compare initial vs refined response:

# Initial: Generic approach
metadata:
annotations:
nginx.ingress.kubernetes.io/rate-limit: "100"
custom-auth: middleware-webhook
# Refined: Nginx-specific with constraints
metadata:
annotations:
nginx.ingress.kubernetes.io/limit-rps: "{{ .Values.rateLimit }}"
nginx.ingress.kubernetes.io/auth-type: "oidc"
nginx.ingress.kubernetes.io/auth-url: "{{ .Values.oidcUrl }}"

Ask yourself:

  • Is the refined version more aligned with your cluster's actual capabilities?
  • Can your operators use this without understanding webhook architecture?
  • Does the values structure make sense for your environment patterns?

If yes to all three → The iteration converged successfully.


Scenario 9.2: Multi-Tier Resource Limits Through AI Dialogue

Challenge: Environment-Specific Resources

You need to deploy your chart to three environments with different resource profiles:

  • Development: Minimal resources (0.1 CPU, 256Mi RAM)
  • Staging: Testing profile (0.5 CPU, 512Mi RAM, stricter probes)
  • Production: High-availability (2 CPU, 4Gi RAM, aggressive scaling)

How do you structure values.yaml so changing environments doesn't require manual template editing? Should resource limits be environment-specific or per-component?

Initial Request

I need to deploy a Helm chart across dev, staging, and production with
different resource allocations. Development needs minimal resources (0.1 CPU, 256Mi),
staging needs 0.5 CPU and 512Mi, and production needs 2 CPU and 4Gi.

How should I structure values.yaml to make these environment profiles easy
to manage? Should I use one values file per environment, or layer them?

Evaluation: Practical vs Theoretical

AI will likely suggest:

  • Values files per environment (values-dev.yaml, values-prod.yaml)
  • Resource definitions parameterized by tier
  • Possible use of Helm subcharts or includes

Ask yourself:

  • How would your team actually use this? (Is switching environments a helm install -f values-prod.yaml or more complex?)
  • What about drift? (If someone manually edits resources in-cluster, how do you know what the chart defined?)
  • Operational maintenance: Which approach requires less documentation?
  • Does this scale to 10+ microservices? (Repeating values structure in every chart is pattern duplication)

Constraint Teaching

Teach AI about your operational model:

My team uses GitOps (Flux/ArgoCD). Different environments are separate
Git branches—dev branch points to values-dev.yaml, main branch points to
values-prod.yaml. I want the chart to be immutable: once deployed, operators
can't accidentally change resources.

Also, we have resource quotas per namespace. Dev allows 1 CPU limit per
pod, production allows 10. How should I set up values so that resource
limits stay within namespace quotas?

One more constraint: we maintain 8 microservices with similar patterns.
What structure would make it easy to copy resources between charts without
duplicating documentation?

Refinement: AI Adapts to Your Workflow

AI now considers:

  • GitOps model (files as source of truth, immutability after deployment)
  • Quota constraints (limits must respect namespace settings)
  • Reusability at scale (8 microservices means patterns must transfer)

Updated response suggests:

  • Using Helm --values layering with clear precedence
  • Schema validation to catch quota violations
  • A documented pattern for copying resource blocks between charts
  • Possibly a shared library chart with common resource templates

Final Validation

Test the proposed structure:

  • Could your team apply the same pattern to the second microservice without re-reading documentation?
  • If someone pushes a values-prod.yaml that violates quotas, would the chart or your cluster catch it?
  • Does the GitOps workflow (Git commit → Flux → Deployed resources) remain clean?

Scenario 9.3: Custom Hook Design with AI Iteration

Challenge: Pre-Deployment Database Migration

Your chart deploys an AI service that needs database migrations before the main deployment. The migration must:

  • Run once per deployment (not per pod)
  • Wait for database to be ready
  • Fail the deployment if migration fails
  • Clean up the job after success (but keep logs)
  • Not run if the deployment is just a --dry-run

Initial Request

I need a Helm hook that runs a database migration before my main deployment.
The migration must run exactly once, wait for the database, and fail the
deployment if it fails. How should I structure this as a Helm hook? Should
I use a Job or Pod?

Evaluation: Hook Mechanics vs Operational Reality

AI will explain:

  • Hook timing (pre-install, pre-upgrade, post-install, etc.)
  • Job vs Pod for hooks
  • How hooks interact with rollbacks

Ask yourself:

  • What happens if migration succeeds, then the main pod crashes? (Do you re-run migration on next upgrade?)
  • Logs: After successful migration, do you need the job for debugging?
  • Dry-runs: If someone runs helm upgrade --dry-run, does the migration Job actually run? (It might!)
  • Idempotency: Can this migration run multiple times safely, or must you prevent re-execution?
  • Scaling: If you deploy 5 replicas, does the migration run 5 times in parallel?

Constraint Teaching

Teach AI about your specific operational constraints:

In our production environment, database migrations must be strictly idempotent
and safe to re-run. We use a schema versioning table. Our DevOps team wants
to preserve migration job logs in the cluster for 30 days (for audit purposes).

For --dry-run: we don't want the migration to actually execute, just validate
the chart. How do you prevent a Helm hook from running during dry-runs?

Also, our database takes 30-60 seconds to become available after the pod starts.
I need retry logic. Should that be in the hook script itself, or as part of the
Helm hook configuration?

Refinement: AI Adapts to Your Constraints

AI now understands:

  • Idempotency requirement (schema versioning, safe for re-runs)
  • Audit needs (preserve logs, not auto-delete)
  • Dry-run handling (skip actual migration, validate only)
  • Startup timing (database boot delay → retry logic needed)

Updated response:

  • Uses a Job (not Pod) for durability
  • Includes helm.sh/hook-delete-policy: "hook-succeeded,hook-failed" ... wait, no. You need logs, so delete policy changes.
  • Shows script with retry logic (polling database readiness)
  • Explains how to structure so Helm skips hook execution during --dry-run (or confirms it's Helm's default)

Final Validation

Examine the hook Job closely:

apiVersion: batch/v1
kind: Job
metadata:
annotations:
"helm.sh/hook": pre-upgrade,pre-install
"helm.sh/hook-weight": "-5" # Run before main resources
spec:
template:
spec:
containers:
- name: migrator
command:
- /bin/sh
- -c
- |
for i in {1..30}; do
pg_isready -h db.default && break
sleep 2
done
./migrate.sh
restartPolicy: Never

Ask:

  • Does the hook weight ensure migrations run before your deployment?
  • Is the retry logic clear and maintainable?
  • Can your team predict what happens on the next helm upgrade?
  • Are logs preserved as required?

Scenario 9.4: Chart Optimization Through AI Feedback Cycles

Challenge: Performance and Size

Your chart works correctly but feels bloated. The templates generate large manifests, your values.yaml has confusing nesting, and you suspect you're over-templating in some places.

Initial Request

I have a working Helm chart for my AI service, but it feels over-complex.
The rendered manifests are large, the values.yaml has 4 levels of nesting
that I don't fully understand, and some templates have conditional logic
that might be unnecessary.

Can you review this chart structure and suggest optimizations? I want to
maintain all functionality but reduce complexity. What should I simplify?

[Paste your Chart.yaml, values.yaml, and template names]

Evaluation: Optimization vs Pragmatism

AI will suggest:

  • Removing unnecessary conditionals
  • Flattening values hierarchy
  • Combining related templates

Ask yourself:

  • Are the suggested simplifications worth losing flexibility? (Flattening values might save 20 lines but break per-environment customization)
  • Would future maintainers understand the simplified version? (Clever is not the same as clear)
  • What's actually unused? (AI might mark features "optional" when your team relies on them)
  • Is size really a problem? (Sometimes explicit is better than compact)

Constraint Teaching

Teach AI about your maintenance reality:

My team is 4 people. We maintain 8 charts with similar patterns. When I
simplify this chart, it will become the template for the other 7. So I need
to optimize for:
1. Clarity (new team members should understand it in < 1 hour)
2. Consistency (patterns should match across all 8 charts)
3. Flexibility (we need per-environment customization)

What's the MINIMUM viable structure that keeps these three properties?
Which parts can I safely remove without losing necessary functionality?

Refinement and Validation

AI now prioritizes:

  • Clarity for new maintainers
  • Consistency across 8 charts
  • Flexibility within that simplicity

Ask yourself:

  • Does the simplified chart still feel like "yours"?
  • Can a new team member understand it without asking questions?
  • Will other charts copy this pattern correctly?

Scenario 9.5: Debugging Template Issues with AI Help

Challenge: Template Rendering Error

You get an error: Error: template: mychart/templates/deployment.yaml:32: undefined variable "$imageName"

The template seems correct, but something is wrong with variable scope.

Initial Request

I'm getting an "undefined variable" error in my Helm template. Here's my
template code:

[Paste the problematic section]

The error says line 32: undefined variable "$imageName". I set this variable
at line 10. Why can't the template see it?

Evaluation: Scope Rules in Go Templates

AI will explain:

  • Variable scope (variables are only valid within their block)
  • with blocks creating new scopes
  • How include changes scope

Ask yourself:

  • Does the explanation make sense? (Is the variable actually out of scope, or is there a typo?)
  • Would the suggested fix maintain your template's logic? (Sometimes the fix is moving the variable, sometimes restructuring the template)
  • Is this a one-off fix or a pattern problem? (If multiple templates have scope issues, the problem might be in your approach)

Constraint Teaching

Ask AI for clarity:

I understand the scope issue now. But I have 3 other templates with similar
patterns. Instead of fixing them one-by-one, what's the general pattern I
should use? Should I avoid `with` blocks for variable definitions? Or is
there a better structure I'm missing?

Refinement: AI Provides a Pattern

Instead of just fixing this bug, AI explains the underlying pattern:

  • When to use with (accessing values, not storing intermediate results)
  • When to use variables outside with blocks (storing values that cross scopes)
  • Named templates (include vs template) for more complex scoping

Final Validation

  • Can you apply this pattern to your other 3 templates?
  • Does your debugging approach now feel systematic rather than trial-and-error?

What Improved Through Iteration

After these five scenarios, examine what changed:

From Initial Request to Final Output:

  • Initial requests were often imprecise (no mention of your ingress controller, your GitOps model, your team size)
  • AI's first suggestions were generic (useful starting points, but not tailored to your constraints)
  • Through constraint teaching, the dialogue converged on:
    • Your infrastructure (Nginx-ingress, OIDC, database timing)
    • Your operational model (GitOps, immutability, audit requirements)
    • Your scale and team dynamics (8 microservices, 4-person team)
    • Your values (clarity over cleverness, consistency across charts)
  • Final outputs were specific, maintainable, and aligned with your actual practices

Key Takeaway: Neither your initial vague request nor AI's first response was sufficient. The iteration between your domain knowledge and AI's pattern library produced something neither of you would have designed alone.


Try With AI: Building Your Chart with Iterative Refinement

Choose one of these chart scenarios or bring your own requirement:

Setup

Use Claude, GPT-4, or another capable LLM with Helm knowledge.

Part 1: Initial Request

Describe a chart you want to build (or improve):

I need a Helm chart for [service] with these requirements:
- [requirement 1]
- [requirement 2]
- [operational constraint 1]
- [infrastructure constraint 1]

What's your suggested approach?

Examples:

  • "Helm chart for a FastAPI service with Postgres, Redis, and TLS"
  • "Improve my existing chart to reduce values nesting"
  • "Design a hook strategy for my service's initialization"

Part 2: Critical Evaluation

Review AI's response without accepting it as-is:

  • Which suggestions match YOUR environment? (Not everyone uses Nginx-ingress)
  • Which assumptions did AI make? (Look for "typical," "usually," "best practice")
  • What's missing? (Security policies? Your team size? Your GitOps tooling?)
  • What would actually work in YOUR cluster? (Not a generic Kubernetes cluster, YOUR cluster)

Write down 2-3 constraints AI missed.

Part 3: Constraint Teaching

Tell AI about your reality:

My actual constraints are:
- I use [your ingress controller]
- My team practices [your operational model]
- We need to support [your scale]
- We cannot use [your limitations]

Given these constraints, how does your approach change?

Observe: AI's response now drops the generic advice and focuses on your specific context.

Part 4: Refinement Dialogue

Continue the conversation based on AI's revised response:

  • Ask for clarification on anything unclear
  • Request the actual YAML/values structure
  • Challenge assumptions (e.g., "Why delete the job on success?")
  • Propose alternatives ("What if we use library charts instead?")

Continue until the result feels practical for YOUR situation.

Part 5: Self-Reflection

When the dialogue converges:

  • What improved through iteration? Describe the journey from initial request to final solution
  • What did you teach AI about your domain? List the constraints that weren't obvious from the first request
  • What did you learn from AI's suggestions? What patterns or approaches hadn't you considered?
  • Is the final result better than either of you would have designed alone? Why or why not?

Safety note: Always review AI-generated security configurations (RBAC, NetworkPolicy, authentication) against your organization's policies before deploying. AI sometimes suggests overly permissive settings.


You've now experienced the core collaboration pattern used throughout AI-native development: iterative refinement where your domain judgment and AI's pattern library converge toward production-grade solutions.

In Lesson 10 (Capstone), you'll apply this collaboration pattern at scale—designing a production AI agent chart using specification-first methodology. In Lesson 11, you'll encode your Helm expertise as a reusable skill that helps you and your team make these collaboration decisions consistently.