Capstone: Production AI Agent Chart
You've learned to template (Chapter 1), compose dependencies (Chapter 4), orchestrate with hooks (Chapter 5), test charts (Chapter 6), and distribute via OCI (Chapter 7). You understand library charts (Chapter 10) and have collaborated with AI on chart development (Chapter 11).
Now you'll synthesize everything into a single production-grade project: deploying a complete AI agent that talks to PostgreSQL for state and Redis for caching, with database migrations automated through hooks, and configuration tailored for dev/staging/production environments.
This is a specification-first capstone. You begin by writing a clear specification of what you're building before implementing anything. The specification becomes your contract with your implementation—and with AI if you choose to use it for validation or refinement.
Part 1: Specification
Before writing any YAML, establish what you're building.
Intent: What Are We Building?
You're creating a production-ready Helm chart that deploys an AI agent service with complete infrastructure:
- AI Agent Container: The application itself (for example, the AI service you built in Chapter 9, Sub-module 1)
- PostgreSQL Database: Persistent state storage
- Redis Cache: In-memory caching for fast inference lookups
- Database Schema Manager: Automatic schema initialization and migrations
- Multi-Environment Configuration: Different resource levels for dev/staging/prod
Success Criteria (Acceptance Tests)
Your chart succeeds when ALL of these are true:
Criterion 1: Single helm install Deploys Complete Stack
- Agent Pod running
- PostgreSQL StatefulSet running
- Redis running
- Pre-upgrade migration Job completed successfully
Criterion 2: helm test Verifies Connectivity
- Agent can connect to PostgreSQL
- Agent can connect to Redis
- Both dependencies report "healthy"
Criterion 3: Multi-Environment Deployment Works
Each deployment uses appropriate resource levels (dev: minimal, staging: moderate, prod: full).
Requirements
Your chart MUST include:
Configuration:
- Chart.yaml with dependencies on PostgreSQL and Redis (Bitnami charts)
- values.yaml with production defaults
- values-dev.yaml, values-staging.yaml, values-prod.yaml for environment-specific overrides
- values.schema.json validation (at least for critical fields)
Templates:
- templates/deployment.yaml for Agent
- templates/service.yaml (ClusterIP for Agent)
- templates/_helpers.tpl with standard label macros
- templates/configmap.yaml for non-secret configuration
- templates/secret.yaml for sensitive data (database credentials)
Lifecycle Management:
- templates/hooks/pre-upgrade-migration.yaml Job to run database migrations
- Hook annotations with proper weights and delete policies
Testing:
- templates/tests/test-connection.yaml to verify Agent ↔ DB ↔ Cache connectivity
Documentation:
- Chart-level README.md with configuration options and usage examples
Constraints
- All database migrations run BEFORE the deployment updates
- Secrets must NOT appear in ConfigMaps or unencrypted files
- Resource requests must scale appropriately per environment (dev: 256Mi/100m, staging: 512Mi/250m, prod: 1Gi/500m)
- Deployment must survive helm upgrade with zero downtime (RollingUpdate strategy with maxUnavailable: 0)
- PostgreSQL and Redis must be included as dependencies, NOT deployed externally
Non-Goals
We are NOT building multi-cluster topologies, external managed databases, or anything beyond a single-cluster, HTTP-based deployment. The focus is squarely on correct Helm patterns.
Part 2: Component Diagram
How Components Relate
- Agent Deployment starts the AI service container
- Pre-Upgrade Migration Hook runs a database schema initialization Job BEFORE the deployment updates
- ConfigMap provides environment-specific configuration to the Agent
- Secret provides database credentials to the Agent (mounted as volume)
- PostgreSQL Dependency is installed automatically (handles StatefulSet, PVC, Service)
- Redis Dependency is installed automatically (handles StatefulSet, Service)
- Test Pod verifies Agent can reach both PostgreSQL and Redis after deployment
Part 3: Implementation
Now build the chart step by step.
Step 1: Chart.yaml with Dependencies
Define what your chart includes and depends on:
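A Chart.yaml consistent with the description below might look like this; the exact version pins and chart version are illustrative:

```yaml
apiVersion: v2
name: task-api
description: Production AI agent with PostgreSQL state and Redis caching
type: application
version: 0.1.0
appVersion: "1.0.0"
dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
  - name: redis
    version: "17.x.x"
    repository: https://charts.bitnami.com/bitnami
    condition: redis.enabled
```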
Output: This metadata tells Helm:
- The chart is called task-api
- It depends on PostgreSQL 12.x and Redis 17.x from Bitnami
- Dependencies are only installed if postgresql.enabled: true and redis.enabled: true
Update dependencies:
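Run the standard dependency command against your chart directory:

```bash
$ helm dependency update ./task-api-chart
```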
Output: Helm downloads the PostgreSQL and Redis charts to charts/ directory.
Step 2: Base values.yaml
Create production-appropriate defaults:
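A values.yaml along these lines would provide the defaults described below; the image repositories are placeholders you must replace, and the Redis/PostgreSQL keys follow the Bitnami chart conventions:

```yaml
replicaCount: 1

image:
  repository: registry.example.com/task-api          # assumption: your agent image
  tag: "1.0.0"

config:
  logLevel: info
  workers: 4

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    memory: 1Gi

migration:
  image:
    repository: registry.example.com/task-api-migrate # assumption: your migration image
    tag: "1.0.0"

commonLabels:
  app.kubernetes.io/part-of: task-api

postgresql:
  enabled: true
  auth:
    username: agent
    database: agent
    password: ""       # supply via --set or a secrets manager; never commit
  primary:
    persistence:
      enabled: true
      size: 10Gi

redis:
  enabled: true
  architecture: replication
```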
Output: These values provide:
- Agent image and resource defaults (production-grade 512Mi/1Gi)
- PostgreSQL enabled with persistence
- Redis enabled with replication
- Migration image reference
- Standard organization labels
Step 3: Environment-Specific Overrides
Create values-dev.yaml:
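A dev override matching the description below might look like this (standalone Redis stands in for "no replicas"; volume sizes are illustrative):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi

config:
  logLevel: debug
  workers: 1

postgresql:
  primary:
    persistence:
      size: 1Gi

redis:
  architecture: standalone
  master:
    persistence:
      size: 1Gi
```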
Output: Dev environment uses:
- 100m CPU / 256Mi memory (1/5 of production)
- Debug logging
- 1 worker thread instead of 4
- No Redis replicas
- Smaller persistent volumes
Create values-staging.yaml:
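A staging override consistent with the summary below (volume sizes are illustrative):

```yaml
replicaCount: 2

resources:
  requests:
    cpu: 250m
    memory: 512Mi

postgresql:
  primary:
    persistence:
      size: 20Gi

redis:
  master:
    persistence:
      size: 4Gi
```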
Output: Staging environment uses:
- 250m CPU / 512Mi memory (moderate)
- 2 replicas for HA testing
- Production-like configuration
- Medium storage volumes
Create values-prod.yaml:
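A production override consistent with the summary below; the replication keys follow Bitnami chart conventions, and volume sizes are illustrative:

```yaml
replicaCount: 3

resources:
  requests:
    cpu: 500m
    memory: 1Gi

config:
  logLevel: warn

postgresql:
  architecture: replication
  readReplicas:
    replicaCount: 1
  primary:
    persistence:
      size: 100Gi

redis:
  replica:
    replicaCount: 2
  master:
    persistence:
      size: 8Gi
```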
Output: Production environment uses:
- 500m CPU / 1Gi memory (full tier)
- 3 Agent replicas
- PostgreSQL replicas enabled
- Redis replicas enabled
- Large persistent volumes
- Reduced logging (warn level only)
Step 4: Values Schema
Create values.schema.json for validation:
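A minimal schema covering the rules listed below might look like this (JSON Schema draft-07, which Helm supports):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["replicaCount", "resources"],
  "properties": {
    "replicaCount": { "type": "integer", "minimum": 1, "maximum": 10 },
    "resources": {
      "type": "object",
      "required": ["requests"],
      "properties": {
        "requests": {
          "type": "object",
          "required": ["cpu", "memory"]
        }
      }
    },
    "postgresql": {
      "type": "object",
      "properties": {
        "primary": {
          "type": "object",
          "properties": {
            "persistence": {
              "type": "object",
              "properties": { "enabled": { "const": true } }
            }
          }
        }
      }
    }
  }
}
```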
Output: This schema enforces:
- replicaCount between 1-10
- Resource requests for CPU and memory are required
- PostgreSQL must have persistence enabled
- Invalid configurations are caught during validation
Step 5: Deployment Template
Create templates/deployment.yaml:
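A deployment template with the properties listed below could be sketched as follows; the `/healthz` and `/readyz` probe paths are assumptions about your agent's API, and configuration is injected via envFrom rather than volume mounts:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "task-api.fullname" . }}
  labels:
    {{- include "task-api.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # zero-downtime upgrades
  selector:
    matchLabels:
      {{- include "task-api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "task-api.selectorLabels" . | nindent 8 }}
      annotations:
        # Re-roll pods whenever config or secrets change
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
        - name: agent
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - name: http
              containerPort: 8000
          envFrom:
            - configMapRef:
                name: {{ include "task-api.fullname" . }}
            - secretRef:
                name: {{ include "task-api.fullname" . }}
          livenessProbe:
            httpGet:
              path: /healthz    # assumption: agent health endpoint
              port: http
          readinessProbe:
            httpGet:
              path: /readyz     # assumption: agent readiness endpoint
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```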
Output: This creates:
- Deployment with configurable replicas
- Pod with agent container
- Database and Redis connection URLs from secrets
- Liveness/readiness probes
- Non-root security context
- Checksum annotations (triggers rollouts when config changes)
Step 6: Service Template
Create templates/service.yaml:
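A minimal ClusterIP service matching the description below:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ include "task-api.fullname" . }}
  labels:
    {{- include "task-api.labels" . | nindent 4 }}
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 8000
      targetPort: http
      protocol: TCP
  selector:
    {{- include "task-api.selectorLabels" . | nindent 4 }}
```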
Output: Creates a ClusterIP service exposing port 8000 to other pods in the cluster.
Step 7: ConfigMap Template
Create templates/configmap.yaml:
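A ConfigMap carrying the non-sensitive keys used in the values files above (LOG_LEVEL and WORKERS as environment variable names are an assumption about the agent's configuration interface):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "task-api.fullname" . }}
  labels:
    {{- include "task-api.labels" . | nindent 4 }}
data:
  LOG_LEVEL: {{ .Values.config.logLevel | quote }}
  WORKERS: {{ .Values.config.workers | quote }}
```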
Output: Non-sensitive configuration stored in ConfigMap (separate from secrets).
Step 8: Secret Template
Create templates/secret.yaml:
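A Secret template along these lines constructs the URLs from the Bitnami dependency service names (`<release>-postgresql` and `<release>-redis-master`); `stringData` lets you write plain text that Kubernetes stores base64-encoded:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "task-api.fullname" . }}
  labels:
    {{- include "task-api.labels" . | nindent 4 }}
type: Opaque
stringData:
  DATABASE_URL: "postgresql://{{ .Values.postgresql.auth.username }}:{{ .Values.postgresql.auth.password }}@{{ .Release.Name }}-postgresql:5432/{{ .Values.postgresql.auth.database }}"
  REDIS_URL: "redis://{{ .Release.Name }}-redis-master:6379/0"
```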
Output: Base64-encoded secrets for:
- PostgreSQL connection URL (constructed from dependency hostname)
- Redis connection URL (constructed from dependency hostname)
Step 9: Helpers Template
Create templates/_helpers.tpl:
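A standard helpers file, following the conventions `helm create` generates:

```yaml
{{- define "task-api.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{- define "task-api.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}

{{- define "task-api.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{ include "task-api.selectorLabels" . }}
{{- end -}}

{{- define "task-api.selectorLabels" -}}
app.kubernetes.io/name: {{ include "task-api.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}
```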
Output: Helper functions for consistent naming and labels throughout all templates.
Step 10: Pre-Upgrade Migration Hook
Create templates/hooks/pre-upgrade-migration.yaml:
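A hook Job consistent with the behavior described below might look like this. Two design notes: `pre-install` is included alongside `pre-upgrade` so the Job also completes on the very first install (Criterion 1), and the database URL is built inline because a pre-install hook runs before the chart's regular Secret exists. The `./migrate up` entrypoint is an assumption about your migration image:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "task-api.fullname" . }}-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 300   # fail the hook after 5 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.migration.image.repository }}:{{ .Values.migration.image.tag }}"
          command: ["./migrate", "up"]   # assumption: migration entrypoint
          env:
            - name: DATABASE_URL
              value: "postgresql://{{ .Values.postgresql.auth.username }}:{{ .Values.postgresql.auth.password }}@{{ .Release.Name }}-postgresql:5432/{{ .Values.postgresql.auth.database }}"
```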
Output: When you run helm upgrade, this Job runs FIRST:
- Executes database migrations before the deployment updates
- Uses weight -5 to run first in sequence
- Deletes itself after success
- Supports retries (backoffLimit: 3)
- Timeout after 5 minutes
Step 11: Connection Test
Create templates/tests/test-connection.yaml:
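A minimal test Pod, assuming the Bitnami service names (`<release>-postgresql`, `<release>-redis-master`) and a `/healthz` endpoint on the agent:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "task-api.fullname" . }}-test-connection
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: busybox:1.36
      command:
        - sh
        - -ec
        - |
          # TCP reachability for both dependencies, then the agent's health endpoint
          nc -z {{ .Release.Name }}-postgresql 5432
          nc -z {{ .Release.Name }}-redis-master 6379
          wget -qO- http://{{ include "task-api.fullname" . }}:8000/healthz
```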
Output: When you run helm test task-api, this Pod verifies both database and cache are accessible.
Step 12: Chart README
Create README.md:
Usage
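Install the chart as a release (here named task-api, matching the helpers above):

```bash
$ helm install task-api ./task-api-chart
```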
Update Release
The pre-upgrade migration hook runs automatically before the upgrade proceeds.
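For example:

```bash
$ helm upgrade task-api ./task-api-chart
```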
Test Connectivity
This verifies Agent ↔ PostgreSQL and Agent ↔ Redis connectivity.
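Run the chart's test hooks:

```bash
$ helm test task-api
```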
Rollback Release
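Roll back to a previous revision (revision 1 shown as an example):

```bash
$ helm rollback task-api 1
```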
Delete Release
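Remove the release and its resources:

```bash
$ helm uninstall task-api
```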
Troubleshooting
Migration Failed
Check migration job logs:
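Assuming the hook Job name sketched earlier (`<fullname>-migrate`):

```bash
$ kubectl logs job/task-api-migrate
```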
Agent Pod Not Running
Check deployment:
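For example:

```bash
$ kubectl get deployment task-api
$ kubectl describe pod -l app.kubernetes.io/instance=task-api
```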
Database Connection Errors
Verify secret:
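Decode the stored connection URL (key name assumes the Secret template sketched earlier):

```bash
$ kubectl get secret task-api -o jsonpath='{.data.DATABASE_URL}' | base64 -d
```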
Verify PostgreSQL is running:
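For example:

```bash
$ kubectl get pods -l app.kubernetes.io/name=postgresql
```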
Output: Documentation covering installation, configuration options, usage patterns, and troubleshooting.
Part 4: Validation
Verify your chart meets the specification before deployment.
Check 1: Helm Lint
Validate chart syntax and best practices:

```bash
$ helm lint ./task-api-chart
==> Linting ./task-api-chart
[INFO] Chart.yaml: icon is recommended

1 chart(s) linted, 0 error(s)
```
Check 2: Template Rendering
Render the chart with production values (`helm template ./task-api-chart -f values-prod.yaml`) and inspect the generated manifests.
Output: All YAML renders correctly with production values substituted.
Check 3: Schema Validation
Validate values against schema:
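The schema is enforced automatically whenever values are evaluated, so rendering with a deliberately broken override file exercises it:

```bash
$ helm template ./task-api-chart -f values-invalid.yaml
```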
Output: Invalid configurations are caught (if values-invalid.yaml had replicaCount: 50, this would fail).
Check 4: Acceptance Criteria Verification
Criterion 1: Single helm install Deploys Complete Stack
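Install the release and inspect the resulting workloads:

```bash
$ helm install task-api ./task-api-chart
$ kubectl get pods,jobs
```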
Output: ✓ Single helm install deployed all components:
- Agent Deployment running
- PostgreSQL StatefulSet running
- Redis StatefulSet running
- Pre-upgrade migration Job completed
Criterion 2: helm test Verifies Connectivity
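Run the connectivity test:

```bash
$ helm test task-api
```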
Output: ✓ Connectivity test passed:
- Agent can reach PostgreSQL
- Agent can reach Redis
- Both dependencies report healthy
Check test pod logs:
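Assuming the test Pod name sketched earlier:

```bash
$ kubectl logs task-api-test-connection
```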
Output: Both database and cache connectivity verified.
Criterion 3: Multi-Environment Deployment Works
Deploy dev:
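Each environment is the base chart plus one override file:

```bash
$ helm install task-api-dev ./task-api-chart -f values-dev.yaml
```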
Output: Dev deployment uses 100m CPU / 256Mi memory.
Deploy staging:
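```bash
$ helm install task-api-staging ./task-api-chart -f values-staging.yaml
```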
Output: Staging deployment uses 250m CPU / 512Mi memory with 2 replicas.
Deploy production:
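```bash
$ helm install task-api-prod ./task-api-chart -f values-prod.yaml
```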
Output: Production deployment uses 500m CPU / 1Gi memory with 3 replicas.
Verify each environment has appropriate settings:
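One way to spot-check the rendered resource requests per release:

```bash
$ kubectl get deployment task-api-dev \
    -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'
```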
Output: ✓ Multi-environment configuration verified
Summary: All Acceptance Criteria Met
- ✓ Criterion 1: Single helm install deployed complete stack with all components
- ✓ Criterion 2: helm test verified connectivity to both PostgreSQL and Redis
- ✓ Criterion 3: Multi-environment deployment works with appropriate resource levels
Your chart meets the specification.
Try With AI
Now that you've built a production chart from specification, you can refine it further with AI collaboration. Your specification and implementation give you the foundation to evaluate AI suggestions critically.
Setup
You'll use Claude or your preferred AI assistant to review and enhance your chart. Keep your specification and current implementation accessible.
Prompts
Part 1: Specification Review
Ask AI to validate your specification against production Helm best practices:
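An example prompt to adapt:

```
Here is my Helm chart specification for an AI agent with PostgreSQL and
Redis dependencies: [paste Part 1]. Review it against production Helm
best practices. Flag missing requirements, contradictory constraints,
and anything underspecified.
```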
Part 2: Implementation Review
Ask AI to evaluate your chart against the spec:
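An example prompt to adapt:

```
Here is my chart implementation: [paste templates and values files].
Compare it line by line against this specification: [paste Part 1].
List every requirement that is unmet or only partially met.
```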
Part 3: Edge Case Testing
Ask AI to identify test scenarios you might have missed:
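An example prompt to adapt:

```
Given this chart and its acceptance criteria, what failure modes and
edge cases should my helm test suite cover that it currently does not
(e.g., migration failures, dependency startup ordering, secret rotation)?
```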
Part 4: Production Hardening
Ask AI for suggestions to make your chart more production-ready:
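An example prompt to adapt:

```
Suggest production-hardening improvements for this chart across
resilience, security, and observability, and rank them by effort
versus impact so I can prioritize.
```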
Expected Insights
Through this collaboration, AI will likely suggest:
- Missing resource management: Pod Disruption Budgets (PDB) to survive cluster maintenance
- Advanced deployment strategies: Blue-green deployments or canary releases
- Observability patterns: Prometheus metrics and Grafana dashboards
- Security enhancements: Network policies and RBAC roles
- Operational runbooks: Procedures for common incidents (migration failures, pod evictions)
Evaluate and Iterate
For each suggestion:
- Ask yourself: Does this align with my specification? Is it in-scope?
- Implement selectively: Add suggestions that improve the chart's ability to meet acceptance criteria
- Document decisions: Record which suggestions you adopted and which you deferred (and why)
What emerges is a chart that's not just specification-compliant, but hardened for real production use.