Your agent is now running in a Deployment (Chapter 5). Kubernetes monitors it and restarts it if the container crashes. But there's a critical problem: a container that's running might not be healthy.
Imagine this: Your agent's model loads asynchronously. For the first 30 seconds after startup, the container is running but can't handle requests. Kubernetes sees it as "ready" and sends traffic to it immediately. Users get errors. The container didn't crash, so Kubernetes doesn't restart it. Your service is degraded but technically "up."
This chapter teaches health checks—the mechanism that lets Kubernetes know whether your container is actually ready to serve traffic, whether it's still alive, and how long to wait during startup before expecting it to respond.
Consider your AI agent startup sequence:
Kubernetes provides three health check mechanisms:
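Key insight: Readiness probes control traffic routing. A Pod that fails its readiness probe is removed from Service endpoints, so it stops receiving requests, but it is not restarted. Even a healthy, running Pod might not be ready to serve requests.
Key insight: Liveness probes control container restarts. A container can be running but logically dead (deadlocked, stuck in an infinite loop). When the liveness probe fails repeatedly, the kubelet restarts the container.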
Key insight: Startup probes buy time for initialization. Once startup succeeds, other probes begin their work.
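The interplay of the three probes can be sketched as a small state model. This is a deliberately simplified illustration, not how the kubelet is implemented: it ignores failureThreshold and timing, and the names `ProbeState` and `step` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    started: bool = False  # startup probe has succeeded since the last (re)start
    ready: bool = False    # Pod is eligible to receive traffic from a Service
    restarts: int = 0      # container restarts triggered by liveness failures


def step(state: ProbeState, startup_ok: bool, ready_ok: bool, live_ok: bool) -> ProbeState:
    """Apply one round of probe results and return the updated state."""
    if not state.started:
        # Until the startup probe succeeds, readiness and liveness are not evaluated.
        state.started = startup_ok
        state.ready = False
        return state
    if not live_ok:
        # Failed liveness: the container is restarted and startup begins again.
        state.restarts += 1
        state.started = False
        state.ready = False
        return state
    # Failed readiness only removes the Pod from traffic; it never restarts it.
    state.ready = ready_ok
    return state
```

Feeding the model a failed readiness result flips `ready` to False without touching `restarts`, while a failed liveness result increments `restarts` and sends the container back through startup.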
HTTP probes call a health endpoint and check the response code.
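As a rough illustration, an HTTP readiness probe can be written as a Python dictionary and dumped as JSON (which `kubectl` accepts alongside YAML). The field names are standard probe fields; the path, port, and timing values are placeholder assumptions, not values taken from this chapter's agent. The helper mirrors the rule Kubernetes applies to the response code.

```python
import json

# Placeholder values: adjust the path, port, and timings to your own agent.
readiness_probe = {
    "httpGet": {"path": "/health/ready", "port": 8000},
    "initialDelaySeconds": 5,  # wait before the first probe
    "periodSeconds": 5,        # interval between probes
    "timeoutSeconds": 2,       # how long a single probe may take
    "failureThreshold": 3,     # consecutive failures before marking not ready
}


def http_probe_succeeds(status_code: int) -> bool:
    """Any status from 200 up to, but not including, 400 counts as success."""
    return 200 <= status_code < 400


print(json.dumps({"readinessProbe": readiness_probe}, indent=2))
```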
Explanation:
Your FastAPI agent needs a readiness endpoint that returns 200 only when the model is loaded:
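The core of such an endpoint is small: report 200 once the model is loaded and 503 before that. The sketch below uses only the Python standard library so it runs without extra dependencies; in a FastAPI app the same `readiness_response` logic would sit behind a route handler. The path `/health/ready`, port 8000, and the `model_loaded` flag are assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set by the (hypothetical) model-loading code once loading has finished.
model_loaded = threading.Event()


def readiness_response(loaded: bool):
    """Return (status_code, body) for the readiness endpoint."""
    if loaded:
        return 200, {"status": "ready"}
    return 503, {"status": "loading"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health/ready":
            self.send_error(404)
            return
        code, body = readiness_response(model_loaded.is_set())
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep frequent probe requests out of the console


def serve(port: int = 8000) -> None:
    """Serve the health endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint; once the loading code calls `model_loaded.set()`, the response changes from 503 to 200 and the readiness probe begins to pass.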
Terminal Testing:
Execution Results:
Use TCP probes when the container exposes no HTTP endpoint, for example a database or another non-HTTP service. For gRPC services, newer Kubernetes versions also offer a dedicated gRPC probe type.
How it works:
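In outline, the kubelet tries to open a TCP connection to the configured port: if the connection is established the probe passes, and if it is refused or times out the probe fails. No data is exchanged. The function below imitates that check in Python as a sketch; `tcp_probe` is a name invented here, not a Kubernetes API.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Return True if a TCP connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A passing TCP probe only shows that something is listening on the port. It says nothing about whether the application behind it can actually answer requests, which is why an HTTP probe is usually the better choice when an endpoint exists.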
Exec probes run a command inside the container for custom health checks. Exit code 0 means healthy; any other exit code means unhealthy.
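The exit-code rule can be mimicked in a few lines of Python. This is only a sketch of the decision logic, assuming the command is given as an argument list; `exec_probe` is a hypothetical helper and not part of Kubernetes.

```python
import subprocess
import sys


def exec_probe(command, timeout_seconds: float = 1.0) -> bool:
    """Run a command; exit code 0 is healthy, anything else (or a timeout) is not."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A command that hangs or cannot be started counts as a failed probe.
        return False
    return result.returncode == 0
```

For example, `exec_probe([sys.executable, "-c", "raise SystemExit(1)"], timeout_seconds=10)` reports unhealthy because the command exits with a non-zero code.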
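Watch out: Keep timeoutSeconds below periodSeconds. If a single probe is allowed to take as long as the interval between probes, slow responses consume the whole interval and failure detection becomes hard to predict.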
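A quick sanity check for probe timing might look like the following. The function name and return shape are assumptions for this sketch; the detection estimate uses the common rule of thumb that a failure is acted on after roughly periodSeconds × failureThreshold seconds.

```python
def check_probe_timing(period_seconds: int, timeout_seconds: int, failure_threshold: int):
    """Return (warnings, approximate seconds until a persistent failure is acted on)."""
    warnings = []
    if timeout_seconds >= period_seconds:
        warnings.append("timeoutSeconds should be lower than periodSeconds")
    # Rough estimate: the probe must fail failure_threshold times in a row.
    detection_seconds = period_seconds * failure_threshold
    return warnings, detection_seconds
```

With a 5-second period, a 2-second timeout, and a threshold of 3, the check returns no warnings and an estimated detection time of about 15 seconds.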
Consider this pattern for agents with variable startup times:
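One way to size a startup probe is to work backwards from the slowest startup you expect: the probe tolerates roughly failureThreshold × periodSeconds seconds before the container is restarted. The sketch below derives a threshold from a worst-case startup time; the 120-second figure, the health path, and the port are illustrative assumptions.

```python
import math


def startup_failure_threshold(max_startup_seconds: float, period_seconds: int) -> int:
    """Smallest failureThreshold whose budget covers the worst-case startup time."""
    return max(1, math.ceil(max_startup_seconds / period_seconds))


# Assumed worst case: the agent may take up to 120 seconds to load its model.
period = 5
startup_probe = {
    "httpGet": {"path": "/health/ready", "port": 8000},  # hypothetical endpoint
    "periodSeconds": period,
    "failureThreshold": startup_failure_threshold(120, period),
}

# Total time the container is given to start before it is restarted.
startup_budget_seconds = startup_probe["failureThreshold"] * startup_probe["periodSeconds"]
```

A fast start is not penalized by a generous budget: the startup probe stops as soon as it succeeds once, and the readiness and liveness probes take over from there.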
Setup: You have a containerized agent with a health endpoint.
Challenge: Configure probes for an agent with these characteristics:
Action Prompt: "Write a Deployment manifest with startup, readiness, and liveness probes for an agent that takes 20s to load. Ensure readiness is aggressive (5s period) and liveness is conservative (30s period)."
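For comparison with whatever your AI produces, here is one possible shape of an answer, built as a Python dictionary and printed as JSON. Everything specific in it is an assumption: the name `agent`, the image, port 8000, the `/health/ready` and `/health/live` paths, and the choice of a 60-second startup budget (three times the stated 20-second load time).

```python
import json

PORT = 8000  # hypothetical container port


def http_get(path: str) -> dict:
    """Build the httpGet section shared by all three probes."""
    return {"path": path, "port": PORT}


container = {
    "name": "agent",
    "image": "example.com/agent:latest",  # placeholder image
    "ports": [{"containerPort": PORT}],
    # Startup: 12 attempts x 5s = 60s budget for a load that takes about 20s.
    "startupProbe": {"httpGet": http_get("/health/ready"),
                     "periodSeconds": 5, "failureThreshold": 12},
    # Readiness: aggressive, so traffic is removed quickly when the agent is not ready.
    "readinessProbe": {"httpGet": http_get("/health/ready"),
                       "periodSeconds": 5, "timeoutSeconds": 2, "failureThreshold": 2},
    # Liveness: conservative, so brief slowness does not trigger a restart.
    "livenessProbe": {"httpGet": http_get("/health/live"),
                      "periodSeconds": 30, "timeoutSeconds": 5, "failureThreshold": 3},
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "agent"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "agent"}},
        "template": {
            "metadata": {"labels": {"app": "agent"}},
            "spec": {"containers": [container]},
        },
    },
}

print(json.dumps(deployment, indent=2))
```

When reviewing a generated manifest, check that the startup budget comfortably exceeds the load time, that every timeoutSeconds is below its periodSeconds, and that the selector labels match the Pod template labels.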
You built a kubernetes-deployment skill in Chapter 0. Test and improve it based on what you learned.
Ask yourself:
If you found gaps: