USMAN’S INSIGHTS
AI ARCHITECT

Muhammad Usman Akbar Entity Profile

Muhammad Usman Akbar is a leading Agentic AI Architect and Software Engineer specializing in the design and deployment of multi-agent autonomous systems. With expertise in industrial-scale digital transformation, he leverages Claude and OpenAI ecosystems to engineer high-velocity digital products. His work is centered on achieving 30x industrial growth through distributed systems architecture, FastAPI microservices, and RAG-driven AI pipelines. Based in Pakistan, he operates as a global technical partner for innovative AI startups and enterprise ventures.

© 2026 Muhammad Usman Akbar. All rights reserved.

Health Checks: Liveness, Readiness, Startup Probes

Your agent is now running in a Deployment (Chapter 5). Kubernetes monitors it and restarts it if the container crashes. But there's a critical problem: a container that's running might not be healthy.

Imagine this: Your agent's model loads asynchronously. For the first 30 seconds after startup, the container is running but can't handle requests. Kubernetes sees it as "ready" and sends traffic to it immediately. Users get errors. The container didn't crash, so Kubernetes doesn't restart it. Your service is degraded but technically "up."

This chapter teaches health checks—the mechanism that lets Kubernetes know whether your container is actually ready to serve traffic, whether it's still alive, and how long to wait during startup before expecting it to respond.


The Problem: Running ≠ Ready

Consider your AI agent startup sequence:

| Timestamp | Event |
|---|---|
| 0s | Container starts, main process begins. |
| 0.5s | Python initializes, imports libraries. |
| 5s | Model weights load into memory. |
| 10s | Embedding vector cache builds. |
| 30s | Application ready to serve requests. |
| Scenario | Result |
|---|---|
| Without health checks | Kubernetes sends traffic at 1s. Requests fail, users see errors. Since the container hasn't crashed, Kubernetes does nothing. |
| With health checks | Readiness probe fails for the first 30 seconds. Kubernetes removes the Pod from service endpoints. Traffic only flows once the model is loaded. |

Three Types of Probes

Kubernetes provides three health check mechanisms:

1. Readiness Probe (Is this Pod ready to serve traffic?)

| Aspect | Description |
|---|---|
| Purpose | Determines if a Pod should receive traffic from Services/load balancers. |
| When to use | Slow application startup (models/caches), dependencies not ready, or temporary removal during updates. |
| Behavior | Fail: traffic stops. Succeed: traffic resumes. |

Key insight: Readiness probes are about traffic routing. Even a healthy, running Pod might not be ready to serve requests.

2. Liveness Probe (Is this Pod still alive?)

| Aspect | Description |
|---|---|
| Purpose | Detects if a container is in a broken or stuck state and needs a restart. |
| When to use | Detect deadlocks, infinite loops, or applications that respond but are logically "dead" (e.g., after a memory leak). |
| Behavior | Fail continuously: Kubernetes restarts the container. Succeed: no action taken. |

Key insight: Liveness probes are about Pod lifecycle. A Pod can be running but logically dead (stuck waiting, infinite loop).

3. Startup Probe (Has this Pod finished initializing?)

| Aspect | Description |
|---|---|
| Purpose | Prevents liveness/readiness probes from triggering during slow startup. |
| When to use | Long initialization times (30s+), model loading, or cache warming. |
| Behavior | Fail during startup: the Pod gets more time. Succeed: liveness/readiness probes take over. |

Key insight: Startup probes buy time for initialization. Once startup succeeds, other probes begin their work.


HTTP GET Probes (Most Common)

HTTP probes call a health endpoint and check the response code.

Basic HTTP Probe Structure

```yaml
spec:
  containers:
  - name: agent
    image: my-agent:v1
    ports:
    - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 1
      failureThreshold: 3
```

Explanation:

  • httpGet.path: Endpoint to call.
  • httpGet.port: Container port (number or name).
  • initialDelaySeconds: Wait 10s before first probe (let container start).
  • periodSeconds: Check every 5 seconds.
  • timeoutSeconds: If response takes >1s, count as failure.
  • failureThreshold: After 3 failures, take action (remove from service or restart).
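How failureThreshold interacts with consecutive probe results can be sketched in a few lines of Python. This is a simplified model of kubelet behavior, not the real implementation; `successThreshold` is a real probe field included here for symmetry:

```python
# Simplified model of how consecutive probe results flip a
# health verdict. Illustrative only, not kubelet code.
class ProbeTracker:
    def __init__(self, failure_threshold=3, success_threshold=1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.failures = 0
        self.successes = 0
        self.healthy = True

    def observe(self, probe_succeeded: bool) -> bool:
        """Record one probe result; return the current health verdict."""
        if probe_succeeded:
            self.successes += 1
            self.failures = 0  # any success resets the failure count
            if self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold:
                # Readiness: remove from endpoints. Liveness: restart container.
                self.healthy = False
        return self.healthy

tracker = ProbeTracker(failure_threshold=3)
results = [True, False, False, True, False, False, False]
verdicts = [tracker.observe(r) for r in results]
print(verdicts)  # [True, True, True, True, True, True, False]
```

Note that only *consecutive* failures count: the single success in the middle resets the counter, which is why a transient blip does not mark the Pod unhealthy.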

HTTP Readiness Probe Example

Your FastAPI agent needs a readiness endpoint that returns 200 only when the model is loaded:

```python
# main.py - FastAPI agent with readiness endpoint
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import asyncio
import time

app = FastAPI()

# Simulate model loading time
model_loaded = False
load_start = None

@app.on_event("startup")
async def load_model():
    global model_loaded, load_start
    load_start = time.time()
    print("Starting model load...")
    # Simulate 15s model loading
    await asyncio.sleep(15)
    model_loaded = True
    load_time = time.time() - load_start
    print(f"Model loaded in {load_time:.1f}s")

@app.get("/health/ready")
async def readiness():
    """Returns 200 only when model is fully loaded"""
    if not model_loaded:
        return JSONResponse(
            {"status": "loading", "ready": False},
            status_code=503
        )
    return JSONResponse({"status": "ready", "ready": True})

@app.get("/health/live")
async def liveness():
    """Always returns 200 if main process is running"""
    return JSONResponse({"status": "alive"})

@app.post("/predict")
async def predict(data: dict):
    """Agent inference endpoint"""
    if not model_loaded:
        return JSONResponse(
            {"error": "Model still loading, try again soon"},
            status_code=503
        )
    # Do actual inference
    return {"prediction": "example output"}
```

Terminal Testing:

```bash
# First request during loading
curl http://localhost:8000/health/ready
# Output: {"status": "loading", "ready": false}

# After 15 seconds, try again
curl http://localhost:8000/health/ready
# Output: {"status": "ready", "ready": true}

# Liveness endpoint always responds
curl http://localhost:8000/health/live
# Output: {"status": "alive"}
```

Deployment with HTTP Readiness and Liveness Probes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
      - name: agent
        image: my-agent:v1
        ports:
        - containerPort: 8000
          name: http
        # Wait for model to load before sending traffic
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 5   # Check after 5s
          periodSeconds: 5         # Check every 5s
          timeoutSeconds: 2        # Must complete in 2s
          failureThreshold: 2      # 2 failures = not ready
        # Detect if container is stuck or crashed
        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 20  # Wait for startup
          periodSeconds: 10        # Check every 10s
          timeoutSeconds: 2
          failureThreshold: 3      # 3 failures = restart
        # Buy time for initialization
        startupProbe:
          httpGet:
            path: /health/ready
            port: http
          periodSeconds: 5
          failureThreshold: 6      # Allow 30s for startup
```

Execution Results:

```bash
kubectl apply -f deployment.yaml

# Watch pod startup
kubectl get pods -w
# agent-deployment-7d4f5c6b9f-abc123   0/1   Running   0
# At 15s: startup probe succeeds (model loaded)
# At 20s:
# agent-deployment-7d4f5c6b9f-abc123   1/1   Running   0

# View probe events
kubectl describe pod agent-deployment-7d4f5c6b9f-abc123
# Events:
#   Normal   Created    45s  kubelet  Created container agent
#   Normal   Started    45s  kubelet  Started container agent
#   Warning  Unhealthy  40s  kubelet  Readiness probe failed: HTTP probe failed
#   Normal   Ready      30s  kubelet  Container is ready
```

TCP Socket Probes (Port Availability)

Use TCP probes when you don't have an HTTP endpoint, or for protocols like gRPC/database connections.

TCP Probe Configuration

```yaml
spec:
  containers:
  - name: cache
    image: redis:7
    ports:
    - containerPort: 6379
    livenessProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3
```

How it works:

  • Kubernetes attempts a TCP connection to the specified port.
  • Success: Port is open and accepting connections.
  • Timeout/failure: Port is not responding.
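What a tcpSocket probe does is essentially a connect attempt with a deadline. A rough Python equivalent using the standard `socket` module (a sketch of the idea, not the kubelet's actual code):

```python
import socket

def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Roughly what a tcpSocket probe does: try to open a TCP
    connection; success means the port is accepting connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Usage: probe a port we control
server = socket.socket()
server.bind(("127.0.0.1", 0))  # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
print(tcp_probe("127.0.0.1", port))  # True: the port is listening
server.close()
```

Keep in mind the limitation this illustrates: an open port proves only that something is accepting connections, not that the application behind it can do useful work.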

Exec Probes (Custom Commands)

Run arbitrary shell commands for custom health checks. Exit code 0 = healthy.

Exec Probe Example

```yaml
spec:
  containers:
  - name: agent
    image: my-agent:v1
    readinessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - |
          # Check if critical files exist and are recent
          [ -f /app/model.pkl ] && \
          [ $(find /app/model.pkl -mmin -2) ] && \
          curl -sf http://localhost:8000/health/ready > /dev/null
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 2
```

Timing Parameters (Critical Decisions)

| Parameter | Purpose | Best Practice for AI Agents |
|---|---|---|
| initialDelaySeconds | Wait before the first check. | Set to 1.5x the actual model load time if NOT using a startup probe. |
| periodSeconds | Frequency of health checks. | Readiness: 5-10s (quick removal). Liveness: 10-30s (less aggressive). |
| timeoutSeconds | Wait time for a response. | Usually 1-2s. Increase if model inference is slow but part of readiness. |
| failureThreshold | Failures before taking action. | Startup: 10+ (very tolerant). Liveness: 3-5 (moderate). |

Watch out: keep timeoutSeconds smaller than periodSeconds. If a probe's timeout is as long as or longer than its period, a slow response can still be in flight when the next check is due, which makes failure counting hard to reason about.
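A quick way to sanity-check a parameter choice is to compute the worst-case detection window. A back-of-envelope helper (a simplification; real timing also depends on where within a period the failure begins):

```python
def worst_case_detection(period_s: float, failure_threshold: int,
                         timeout_s: float) -> float:
    """Upper bound on seconds between a Pod going bad and the verdict
    flipping: the check must fail failure_threshold consecutive times,
    and the last check can take up to timeout_s to give up."""
    return failure_threshold * period_s + timeout_s

# Readiness from the Deployment example: period=5, threshold=2, timeout=2
print(worst_case_detection(5, 2, 2))   # 12.0 seconds until traffic stops
# Liveness from the same example: period=10, threshold=3, timeout=2
print(worst_case_detection(10, 3, 2))  # 32.0 seconds until restart
```

Run this math before tuning: if 32 seconds of a stuck agent is unacceptable, tighten the liveness period or threshold, but remember that tighter values also restart Pods on transient slowness.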


Debugging Probe Failures

| Step | Command | What to Look For |
|---|---|---|
| 1. Status | `kubectl get pods` | READY 0/1 means readiness failure. High RESTARTS means liveness failure. |
| 2. Describe | `kubectl describe pod <name>` | Check the Events section for the exact probe failure messages. |
| 3. Logs | `kubectl logs <name>` | Look for app-side errors during model loading or initialization. |
| 4. Manual test | `kubectl port-forward <name> 8000:8000` | Manually curl the health endpoints to verify their logic. |

AI-Native Health Checks

Consider this pattern for agents with variable startup times:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
      - name: agent
        image: my-agent:latest
        startupProbe:
          httpGet: { path: /health/startup, port: 8000 }
          periodSeconds: 5
          failureThreshold: 30   # Allow 150s for startup
        readinessProbe:
          httpGet: { path: /health/ready, port: 8000 }
          periodSeconds: 5
          failureThreshold: 2
        livenessProbe:
          httpGet: { path: /health/live, port: 8000 }
          periodSeconds: 10
          failureThreshold: 3
```

Try With AI

Setup: You have a containerized agent with a health endpoint.

Challenge: Configure probes for an agent with these characteristics:

  • Model loads in 20 seconds.
  • Health endpoint responds in under 500ms when healthy.
  • You want aggressive failure detection for readiness (pod removed quickly if unhealthy).
  • You want conservative failure detection for liveness (don't restart on transient errors).

Action Prompt: "Write a Deployment manifest with startup, readiness, and liveness probes for an agent that takes 20s to load. Ensure readiness is aggressive (5s period) and liveness is conservative (30s period)."


Reflect on Your Skill

You built a kubernetes-deployment skill in Chapter 0. Test and improve it based on what you learned.

Test Your Skill

```
Using my kubernetes-deployment skill, configure liveness, readiness, and startup probes. Does my skill generate proper probe configurations with appropriate timing parameters?
```

Identify Gaps

Ask yourself:

  • Did my skill include the three probe types (liveness, readiness, startup) and when to use each?
  • Did it explain HTTP GET, TCP socket, and exec probe mechanisms?
  • Did it cover debugging patterns for probe failures?

Improve Your Skill

If you found gaps:

```
My kubernetes-deployment skill is missing health probe configuration and debugging patterns. Update it to include all three probe types, appropriate timing for AI agents, and probe failure diagnosis steps.
```