USMAN’S INSIGHTS
AI ARCHITECT

Muhammad Usman Akbar Entity Profile

Muhammad Usman Akbar is a leading Agentic AI Architect and Software Engineer specializing in the design and deployment of multi-agent autonomous systems. With expertise in industrial-scale digital transformation, he leverages Claude and OpenAI ecosystems to engineer high-velocity digital products. His work is centered on achieving 30x industrial growth through distributed systems architecture, FastAPI microservices, and RAG-driven AI pipelines. Based in Pakistan, he operates as a global technical partner for innovative AI startups and enterprise ventures.

© 2026 Muhammad Usman Akbar. All rights reserved.

Health Checks: Liveness, Readiness, Startup Probes

Your agent is now running in a Deployment (Chapter 5). Kubernetes monitors it and restarts it if the container crashes. But there's a critical problem: a container that's running might not be healthy.

Imagine this: Your agent's model loads asynchronously. For the first 30 seconds after startup, the container is running but can't handle requests. Kubernetes sees it as "ready" and sends traffic to it immediately. Users get errors. The container didn't crash, so Kubernetes doesn't restart it. Your service is degraded but technically "up."

This chapter teaches health checks—the mechanism that lets Kubernetes know whether your container is actually ready to serve traffic, whether it's still alive, and how long to wait during startup before expecting it to respond.


The Problem: Running ≠ Ready

Consider your AI agent startup sequence:

| Timestamp | Event |
|---|---|
| 0s | Container starts, main process begins. |
| 0.5s | Python initializes, imports libraries. |
| 5s | Model weights load into memory. |
| 10s | Embedding vector cache builds. |
| 30s | Application ready to serve requests. |
| Scenario | Result |
|---|---|
| Without health checks | Kubernetes sends traffic at 1s. Requests fail, users see errors. Since the container hasn't crashed, Kubernetes does nothing. |
| With health checks | Readiness probe fails for the first 30 seconds. Kubernetes removes the Pod from service endpoints. Traffic only flows once the model is loaded. |

Three Types of Probes

Kubernetes provides three health check mechanisms:

1. Readiness Probe (Is this Pod ready to serve traffic?)

| Aspect | Description |
|---|---|
| Purpose | Determines if a Pod should receive traffic from Services/load balancers. |
| When to use | Slow application startup (models/caches), dependencies not ready, or temporary removal during updates. |
| Behavior | Fail: traffic stops. Succeed: traffic resumes. |

Key insight: Readiness probes are about traffic routing. Even a healthy, running Pod might not be ready to serve requests.

2. Liveness Probe (Is this Pod still alive?)

| Aspect | Description |
|---|---|
| Purpose | Detects if a container is in a broken or stuck state and needs a restart. |
| When to use | Detect deadlocks, infinite loops, or applications that respond but are logically "dead" (e.g., after a memory leak). |
| Behavior | Fail continuously: Kubernetes restarts the container. Succeed: no action taken. |

Key insight: Liveness probes are about Pod lifecycle. A Pod can be running but logically dead (stuck waiting, infinite loop).

3. Startup Probe (Has this Pod finished initializing?)

| Aspect | Description |
|---|---|
| Purpose | Prevents liveness/readiness probes from triggering during slow startup. |
| When to use | Long initialization times (30s+), model loading, or cache warming. |
| Behavior | Fail during startup: the Pod gets more time. Succeed: liveness/readiness probes take over. |

Key insight: Startup probes buy time for initialization. Once startup succeeds, other probes begin their work.


HTTP GET Probes (Most Common)

HTTP probes call a health endpoint and check the response code.

Basic HTTP Probe Structure

```yaml
spec:
  containers:
  - name: agent
    image: my-agent:v1
    ports:
    - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 1
      failureThreshold: 3
```

Explanation:

  • httpGet.path: Endpoint to call.
  • httpGet.port: Container port (number or name).
  • initialDelaySeconds: Wait 10s before first probe (let container start).
  • periodSeconds: Check every 5 seconds.
  • timeoutSeconds: If response takes >1s, count as failure.
  • failureThreshold: After 3 failures, take action (remove from service or restart).
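How failureThreshold interacts with consecutive probe results can be sketched in a few lines of Python. This is a simplified model of kubelet behavior, not the real implementation; `successThreshold` is a real probe field included here for symmetry:

```python
# Simplified model of how consecutive probe results flip a
# health verdict. Illustrative only, not kubelet code.
class ProbeTracker:
    def __init__(self, failure_threshold=3, success_threshold=1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.failures = 0
        self.successes = 0
        self.healthy = True

    def observe(self, probe_succeeded: bool) -> bool:
        """Record one probe result; return the current health verdict."""
        if probe_succeeded:
            self.successes += 1
            self.failures = 0  # any success resets the failure count
            if self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold:
                # Readiness: remove from endpoints. Liveness: restart container.
                self.healthy = False
        return self.healthy

tracker = ProbeTracker(failure_threshold=3)
results = [True, False, False, True, False, False, False]
verdicts = [tracker.observe(r) for r in results]
print(verdicts)  # [True, True, True, True, True, True, False]
```

Note that only *consecutive* failures count: the single success in the middle resets the counter, which is why a transient blip does not mark the Pod unhealthy.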

HTTP Readiness Probe Example

Your FastAPI agent needs a readiness endpoint that returns 200 only when the model is loaded:

```python
# main.py - FastAPI agent with readiness endpoint
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import asyncio
import time

app = FastAPI()

# Simulate model loading time
model_loaded = False
load_start = None

@app.on_event("startup")
async def load_model():
    global model_loaded, load_start
    load_start = time.time()
    print("Starting model load...")
    # Simulate 15s model loading
    await asyncio.sleep(15)
    model_loaded = True
    load_time = time.time() - load_start
    print(f"Model loaded in {load_time:.1f}s")

@app.get("/health/ready")
async def readiness():
    """Returns 200 only when model is fully loaded"""
    if not model_loaded:
        return JSONResponse(
            {"status": "loading", "ready": False},
            status_code=503
        )
    return JSONResponse({"status": "ready", "ready": True})

@app.get("/health/live")
async def liveness():
    """Always returns 200 if main process is running"""
    return JSONResponse({"status": "alive"})

@app.post("/predict")
async def predict(data: dict):
    """Agent inference endpoint"""
    if not model_loaded:
        return JSONResponse(
            {"error": "Model still loading, try again soon"},
            status_code=503
        )
    # Do actual inference
    return {"prediction": "example output"}
```

Terminal Testing:

```bash
# First request during loading
curl http://localhost:8000/health/ready
# Output: {"status": "loading", "ready": false}

# After 15 seconds, try again
curl http://localhost:8000/health/ready
# Output: {"status": "ready", "ready": true}

# Liveness endpoint always responds
curl http://localhost:8000/health/live
# Output: {"status": "alive"}
```

Deployment with HTTP Readiness and Liveness Probes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
      - name: agent
        image: my-agent:v1
        ports:
        - containerPort: 8000
          name: http
        # Wait for model to load before sending traffic
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 5   # Check after 5s
          periodSeconds: 5         # Check every 5s
          timeoutSeconds: 2        # Must complete in 2s
          failureThreshold: 2      # 2 failures = not ready
        # Detect if container is stuck or crashed
        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 20  # Wait for startup
          periodSeconds: 10        # Check every 10s
          timeoutSeconds: 2
          failureThreshold: 3      # 3 failures = restart
        # Buy time for initialization
        startupProbe:
          httpGet:
            path: /health/ready
            port: http
          periodSeconds: 5
          failureThreshold: 6      # Allow 30s for startup
```

Execution Results:

```bash
kubectl apply -f deployment.yaml

# Watch pod startup
kubectl get pods -w
# agent-deployment-7d4f5c6b9f-abc123   0/1   Running   0
# At 15s: startup probe succeeds (model loaded)
# At 20s:
# agent-deployment-7d4f5c6b9f-abc123   1/1   Running   0

# View probe events
kubectl describe pod agent-deployment-7d4f5c6b9f-abc123
# Events:
#   Normal   Created    45s  kubelet  Created container agent
#   Normal   Started    45s  kubelet  Started container agent
#   Warning  Unhealthy  40s  kubelet  Readiness probe failed: HTTP probe failed
#   Normal   Ready      30s  kubelet  Container is ready
```

TCP Socket Probes (Port Availability)

Use TCP probes when you don't have an HTTP endpoint, or for protocols like gRPC/database connections.

TCP Probe Configuration

```yaml
spec:
  containers:
  - name: cache
    image: redis:7
    ports:
    - containerPort: 6379
    livenessProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3
```

How it works:

  • Kubernetes attempts a TCP connection to the specified port.
  • Success: Port is open and accepting connections.
  • Timeout/failure: Port is not responding.
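What a tcpSocket probe does is essentially a connect attempt with a deadline. A rough Python equivalent using the standard `socket` module (a sketch of the idea, not the kubelet's actual code):

```python
import socket

def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Roughly what a tcpSocket probe does: try to open a TCP
    connection; success means the port is accepting connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Usage: probe a port we control
server = socket.socket()
server.bind(("127.0.0.1", 0))  # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
print(tcp_probe("127.0.0.1", port))  # True: the port is listening
server.close()
```

Keep in mind the limitation this illustrates: an open port proves only that something is accepting connections, not that the application behind it can do useful work.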

Exec Probes (Custom Commands)

Run arbitrary shell commands for custom health checks. Exit code 0 = healthy.

Exec Probe Example

```yaml
spec:
  containers:
  - name: agent
    image: my-agent:v1
    readinessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - |
          # Check if critical files exist and are recent
          [ -f /app/model.pkl ] && \
          [ $(find /app/model.pkl -mmin -2) ] && \
          curl -sf http://localhost:8000/health/ready > /dev/null
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 2
```

Timing Parameters (Critical Decisions)

| Parameter | Purpose | Best Practice for AI Agents |
|---|---|---|
| initialDelaySeconds | Wait before the first check. | Set to 1.5x the actual model load time if NOT using a startup probe. |
| periodSeconds | Frequency of health checks. | Readiness: 5-10s (quick removal). Liveness: 10-30s (less aggressive). |
| timeoutSeconds | Wait time for a response. | Usually 1-2s. Increase if model inference is slow but part of readiness. |
| failureThreshold | Failures before taking action. | Startup: 10+ (very tolerant). Liveness: 3-5 (moderate). |

Watch out: keep timeoutSeconds smaller than periodSeconds. If a probe's timeout is as long as or longer than its period, a slow response can still be in flight when the next check is due, which makes failure counting hard to reason about.
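A quick way to sanity-check a parameter choice is to compute the worst-case detection window. A back-of-envelope helper (a simplification; real timing also depends on where within a period the failure begins):

```python
def worst_case_detection(period_s: float, failure_threshold: int,
                         timeout_s: float) -> float:
    """Upper bound on seconds between a Pod going bad and the verdict
    flipping: the check must fail failure_threshold consecutive times,
    and the last check can take up to timeout_s to give up."""
    return failure_threshold * period_s + timeout_s

# Readiness from the Deployment example: period=5, threshold=2, timeout=2
print(worst_case_detection(5, 2, 2))   # 12.0 seconds until traffic stops
# Liveness from the same example: period=10, threshold=3, timeout=2
print(worst_case_detection(10, 3, 2))  # 32.0 seconds until restart
```

Run this math before tuning: if 32 seconds of a stuck agent is unacceptable, tighten the liveness period or threshold, but remember that tighter values also restart Pods on transient slowness.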


Debugging Probe Failures

| Step | Command | What to Look For |
|---|---|---|
| 1. Status | `kubectl get pods` | READY 0/1 means readiness failure. High RESTARTS means liveness failure. |
| 2. Describe | `kubectl describe pod <name>` | Check the Events section for the exact probe failure messages. |
| 3. Logs | `kubectl logs <name>` | Look for app-side errors during model loading or initialization. |
| 4. Manual test | `kubectl port-forward <name> 8000:8000` | Manually curl the health endpoints to verify their logic. |

AI-Native Health Checks

Consider this pattern for agents with variable startup times:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
      - name: agent
        image: my-agent:latest
        startupProbe:
          httpGet: { path: /health/startup, port: 8000 }
          periodSeconds: 5
          failureThreshold: 30   # Allow 150s for startup
        readinessProbe:
          httpGet: { path: /health/ready, port: 8000 }
          periodSeconds: 5
          failureThreshold: 2
        livenessProbe:
          httpGet: { path: /health/live, port: 8000 }
          periodSeconds: 10
          failureThreshold: 3
```

Try With AI

Setup: You have a containerized agent with a health endpoint.

Challenge: Configure probes for an agent with these characteristics:

  • Model loads in 20 seconds.
  • Health endpoint responds in under 500ms when healthy.
  • You want aggressive failure detection for readiness (pod removed quickly if unhealthy).
  • You want conservative failure detection for liveness (don't restart on transient errors).

Action Prompt: "Write a Deployment manifest with startup, readiness, and liveness probes for an agent that takes 20s to load. Ensure readiness is aggressive (5s period) and liveness is conservative (30s period)."


Reflect on Your Skill

You built a kubernetes-deployment skill in Chapter 0. Test and improve it based on what you learned.

Test Your Skill

```
Using my kubernetes-deployment skill, configure liveness, readiness, and startup probes. Does my skill generate proper probe configurations with appropriate timing parameters?
```

Identify Gaps

Ask yourself:

  • Did my skill include the three probe types (liveness, readiness, startup) and when to use each?
  • Did it explain HTTP GET, TCP socket, and exec probe mechanisms?
  • Did it cover debugging patterns for probe failures?

Improve Your Skill

If you found gaps:

```
My kubernetes-deployment skill is missing health probe configuration and debugging patterns. Update it to include all three probe types, appropriate timing for AI agents, and probe failure diagnosis steps.
```