The Iron Fortress: Hardening the Agent Cluster

Muhammad Usman Akbar Entity Profile

Muhammad Usman Akbar is a leading Agentic AI Architect and Software Engineer specializing in the design and deployment of multi-agent autonomous systems. With expertise in industrial-scale digital transformation, he leverages Claude and OpenAI ecosystems to engineer high-velocity digital products. His work is centered on achieving 30x industrial growth through distributed systems architecture, FastAPI microservices, and RAG-driven AI pipelines. Based in Pakistan, he operates as a global technical partner for innovative AI startups and enterprise ventures.


© 2026 Muhammad Usman Akbar. All rights reserved.


Kubernetes Security for AI Services

Your FastAPI agent is now deployed to Kubernetes (Chapter 3). It's running, scaling, exposed to traffic. But here's the question: who can access what?

In a production cluster, your agent container might handle sensitive data—user conversations, API keys, model weights. A compromised container could leak everything. This lesson builds the security foundations that protect your agent in production: non-root execution, read-only filesystems, network isolation, and vulnerability scanning.

By the end, your agent will run with minimal privileges, reject requests from unauthorized namespaces, and expose zero unnecessary attack surface.


The Security Specification

Before we write a single YAML line, let's define what "secure" means for your agent.

Security Intent: AI agent handling sensitive user data must not run as root, must have read-only filesystem, must be isolated from other namespaces, and must reject unauthorized network traffic.

Success Criteria:

  • ✅ Container runs as non-root user (UID > 1000)
  • ✅ Root filesystem is read-only
  • ✅ Container cannot gain elevated privileges
  • ✅ Agent Pod only receives traffic from authorized namespaces
  • ✅ Pod adheres to Restricted Pod Security Standard

Constraints:

  • Must preserve application functionality (logging, tmp files still work)
  • Container must have specific user ID pre-built into the image
  • Network Policies require matching labels for routing

Non-Goals:

  • Encrypting data at rest (handled by encrypted persistent volumes)
  • Secret rotation automation (handled by external secret managers)
  • Pod-to-Pod TLS encryption (handled by service mesh)

SecurityContext: Running as Non-Root

The first line of defense is SecurityContext—a Kubernetes configuration that controls how a container runs at the OS level.

Why Non-Root Matters

By default, a container runs as whatever user the image's USER instruction sets, and when the image sets none, that user is root. Many base images (Python, Node) leave it unset. This means:

text
Root container compromised
→ Attacker has root inside the container
→ Attacker can modify any file in the container's filesystem
→ With a container escape or kernel vulnerability, attacker can gain root on the node

Running as non-root doesn't prevent compromise, but it limits what an attacker can do after gaining access.
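As a defense-in-depth complement, the agent process can also refuse to start when it finds itself running as root. The following is a minimal sketch, assuming a POSIX container; the function names `running_as_root` and `refuse_root` are hypothetical and not part of the chapter's agent.py.

```python
import os
import sys


def running_as_root(uid: int) -> bool:
    """Return True when the given effective UID is root (0)."""
    return uid == 0


def refuse_root() -> None:
    """Exit early if the current process was started as root (POSIX only)."""
    if running_as_root(os.geteuid()):
        sys.exit("Refusing to start: agent must not run as root (UID 0).")
```

Calling `refuse_root()` as the first statement of the entry point would make a misconfigured image fail fast instead of silently running with full privileges.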

Building a Secure Image

First, create an image with a non-root user. In your Dockerfile:

dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

# Create a non-root user
RUN useradd -m -u 1001 agentuser

# Set working directory permissions
RUN chown -R agentuser:agentuser /app

USER agentuser

COPY --chown=agentuser:agentuser . .

CMD ["python", "agent.py"]

Key points:

  • useradd -m -u 1001 agentuser creates a user with ID 1001 (non-root, > 1000)
  • chown transfers ownership to the new user
  • USER agentuser makes the container run as this user by default
  • --chown flag on COPY sets the owner of the copied files to agentuser at build time

Build and push this image:

bash
docker build -t myregistry/agent:v1-secure .
docker push myregistry/agent:v1-secure

Output:

text
Successfully built sha256:abc123...
Successfully tagged myregistry/agent:v1-secure
The push refers to repository [myregistry/agent]
v1-secure: digest: sha256:def456... size: 45MB

Enforcing Non-Root with SecurityContext

Even if your image runs as a non-root user, Kubernetes can enforce it with SecurityContext. This prevents accidentally running a container as root.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: agent-secure
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    fsGroup: 1001
  containers:
    - name: agent
      image: myregistry/agent:v1-secure
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: false
        capabilities:
          drop:
            - ALL
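To see why the manifest sets both runAsNonRoot and runAsUser, it helps to model how the check is decided at container start. The sketch below is a simplified approximation of that decision, not the kubelet's actual code; the function names and the returned strings are made up for illustration. It assumes the image's USER value is passed in as a string: empty when unset, numeric such as "1001", or a name such as "agentuser".

```python
from typing import Optional


def effective_run_as_user(pod_sc: dict, container_sc: dict) -> Optional[int]:
    """Container-level runAsUser overrides the Pod-level value."""
    if "runAsUser" in container_sc:
        return container_sc["runAsUser"]
    return pod_sc.get("runAsUser")


def non_root_check(pod_sc: dict, container_sc: dict, image_user: str) -> str:
    """Approximate whether a container may start under runAsNonRoot."""
    run_as_non_root = container_sc.get(
        "runAsNonRoot", pod_sc.get("runAsNonRoot", False)
    )
    if not run_as_non_root:
        return "start"

    uid = effective_run_as_user(pod_sc, container_sc)
    if uid is not None:
        return "start" if uid != 0 else "reject: runAsUser is 0"

    # No runAsUser in the manifest: fall back to the image's USER value.
    user = image_user.split(":")[0]  # drop an optional ":group" suffix
    if user == "":
        return "reject: image runs as root"
    if not user.isdigit():
        # A user name cannot be proven non-root without reading /etc/passwd
        return "reject: cannot verify non-numeric image user"
    return "start" if int(user) != 0 else "reject: image runs as root"
```

Under this model, an image built with `USER agentuser` only passes the check because the manifest also pins `runAsUser: 1001`. Writing `USER 1001` in the Dockerfile would let the image pass on its own.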

Read-Only Root Filesystem

The second layer: read-only root filesystem. An attacker who gains code execution inside the container can modify files on disk, install backdoors, or change the application logic. A read-only filesystem blocks this attack vector.

Understanding Filesystem Layers

When the root filesystem is read-only, the application can still write to specific locations backed by emptyDir volumes. These are temporary, per-Pod directories: they survive container restarts but are deleted when the Pod is removed from its node, which makes them a good fit for logs, temp files, and caches.

Configuring Read-Only Root with Writable Volume

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      containers:
        - name: agent
          image: myregistry/agent:v1-secure
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp-volume
              mountPath: /tmp
            - name: logs-volume
              mountPath: /var/log/agent
          ports:
            - containerPort: 8000
      volumes:
        - name: tmp-volume
          emptyDir: {}
        - name: logs-volume
          emptyDir: {}
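With a read-only root filesystem, any write outside the mounted volumes fails at runtime, so it can help for the agent to confirm at startup that its writable paths really are writable. A minimal sketch follows; the helper name `first_writable_dir` is hypothetical, and the paths in the usage note are the mount points from the manifest above.

```python
import tempfile
from typing import Iterable, Optional


def first_writable_dir(candidates: Iterable[str]) -> Optional[str]:
    """Return the first directory in which a temp file can actually be created."""
    for path in candidates:
        try:
            # Creating (and auto-deleting) a file is the most reliable write test
            with tempfile.TemporaryFile(dir=path):
                return path
        except OSError:
            # Read-only filesystem, missing directory, or permission denied
            continue
    return None
```

For example, `first_writable_dir(["/var/log/agent", "/tmp"])` would return `/var/log/agent` when the logs volume is mounted, and `None` if neither emptyDir mount is present, which is a useful signal to fail fast with a clear error.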

Network Policies: Isolating Agent Traffic

The third layer: Network Policies. By default, Kubernetes allows all Pods to communicate with each other. A compromised Pod in one namespace could reach Pods in another.

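Network Policies enforce segmentation: your agent only receives traffic from authorized namespaces and services. They take effect only when the cluster's network plugin (CNI) implements NetworkPolicy; on a plugin without support, the objects are accepted but not enforced.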

Denying All Traffic, Then Allowing Specific Sources

yaml
# Step 1: Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-default-deny
  namespace: agents
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Step 2: Allow traffic only from the api-gateway in the ingress namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-allow-from-gateway
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: agent
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Both selectors sit in the same list item, so both must match
        - namespaceSelector:
            matchLabels:
              name: ingress
          podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8000
---
# Step 3: Allow agent Pods to reach external APIs over HTTPS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-allow-egress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: agent
  policyTypes:
    - Egress
  egress:
    # DNS must be allowed explicitly once egress is restricted
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # HTTPS to any IP address; a namespaceSelector would match only in-cluster Pods
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443

First, make sure your namespaces are labeled:

bash
kubectl label namespace ingress name=ingress
kubectl label namespace agents name=agents

Output:

text
namespace/ingress labeled
namespace/agents labeled

Apply the Network Policies:

bash
kubectl apply -f agent-network-policy.yaml

Output:

text
networkpolicy.networking.k8s.io/agent-default-deny created
networkpolicy.networking.k8s.io/agent-allow-from-gateway created
networkpolicy.networking.k8s.io/agent-allow-egress created

Test the policy by trying to reach the agent from an unauthorized Pod:

bash
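# From a different namespace without permission
# --max-time stops curl from hanging when the policy silently drops packets
# (curl exits with code 28 on timeout)
kubectl run test-pod --image=curlimages/curl -n other -- sleep 3600
kubectl exec test-pod -n other -- curl --max-time 5 http://agent-service.agents.svc.cluster.local:8000

Output:

text
command terminated with exit code 28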

The connection is denied. But traffic from the authorized gateway succeeds:

bash
# From the authorized api-gateway Pod
kubectl exec deployment/api-gateway -n ingress -- curl http://agent-service.agents.svc.cluster.local:8000/health

Output:

text
{"status": "healthy", "model": "gpt-4o", "uptime_seconds": 3600}
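The behavior seen in these two tests follows from how NetworkPolicy rules combine: policies are additive, a Pod selected by at least one ingress policy denies everything not explicitly allowed, and a namespaceSelector and podSelector in the same `from` item must both match. The sketch below is a deliberately simplified model of that evaluation, not how a CNI plugin implements it. It assumes flat matchLabels dictionaries, integer ports, and that every peer specifies a namespaceSelector. `POLICIES` mirrors the two ingress policies above, and all function names are illustrative.

```python
from typing import Dict, List

Labels = Dict[str, str]

POLICIES: List[dict] = [
    {"name": "agent-default-deny", "podSelector": {}, "ingress": []},
    {
        "name": "agent-allow-from-gateway",
        "podSelector": {"app": "agent"},
        "ingress": [{
            "from": [{"namespaceSelector": {"name": "ingress"},
                      "podSelector": {"app": "api-gateway"}}],
            "ports": [8000],
        }],
    },
]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """An empty selector matches everything; otherwise every pair must match."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies: List[dict], dest_labels: Labels,
                    src_ns_labels: Labels, src_pod_labels: Labels,
                    port: int) -> bool:
    """Simplified ingress decision for one destination Pod."""
    selecting = [p for p in policies
                 if selector_matches(p["podSelector"], dest_labels)]
    if not selecting:
        return True  # not selected by any policy: all ingress is allowed

    for policy in selecting:
        for rule in policy.get("ingress", []):
            if rule.get("ports") and port not in rule["ports"]:
                continue
            if "from" not in rule:
                return True  # a rule without peers allows every source
            for peer in rule["from"]:
                # Selectors inside the same peer are ANDed together
                ns_ok = selector_matches(peer.get("namespaceSelector", {}), src_ns_labels)
                pod_ok = selector_matches(peer.get("podSelector", {}), src_pod_labels)
                if ns_ok and pod_ok:
                    return True
    return False  # selected, but no rule allowed this connection
```

In this model the gateway Pod reaches port 8000, while a Pod from the `other` namespace, or a different Pod inside the `ingress` namespace, is refused.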

Pod Security Standards

Kubernetes provides Pod Security Standards—three tiers that codify security best practices. This lesson's agent should adhere to the Restricted standard.

The Three Standards

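| Standard | Use Case | Restrictions |
|---|---|---|
| Privileged | System components needing OS access | None; allows everything |
| Baseline | General-purpose applications | Disallows privileged containers, host access |
| Restricted | High-security applications (agents, APIs handling sensitive data) | Requires non-root, no privilege escalation, drop all capabilities, and a seccomp profile (RuntimeDefault or Localhost). A read-only root filesystem is not part of the standard; this lesson adds it as extra hardening |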

Enforcing Pod Security Standards

Label your namespace to enforce the Restricted standard:

bash
kubectl label namespace agents pod-security.kubernetes.io/enforce=restricted

Output:

text
namespace/agents labeled

Now, any new Pod in the agents namespace that violates the Restricted standard is rejected at admission. Pods that were already running are not evicted.
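Before enforcing the label, it can be useful to check a Pod spec locally for the most common Restricted violations. The sketch below covers only a subset of the profile (non-root, privilege escalation, dropped capabilities, and the seccomp profile that the Kubernetes documentation also lists as required). It ignores init containers, volume types, and host settings. The function name `restricted_violations` is hypothetical, and the spec is assumed to be a plain dict, for example loaded from `kubectl get pod -o json`.

```python
from typing import List


def restricted_violations(pod_spec: dict) -> List[str]:
    """Report a subset of Restricted-profile violations for a Pod spec dict."""
    problems: List[str] = []
    pod_sc = pod_spec.get("securityContext", {})

    for container in pod_spec.get("containers", []):
        sc = container.get("securityContext", {})
        name = container.get("name", "?")

        # Container-level settings take precedence over Pod-level ones
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            problems.append(f"{name}: runAsNonRoot must be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")

        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(
                f"{name}: seccompProfile.type must be RuntimeDefault or Localhost"
            )
    return problems
```

A spec with no securityContext at all yields four findings for its container, which gives a quick priority list before the namespace label starts rejecting Pods.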


Container Image Security: Vulnerability Scanning

The fourth layer: image security. Before your container runs, scan it for known vulnerabilities in dependencies.

Tools for Scanning

| Tool | Approach | Best For |
|---|---|---|
| Trivy (Aqua Security) | Container image scanning | Local development, CI/CD pipelines |
| Grype (Anchore) | Vulnerability database | Supply chain security |
| Snyk | SaaS scanning | Developer-first security |
| Harbor | Registry integration | Preventing vulnerable images from being pushed |

Scanning Your Agent Image

Install Trivy (or run it from its Docker image), then scan the agent image:

bash
trivy image myregistry/agent:v1-secure

Output:

text
myregistry/agent:v1-secure (linux/amd64)
==================================
Total: 8 Vulnerabilities

┌──────────────────────────┬──────────────────┬──────────┐
│ Library                  │ Vulnerability ID │ Severity │
├──────────────────────────┼──────────────────┼──────────┤
│ libssl1.1                │ CVE-2023-2976    │ MEDIUM   │
│ libcrypto1.1             │ CVE-2023-3817    │ MEDIUM   │
│ pip packages             │ CVE-2024-123     │ LOW      │
└──────────────────────────┴──────────────────┴──────────┘

Upgrading base packages often resolves vulnerabilities.

dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get upgrade -y

Integration into CI/CD

Add a scanning step to your deployment pipeline:

bash
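# Before pushing to registry: fail the build on HIGH or CRITICAL findings.
# --exit-code 1 is required; without it Trivy exits 0 even when it finds vulnerabilities.
if ! trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry/agent:v1; then
  echo "High/critical vulnerabilities found. Fix before deploying."
  exit 1
fi

docker push myregistry/agent:v1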
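When the pipeline needs more than a pass/fail exit code, for example to list the offending packages in a build summary, the scan can be written to a JSON report (`trivy image --format json --output report.json ...`) and filtered in Python. This is a minimal sketch that assumes Trivy's JSON layout of a top-level `Results` list whose entries carry a `Vulnerabilities` list. The helper names are hypothetical, and `SAMPLE_REPORT` is made-up placeholder data, not real scan output.

```python
import json
from typing import List

# Made-up placeholder data in the assumed report shape (not real CVEs)
SAMPLE_REPORT = {
    "Results": [
        {
            "Target": "myregistry/agent:v1",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "examplelib", "Severity": "HIGH"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "otherlib", "Severity": "LOW"},
            ],
        }
    ]
}


def load_report(path: str) -> dict:
    """Read a JSON report file written by the scanner."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def blocking_findings(report: dict,
                      blocked: frozenset = frozenset({"HIGH", "CRITICAL"})) -> List[str]:
    """Return one line per vulnerability whose severity is in `blocked`."""
    findings: List[str] = []
    for result in report.get("Results") or []:
        # Targets without findings may omit the key or set it to null
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocked:
                findings.append(
                    f'{vuln.get("VulnerabilityID")} in {vuln.get("PkgName")} '
                    f'({vuln.get("Severity")})'
                )
    return findings
```

A CI step could call `blocking_findings(load_report("report.json"))`, print the returned lines, and exit non-zero when the list is not empty.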

Putting It All Together: A Secure Agent Deployment

Here's the complete, security-hardened Deployment combining all layers:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-prod
  namespace: agents
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
        - name: agent
          image: myregistry/agent:v1-secure
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            # Required by the Restricted Pod Security Standard enforced on this namespace
            seccompProfile:
              type: RuntimeDefault
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /var/log/agent
          ports:
            - containerPort: 8000
              name: http
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: agent-service
  namespace: agents
spec:
  selector:
    app: agent
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-deny-all
  namespace: agents
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-allow-ingress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: agent
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress

Deploy everything:

bash
kubectl apply -f agent-production.yaml

Output:

text
deployment.apps/agent-prod created
service/agent-service created
networkpolicy.networking.k8s.io/agent-deny-all created
networkpolicy.networking.k8s.io/agent-allow-ingress created

Verify security settings:

bash
kubectl get pod -n agents -o jsonpath='{.items[0].spec.securityContext}'

Output:

text
{"fsGroup":1001,"runAsNonRoot":true,"runAsUser":1001}

Try With AI

Audit an Existing Deployment for Security

Describe your current agent Deployment:

text
I have a Deployment running a FastAPI agent in Kubernetes. The Pod spec looks like:

spec:
  containers:
    - name: agent
      image: myregistry/agent:v1.0
      # No securityContext configured

Help me identify security gaps and prioritize fixes based on impact.