Kubernetes Security for AI Services

Name: Digital FTEs: Engineering — Achieving 10× Productivity
Author: Muhammad Usman Akbar

Your FastAPI agent is now deployed to Kubernetes (Chapter 3). It's running, scaling, exposed to traffic. But here's the question: who can access what?

In a production cluster, your agent container might handle sensitive data—user conversations, API keys, model weights. A compromised container could leak everything. This lesson builds the security foundations that protect your agent in production: non-root execution, read-only filesystems, network isolation, and vulnerability scanning.

By the end, your agent will run with minimal privileges, reject traffic from unauthorized namespaces, and expose no more attack surface than it needs.

The Security Specification

Before we write a single YAML line, let's define what "secure" means for your agent.

Security Intent: An AI agent handling sensitive user data must not run as root, must have a read-only root filesystem, must be isolated from other namespaces, and must reject unauthorized network traffic.

Success Criteria:

✅ Container runs as non-root user (UID > 1000)
✅ Root filesystem is read-only
✅ Container cannot gain elevated privileges
✅ Agent Pod only receives traffic from authorized namespaces
✅ Pod adheres to Restricted Pod Security Standard

Constraints:

Must preserve application functionality (logging, tmp files still work)
The image must define a specific non-root user ID at build time
Network Policies select Pods and namespaces by label, so those labels must match exactly

Non-Goals:

Encrypting data at rest (handled by encrypted persistent volumes)
Secret rotation automation (handled by external secret managers)
Pod-to-Pod TLS encryption (handled by service mesh)

SecurityContext: Running as Non-Root

The first line of defense is SecurityContext—a Kubernetes configuration that controls how a container runs at the OS level.

Why Non-Root Matters

By default, a container runs as whatever user its image declares with the USER instruction. If the image declares none, that user is root (UID 0), which is the case for many official base images such as Python and Node. This means:

text

Root container compromised → Attacker has root inside the container
                  → Attacker can modify any file in the container's filesystem
                  → Any kernel or runtime vulnerability becomes far easier to turn into an escape to the host

Running as non-root doesn't prevent compromise, but it limits what an attacker can do after gaining access.

Building a Secure Image

First, create an image with a non-root user. In your Dockerfile:

dockerfile

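FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create a non-root user with a fixed UID
RUN useradd -m -u 1001 agentuser
# Give the new user ownership of the working directory
RUN chown -R agentuser:agentuser /app

# Use the numeric UID so Kubernetes can verify runAsNonRoot from the image alone
USER 1001
COPY --chown=agentuser:agentuser . .
CMD ["python", "agent.py"]

Key points:

useradd -m -u 1001 agentuser creates a user with UID 1001 (non-root, > 1000) and a home directory
chown gives the new user ownership of /app
USER 1001 makes the container run as this user by default; a numeric UID lets the kubelet confirm the user is non-root even when the Pod spec sets no runAsUser, which it cannot do for a user name
--chown on COPY makes the copied files owned by agentuser; without it, COPY creates files owned by root even after a USER instruction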
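As a defense-in-depth check that doesn't depend on Kubernetes at all, agent.py can refuse to start under the wrong user. This is a minimal sketch, not part of FastAPI: `check_runtime_uid` is a hypothetical helper, and the 1001 threshold simply mirrors the UID > 1000 success criterion above.

```python
import os
from typing import Optional


def check_runtime_uid(uid: Optional[int] = None, min_uid: int = 1001) -> int:
    """Fail fast if the process runs as root or below the expected UID range.

    `uid` defaults to the process's effective UID (the one the kernel uses for
    permission checks); pass a value explicitly to test the logic.
    Returns the UID when it is acceptable, otherwise raises RuntimeError.
    """
    effective_uid = os.geteuid() if uid is None else uid
    if effective_uid == 0:
        raise RuntimeError(
            "Refusing to start as root (UID 0); set USER in the Dockerfile "
            "or runAsUser in the Pod spec."
        )
    if effective_uid < min_uid:
        raise RuntimeError(
            f"Running as UID {effective_uid}; expected a UID of at least {min_uid}."
        )
    return effective_uid
```

Call `check_runtime_uid()` once at the top of agent.py, before the server starts. An exception then crashes the container immediately, which is much easier to notice than an agent quietly running as root.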

Build and push this image:

bash

docker build -t myregistry/agent:v1-secure .
docker push myregistry/agent:v1-secure

Output:

text

Successfully built sha256:abc123...
Successfully tagged myregistry/agent:v1-secure
The push refers to repository [myregistry/agent]
v1-secure: digest: sha256:def456... size: 45MB

Enforcing Non-Root with SecurityContext

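Even if your image already sets a non-root user, enforce it in Kubernetes as well. With runAsNonRoot: true, the kubelet refuses to start any container that would run as UID 0, so a rebuilt base image or a changed Dockerfile can't silently put the agent back on root.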

yaml

apiVersion: v1
kind: Pod
metadata:
  name: agent-secure
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    fsGroup: 1001
  containers:
  - name: agent
    image: myregistry/agent:v1-secure
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: false
      capabilities:
        drop:
          - ALL
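Where does the kubelet get the UID it checks? The following is a simplified Python model of the decision it makes when runAsNonRoot is true, paraphrasing the kubelet's behavior rather than reproducing its source: container-level settings override pod-level ones, an explicit runAsUser wins over the image, an image with no USER counts as root, and a USER given as a name (like agentuser) cannot be verified at all. The function names and plain-dict inputs are assumptions for illustration.

```python
from typing import Any, Optional


def _effective(key: str, pod_sc: dict, container_sc: dict) -> Any:
    """Container-level securityContext fields override pod-level ones."""
    if container_sc.get(key) is not None:
        return container_sc[key]
    return pod_sc.get(key)


def run_as_non_root_error(pod_sc: dict, container_sc: dict, image_user: str) -> Optional[str]:
    """Return why the container would be refused, or None if it may start.

    image_user is the image's USER value, e.g. "1001", "agentuser" or
    "agentuser:agentuser"; an empty string means the image sets no USER.
    """
    if not _effective("runAsNonRoot", pod_sc, container_sc):
        return None  # policy not requested, nothing to verify

    run_as_user = _effective("runAsUser", pod_sc, container_sc)
    if run_as_user is not None:
        # An explicit runAsUser overrides whatever USER the image declares.
        return "runAsUser is 0 (root)" if run_as_user == 0 else None

    user = image_user.split(":", 1)[0]  # drop an optional ":group" suffix
    if user == "":
        return "image sets no USER, so it would run as root"
    if user.isdigit():
        return "image USER is UID 0 (root)" if int(user) == 0 else None
    return f"image USER {user!r} is a name, so non-root cannot be verified"
```

This is why setting runAsUser: 1001 in the Pod spec matters when the image's USER is a name: with runAsUser in place, the kubelet never has to resolve the name.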

Read-Only Root Filesystem

The second layer: read-only root filesystem. An attacker who gains code execution inside the container can modify files on disk, install backdoors, or change the application logic. A read-only filesystem blocks this attack vector.

Understanding Filesystem Layers

With a read-only root filesystem, the application can still write to specific locations that you mount as emptyDir volumes. An emptyDir is a temporary, per-Pod directory: it survives container restarts but is deleted when the Pod is removed from its node, which makes it a good fit for logs, temp files, and caches.

Configuring Read-Only Root with Writable Volume

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      containers:
      - name: agent
        image: myregistry/agent:v1-secure
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
        volumeMounts:
        - name: tmp-volume
          mountPath: /tmp
        - name: logs-volume
          mountPath: /var/log/agent
        ports:
        - containerPort: 8000
      volumes:
      - name: tmp-volume
        emptyDir: {}
      - name: logs-volume
        emptyDir: {}
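On the application side, everything the agent writes now has to land under /tmp or /var/log/agent. A minimal sketch, assuming the agent logs through Python's standard logging module; `is_writable_dir` and `configure_file_logging` are hypothetical helper names. Python's tempfile module already defaults to /tmp on Linux (unless TMPDIR points elsewhere), so temporary files need no extra configuration.

```python
import logging
import os
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if a file can actually be created in `path`.

    Creating a real temporary file exercises the same checks the agent will hit
    (read-only mount, ownership, missing directory) under the effective UID.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def configure_file_logging(log_dir: str = "/var/log/agent",
                           filename: str = "agent.log") -> logging.Logger:
    """Attach a file handler that writes into the mounted log volume.

    Fails at startup with a clear message if the emptyDir is missing, instead
    of failing later on the first log write against the read-only root.
    """
    if not is_writable_dir(log_dir):
        raise RuntimeError(f"{log_dir} is not writable; is the emptyDir volume mounted?")
    logger = logging.getLogger("agent")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(os.path.join(log_dir, filename))
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Many setups log to stdout instead and let the cluster collect container logs; in that case only /tmp needs a writable volume.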

Network Policies: Isolating Agent Traffic

The third layer: Network Policies. By default, Kubernetes allows all Pods to communicate with each other. A compromised Pod in one namespace could reach Pods in another.

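Network Policies enforce segmentation: your agent only receives traffic from authorized namespaces and services. They take effect only if your cluster's network plugin enforces them (Calico and Cilium do, for example); without such a plugin, the API server accepts NetworkPolicy objects but nothing actually blocks traffic.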

Denying All Traffic, Then Allowing Specific Sources

yaml

# Step 1: Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-default-deny
  namespace: agents
spec:
  podSelector: {}
  policyTypes:
  - Ingress

---
# Step 2: Allow traffic only from the api-gateway in the ingress namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-allow-from-gateway
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: agent
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress
      podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 8000

---
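# Step 3: Allow agent Pods to resolve DNS and reach external HTTPS APIs.
# Once a policy selects these Pods for Egress, all other outbound traffic is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-allow-egress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: agent
  policyTypes:
  - Egress
  egress:
  # DNS to the cluster DNS Pods (k8s-app: kube-dns is the label most distributions use)
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # HTTPS to addresses outside the cluster; namespaceSelector: {} would match only in-cluster Pods
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443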
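The rules these policies rely on are additive. A Pod that no policy selects accepts everything; once any policy selects it for Ingress, only traffic that some rule allows gets through, and the allowed set is the union across all policies. Inside a single `from` entry, namespaceSelector and podSelector must both match. The Python model below sketches those ingress semantics under stated simplifications (matchLabels only, no matchExpressions, no ipBlock, protocol ignored). It is a tool for reasoning about policies, not a replacement for your network plugin, and all names in it are hypothetical.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def labels_match(selector: Optional[dict], labels: Labels) -> bool:
    """matchLabels-only selector; an empty selector ({}) matches everything."""
    wanted = (selector or {}).get("matchLabels") or {}
    return all(labels.get(k) == v for k, v in wanted.items())


def peer_matches(peer: dict, policy_ns: str, src_ns: str,
                 src_ns_labels: Labels, src_labels: Labels) -> bool:
    """One entry of a `from` list. Both selectors in the same entry must match (AND)."""
    if "ipBlock" in peer:
        return False  # ipBlock peers are not modeled here
    if "namespaceSelector" in peer:
        if not labels_match(peer["namespaceSelector"], src_ns_labels):
            return False
    elif src_ns != policy_ns:
        return False  # no namespaceSelector: only the policy's own namespace
    return "podSelector" not in peer or labels_match(peer["podSelector"], src_labels)


def ingress_allowed(policies: List[dict], dst_ns: str, dst_labels: Labels,
                    src_ns: str, src_ns_labels: Labels, src_labels: Labels,
                    port: int) -> bool:
    """Would a connection from the source Pod to the destination Pod on `port` be admitted?"""
    selecting = [
        p for p in policies
        if p["metadata"]["namespace"] == dst_ns
        and "Ingress" in p["spec"].get("policyTypes", ["Ingress"])
        and labels_match(p["spec"].get("podSelector"), dst_labels)
    ]
    if not selecting:
        return True  # not isolated: Kubernetes' default is allow-all
    for policy in selecting:
        for rule in policy["spec"].get("ingress") or []:
            peers = rule.get("from")
            ports = rule.get("ports")
            peer_ok = not peers or any(
                peer_matches(peer, dst_ns, src_ns, src_ns_labels, src_labels)
                for peer in peers)
            port_ok = not ports or any(p.get("port") == port for p in ports)
            if peer_ok and port_ok:
                return True  # allowed sets are a union across policies and rules
    return False


# Steps 1 and 2 from the YAML above, as plain dicts
DEFAULT_DENY = {"metadata": {"name": "agent-default-deny", "namespace": "agents"},
                "spec": {"podSelector": {}, "policyTypes": ["Ingress"]}}
ALLOW_GATEWAY = {
    "metadata": {"name": "agent-allow-from-gateway", "namespace": "agents"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "agent"}},
        "policyTypes": ["Ingress"],
        "ingress": [{
            "from": [{"namespaceSelector": {"matchLabels": {"name": "ingress"}},
                      "podSelector": {"matchLabels": {"app": "api-gateway"}}}],
            "ports": [{"protocol": "TCP", "port": 8000}],
        }],
    },
}
```

For example, `ingress_allowed([DEFAULT_DENY, ALLOW_GATEWAY], "agents", {"app": "agent"}, "ingress", {"name": "ingress"}, {"app": "api-gateway"}, 8000)` is True, while the same call from a Pod labeled `app: debug` in the ingress namespace is False. Had namespaceSelector and podSelector been written as two separate list items, each with its own leading dash, the rule would become an OR and admit every Pod in the ingress namespace.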

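The namespaceSelector in Step 2 matches on a name label, so make sure your namespaces carry it. (Kubernetes 1.21 and later also labels every namespace automatically with kubernetes.io/metadata.name, which you could select on instead.)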

bash

kubectl label namespace ingress name=ingress
kubectl label namespace agents name=agents

Output:

text

namespace/ingress labeled
namespace/agents labeled

Apply the Network Policies:

bash

kubectl apply -f agent-network-policy.yaml

Output:

text

networkpolicy.networking.k8s.io/agent-default-deny created
networkpolicy.networking.k8s.io/agent-allow-from-gateway created
networkpolicy.networking.k8s.io/agent-allow-egress created

Test the policy by trying to reach the agent from an unauthorized Pod:

bash

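# From a different namespace without permission
kubectl create namespace other
kubectl run test-pod --image=curlimages/curl -n other -- sleep 3600
kubectl wait --for=condition=Ready pod/test-pod -n other --timeout=60s
kubectl exec test-pod -n other -- curl -s --max-time 5 http://agent-service.agents.svc.cluster.local:8000

Output:

text

namespace/other created
pod/test-pod created
pod/test-pod condition met
command terminated with exit code 28

The request times out (exit code 28 is curl's timeout error) because the policy silently drops the packets; some network plugins reject the connection instead, which yields a different non-zero code. Traffic from the authorized gateway, by contrast, succeeds: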

bash

# From the authorized api-gateway Pod
kubectl exec deployment/api-gateway -n ingress -- curl http://agent-service.agents.svc.cluster.local:8000/health

Output:

text

{"status": "healthy", "model": "gpt-4o", "uptime_seconds": 3600}
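If the Pod you test from has Python but no curl, a plain TCP connect shows the same thing, since a NetworkPolicy decides whether the connection opens at all. A minimal sketch; `can_connect` is a hypothetical helper. The timeout matters because blocked packets are usually dropped rather than refused, so an unbounded connect would simply hang.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Timeouts, refusals and DNS failures all raise OSError subclasses, so any of
    them counts as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From the test Pod in the other namespace, `can_connect("agent-service.agents.svc.cluster.local", 8000)` should return False; from the api-gateway Pod it should return True.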

Pod Security Standards

Kubernetes provides Pod Security Standards—three tiers that codify security best practices. This lesson's agent should adhere to the Restricted standard.

The Three Standards

Standard	Use Case	Restrictions
Privileged	System components needing OS access	None—allows everything
Baseline	General-purpose applications	Disallows privileged containers, host access
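Restricted	High-security applications (agents, APIs handling sensitive data)	Everything in Baseline, plus: non-root user, no privilege escalation, drop ALL capabilities, seccomp profile set to RuntimeDefault or Localhost, limited volume types (a read-only root filesystem is not required)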

Enforcing Pod Security Standards

Label your namespace to enforce the Restricted standard:

bash

kubectl label namespace agents pod-security.kubernetes.io/enforce=restricted

Output:

text

namespace/agents labeled

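From now on, the API server rejects any new Pod in the agents namespace that violates the Restricted standard. Pods that are already running are not evicted; instead, kubectl prints a warning listing existing Pods that would violate the new level. A Deployment object itself is still accepted, but its ReplicaSet fails to create Pods, so check kubectl get events -n agents if replicas never appear.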
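To catch violations before the API server does, you can check a parsed manifest locally. The sketch below covers only a subset of the Restricted rules (the real admission controller also checks volume types, host namespaces and ports, sysctls, and more); `restricted_violations` is a hypothetical name, and it assumes you have already parsed the YAML into dicts, for example with PyYAML's `yaml.safe_load`, which is not imported here.

```python
from typing import Any, Dict, List

ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def _effective(key: str, container_sc: Dict[str, Any], pod_sc: Dict[str, Any]) -> Any:
    """Container-level securityContext values take precedence over pod-level ones."""
    if container_sc.get(key) is not None:
        return container_sc[key]
    return pod_sc.get(key)


def restricted_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """Check a Pod spec against a SUBSET of the Restricted Pod Security Standard.

    pod_spec is a Pod's `spec`, or a Deployment's `spec.template.spec`.
    """
    problems: List[str] = []
    pod_sc = pod_spec.get("securityContext") or {}
    containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
    for c in containers:
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext") or {}
        if sc.get("privileged"):
            problems.append(f"{name}: privileged must not be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if _effective("runAsNonRoot", sc, pod_sc) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if _effective("runAsUser", sc, pod_sc) == 0:
            problems.append(f"{name}: runAsUser must not be 0")
        seccomp = _effective("seccompProfile", sc, pod_sc) or {}
        if seccomp.get("type") not in ALLOWED_SECCOMP:
            problems.append(f"{name}: seccompProfile.type must be RuntimeDefault or Localhost")
        caps = sc.get("capabilities") or {}
        if "ALL" not in (caps.get("drop") or []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        extra = set(caps.get("add") or []) - {"NET_BIND_SERVICE"}
        if extra:
            problems.append(f"{name}: may only add NET_BIND_SERVICE, found {sorted(extra)}")
    return problems
```

Note that readOnlyRootFilesystem is not part of the Restricted standard, so this sketch doesn't check it, even though this lesson sets it.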

Container Image Security: Vulnerability Scanning

The fourth layer: image security. Before your container runs, scan it for known vulnerabilities in dependencies.

Tools for Scanning

Tool	Approach	Best For
Trivy (Aqua Security)	Container image scanning	Local development, CI/CD pipelines
Grype (Anchore)	Image and filesystem vulnerability scanning	Supply chain security, often paired with Syft SBOMs
Snyk	SaaS scanning	Developer-first security
Harbor	Registry with built-in scanning	Blocking pulls of images above a severity threshold

Scanning Your Agent Image

Install Trivy (or run the official aquasec/trivy Docker image), then scan your agent image:

bash

trivy image myregistry/agent:v1-secure

Output (illustrative and abridged; your findings will differ):

text

myregistry/agent:v1-secure (linux/amd64)
==================================
Total: 8 Vulnerabilities
┌──────────────────────────┬──────────────────┬──────────┐
│ Library                  │ Vulnerability ID │ Severity │
├──────────────────────────┼──────────────────┼──────────┤
│ libssl1.1                │ CVE-2023-2976    │ MEDIUM   │
│ libcrypto1.1             │ CVE-2023-3817    │ MEDIUM   │
│ pip packages             │ CVE-2024-123     │ LOW      │
└──────────────────────────┴──────────────────┴──────────┘

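Rebuilding on the latest patch release of your base image (docker build --pull fetches it) and upgrading OS packages resolves many of these findings:

dockerfile

FROM python:3.11-slim
# Apply pending OS security updates, then drop the apt cache to keep the image small
RUN apt-get update \
    && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*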

Integration into CI/CD

Add a scanning step to your deployment pipeline:

bash

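# Before pushing to the registry: fail the build on HIGH or CRITICAL findings.
# trivy exits 0 even when it finds vulnerabilities unless --exit-code is set.
if ! trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry/agent:v1-secure; then
  echo "High/critical vulnerabilities found. Fix before deploying."
  exit 1
fi
docker push myregistry/agent:v1-secure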
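If you'd rather make the pass/fail decision yourself, for example to print a short summary or apply an allow-list later, Trivy can write a JSON report with `--format json --output report.json`, and a few lines of Python can gate on it. This sketch assumes Trivy's JSON layout (a top-level `Results` list whose entries may carry a `Vulnerabilities` list with `PkgName`, `VulnerabilityID`, and `Severity` fields); check it against the report your Trivy version produces. The function names are hypothetical.

```python
import json
from typing import Iterable, List, Tuple

BLOCKING = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict,
                      blocking: Iterable[str] = BLOCKING) -> List[Tuple[str, str, str]]:
    """Return (package, vulnerability ID, severity) for findings at a blocking severity."""
    wanted = {s.upper() for s in blocking}
    found = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" is absent or null for targets with no findings
        for vuln in result.get("Vulnerabilities") or []:
            severity = str(vuln.get("Severity", "UNKNOWN")).upper()
            if severity in wanted:
                found.append((vuln.get("PkgName", "?"), vuln.get("VulnerabilityID", "?"), severity))
    return found


def gate(report_path: str) -> int:
    """Exit status for CI: print blocking findings, return 1 if any exist, else 0."""
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    findings = blocking_findings(report)
    for pkg, vuln_id, sev in findings:
        print(f"{sev:8} {vuln_id:20} {pkg}")
    return 1 if findings else 0
```

In a pipeline, a small wrapper script would pass `gate("report.json")` to `sys.exit` so a non-zero result fails the job before `docker push` runs.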

Putting It All Together: A Secure Agent Deployment

Here's the complete, security-hardened manifest combining all layers (Deployment, Service, and Network Policies):

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-prod
  namespace: agents
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
      - name: agent
        image: myregistry/agent:v1-secure
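        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
          # Required by the Restricted Pod Security Standard enforced on this namespace
          seccompProfile:
            type: RuntimeDefault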
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: logs
          mountPath: /var/log/agent
        ports:
        - containerPort: 8000
          name: http
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 10
      volumes:
      - name: tmp
        emptyDir: {}
      - name: logs
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: agent-service
  namespace: agents
spec:
  selector:
    app: agent
  type: ClusterIP
  ports:
  - port: 8000
    targetPort: 8000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-deny-all
  namespace: agents
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-allow-ingress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: agent
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress
      podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 8000
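Because these resources are connected only by labels and ports, a typo silently breaks the wiring: a Service that selects nothing, or a policy that protects no Pods. A minimal sketch of a pre-apply check over the parsed manifests (for example, the documents returned by PyYAML's `yaml.safe_load_all`, which this sketch doesn't import). It handles matchLabels only, treats an omitted namespace as "default", and its function names are hypothetical.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def labels_match(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True if every key/value pair in match_labels appears in labels."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def check_wiring(deployment: Manifest, service: Manifest,
                 policies: List[Manifest]) -> List[str]:
    """Report label/port mismatches between a Deployment, its Service and NetworkPolicies."""
    problems: List[str] = []
    ns = deployment["metadata"].get("namespace", "default")
    template = deployment["spec"]["template"]
    pod_labels = template["metadata"].get("labels", {})

    # The Service must live in the same namespace and select the Deployment's Pods.
    if service["metadata"].get("namespace", "default") != ns:
        problems.append("Service is in a different namespace than the Deployment")
    selector = service["spec"].get("selector") or {}
    if not selector:
        problems.append("Service has no selector, so it gets no endpoints automatically")
    elif not labels_match(selector, pod_labels):
        problems.append(f"Service selector {selector} does not match Pod labels {pod_labels}")

    # Each Service targetPort (number or name) must exist on some container.
    container_ports = set()
    for c in template["spec"]["containers"]:
        for p in c.get("ports", []):
            container_ports.add(p["containerPort"])
            if "name" in p:
                container_ports.add(p["name"])
    for sp in service["spec"].get("ports", []):
        target = sp.get("targetPort", sp["port"])  # targetPort defaults to port
        if target not in container_ports:
            problems.append(f"Service targetPort {target} is not a containerPort")

    # At least one NetworkPolicy in the namespace should select the Pods.
    selecting = [
        p["metadata"]["name"] for p in policies
        if p["metadata"].get("namespace", "default") == ns
        and labels_match((p["spec"].get("podSelector") or {}).get("matchLabels") or {}, pod_labels)
    ]
    if not selecting:
        problems.append("no NetworkPolicy selects the agent Pods")
    return problems
```

An empty list means the Service and every policy line up with the Pods the Deployment creates; it says nothing about whether the policies themselves are strict enough.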

Deploy everything:

bash

kubectl apply -f agent-production.yaml

Output:

text

deployment.apps/agent-prod created
service/agent-service created
networkpolicy.networking.k8s.io/agent-deny-all created
networkpolicy.networking.k8s.io/agent-allow-ingress created

Verify security settings:

bash

kubectl get pod -n agents -o jsonpath='{.items[0].spec.securityContext}'

Output:

text

{"fsGroup":1001,"runAsNonRoot":true,"runAsUser":1001}

Try With AI

Audit an Existing Deployment for Security

Describe your current agent Deployment:

text

I have a Deployment running a FastAPI agent in Kubernetes.
The Pod spec looks like:
spec:
  containers:
  - name: agent
    image: myregistry/agent:v1.0
    # No securityContext configured

Help me identify security gaps and prioritize fixes based on impact.