The Shielded Agent: Hardening Your AI Against Real-World Threats


© 2026 Muhammad Usman Akbar. All rights reserved.


Production Hardening

Your container builds and runs. The image is optimized with multi-stage builds. But production environments demand more than a working container—they require security, observability, and resilience.

Consider what happens when your containerized FastAPI agent service goes to production. Kubernetes needs to know if your service is healthy before routing traffic to it. If your container runs as root, a security vulnerability could give an attacker full control of the host. If configuration is hardcoded, you'll need to rebuild images for every environment (dev, staging, production).

Production hardening addresses these concerns through three patterns: environment variable configuration (flexibility), health checks (observability), and non-root users (security). These aren't optional extras—they're requirements for any serious deployment. In this lesson, you'll add each pattern to your Dockerfile, understand why it matters, and end up with a production-ready container template you'll use for every AI service you build.


The Three Pillars of Production Hardening

Before diving into implementation, understand what we're solving:

| Pillar | Problem | Solution |
| --- | --- | --- |
| Configuration | Hardcoded values require image rebuilds | Environment variables at runtime |
| Observability | Orchestrators can't detect unhealthy containers | Health check endpoints + HEALTHCHECK instruction |
| Security | Root containers enable privilege escalation | Non-root user execution |

Each pillar is independent but together they form the foundation of production-ready containers. Let's implement each one.


Environment Variables for Configuration

Hardcoded configuration creates fragile containers. If your Dockerfile specifies LOG_LEVEL=INFO, you need a new image for debug logging. If API_HOST=production.example.com, you can't run locally.

Docker provides two instructions for configuration:

  • ARG: Build-time variables (available during docker build, NOT in running container)
  • ENV: Runtime variables (available when container runs)

Understanding ENV

The ENV instruction sets environment variables that persist into the running container:

dockerfile
ENV PYTHONUNBUFFERED=1
ENV LOG_LEVEL=INFO
ENV API_HOST=0.0.0.0
ENV API_PORT=8000

Your application reads these values at runtime.

Update main.py:

python
import os

from fastapi import FastAPI

app = FastAPI()

LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
API_HOST = os.getenv("API_HOST", "0.0.0.0")

@app.get("/")
def read_root():
    return {"message": "Hello from Docker!", "log_level": LOG_LEVEL}

@app.get("/health")
def health_check():
    return {"status": "healthy"}

Output:

bash
$ docker run -p 8000:8000 my-app:latest
INFO:     Uvicorn running on http://0.0.0.0:8000
$ curl http://localhost:8000/
{"message":"Hello from Docker!","log_level":"INFO"}

Overriding ENV at Runtime

The -e flag overrides environment variables when starting a container:

bash
docker run -e LOG_LEVEL=DEBUG -p 8000:8000 my-app:latest

Output:

bash
$ docker run -e LOG_LEVEL=DEBUG -p 8000:8000 my-app:latest
INFO:     Uvicorn running on http://0.0.0.0:8000
$ curl http://localhost:8000/
{"message":"Hello from Docker!","log_level":"DEBUG"}

The container uses DEBUG instead of the default INFO without rebuilding the image.
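Keep in mind that os.getenv always returns strings, so numeric or boolean settings like API_PORT or ENABLE_CACHING need explicit coercion in your application code. A minimal sketch of two helpers (the names env_int and env_bool are illustrative, not a standard API):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to a default."""
    raw = os.getenv(name)
    return int(raw) if raw is not None else default

def env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ('1', 'true', 'yes', 'on') as True."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

API_PORT = env_int("API_PORT", 8000)
ENABLE_CACHING = env_bool("ENABLE_CACHING")
```

Centralizing coercion like this keeps a typo such as API_PORT=808O from surfacing as a confusing failure deep inside the app: int() raises immediately at startup.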

Understanding ARG

ARG defines build-time variables. They're available during docker build but not when the container runs:

dockerfile
ARG PYTHON_VERSION=3.12
FROM python:${PYTHON_VERSION}-alpine

# An ARG declared before FROM is only in scope for the FROM line;
# re-declare it to use the value in later build steps
ARG PYTHON_VERSION
RUN echo "Building with Python ${PYTHON_VERSION}"

# But NOT accessible at runtime
# The following would fail because ARG is gone after build:
# CMD ["echo", "${PYTHON_VERSION}"]

Build with different Python versions:

bash
docker build --build-arg PYTHON_VERSION=3.11 -t my-app:py311 .
docker build --build-arg PYTHON_VERSION=3.12 -t my-app:py312 .

Output:

text
$ docker build --build-arg PYTHON_VERSION=3.11 -t my-app:py311 .
[+] Building 2.1s (8/8) FINISHED
 => [1/4] FROM docker.io/library/python:3.11-alpine
 => [2/4] RUN echo "Building with Python 3.11"
Building with Python 3.11
...

When to Use ARG vs ENV

| Use Case | Instruction | Example |
| --- | --- | --- |
| Python version for base image | ARG | ARG PYTHON_VERSION=3.12 |
| Log level for running app | ENV | ENV LOG_LEVEL=INFO |
| Git commit hash for image tag | ARG | ARG GIT_SHA |
| API keys (runtime secrets) | ENV (via -e) | -e API_KEY=abc123 |
| Package versions during build | ARG | ARG UV_VERSION=0.4.0 |
| Feature flags in running container | ENV | ENV ENABLE_CACHING=true |

Key distinction: If you need the value when the container RUNS, use ENV. If you only need it during BUILD, use ARG.
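When the same value is needed in both phases — a Git commit hash baked in at build time that the running app should also report — a common pattern is to promote the ARG into an ENV. A sketch using the GIT_SHA example from the table above:

```dockerfile
# Build-time argument, supplied via:
#   docker build --build-arg GIT_SHA=$(git rev-parse HEAD) .
ARG GIT_SHA=unknown
# Promote to a runtime variable so the container can read it
ENV GIT_SHA=${GIT_SHA}
```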


Health Check Implementation

Orchestrators like Kubernetes need to know if your container is healthy. A container can be "running" but completely broken—the process exists but crashes on every request. Health checks detect this.
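The rule Docker applies is a failing streak: a container is marked unhealthy only after several consecutive failed checks, so a single slow response doesn't trigger a restart. An illustrative model of that rule in plain Python (not Docker's actual implementation):

```python
def evaluate_health(checks: list[bool], retries: int = 3) -> str:
    """Mirror Docker's failing-streak rule: 'unhealthy' only after
    `retries` consecutive failed checks; any success resets the streak."""
    streak = 0
    for ok in checks:
        streak = 0 if ok else streak + 1
        if streak >= retries:
            return "unhealthy"
    return "healthy"
```

With the default of three retries, the sequence fail, fail, success, fail stays healthy, while three failures in a row flips the container to unhealthy.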

Adding a Health Endpoint to FastAPI

First, ensure your FastAPI service has a health endpoint.

Update main.py:

python
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello from Docker!"}

@app.get("/health")
def health_check():
    """Health check endpoint for Docker HEALTHCHECK and Kubernetes probes."""
    return {"status": "healthy"}

Test the endpoint:

bash
uvicorn main:app --host 0.0.0.0 --port 8000 &
curl http://localhost:8000/health

Output:

bash
$ curl http://localhost:8000/health
{"status":"healthy"}

Docker HEALTHCHECK Instruction

The HEALTHCHECK instruction tells Docker how to verify container health:

dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:8000/health || exit 1

Let's break down each component:

| Parameter | Value | Meaning |
| --- | --- | --- |
| --interval=30s | 30 seconds | How often to run the check |
| --timeout=10s | 10 seconds | How long to wait for a response |
| --start-period=5s | 5 seconds | Grace period for container startup |
| --retries=3 | 3 attempts | Failed checks before marking unhealthy |
| CMD | wget command | The actual health check command |

Why wget instead of curl?

Alpine images include wget (via BusyBox) by default but not curl, so using wget avoids adding a dependency. Debian-based slim images ship with neither tool; install curl in the build if you want to use it:

dockerfile
# For Alpine-based images (BusyBox wget built-in):
HEALTHCHECK CMD wget --no-verbose --tries=1 --spider http://localhost:8000/health || exit 1

# For slim-based images (after RUN apt-get update && apt-get install -y curl):
HEALTHCHECK CMD curl --fail http://localhost:8000/health || exit 1
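A third option avoids installing anything: the Python interpreter is already in the image, so the standard library can serve as the probe. A sketch — urlopen raises on connection failures and HTTP errors, which makes the CMD exit non-zero:

```dockerfile
# No wget or curl required: reuse the interpreter already in the image
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
```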

Verifying Health Check Status

Build and run a container with the health check:

bash
docker build -t health-app:latest .
docker run -d --name health-test -p 8000:8000 health-app:latest

Wait 30 seconds for the first health check, then inspect:

bash
docker inspect --format='{{json .State.Health}}' health-test | python -m json.tool

Output:

json
{
    "Status": "healthy",
    "FailingStreak": 0,
    "Log": [
        {
            "Start": "2024-12-27T10:30:00.123456Z",
            "End": "2024-12-27T10:30:00.234567Z",
            "ExitCode": 0,
            "Output": ""
        }
    ]
}

The "Status": "healthy" confirms the health check passed. If the endpoint fails, status becomes "unhealthy" and FailingStreak increments.
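If you script around this output — say, in a deployment pipeline that waits for a service to come up — the record can be parsed with the standard library. A small sketch assuming the JSON shape shown above (the helper name is illustrative):

```python
import json

def health_status(inspect_output: str) -> tuple[str, int]:
    """Extract (Status, FailingStreak) from the output of
    docker inspect --format='{{json .State.Health}}' <container>."""
    health = json.loads(inspect_output)
    return health["Status"], health["FailingStreak"]

# Example: feed in the JSON captured from docker inspect
status, streak = health_status('{"Status": "healthy", "FailingStreak": 0, "Log": []}')
```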

Health Check Status in docker ps

bash
docker ps

Output:

text
CONTAINER ID   IMAGE               COMMAND                 STATUS                   PORTS
a1b2c3d4e5f6   health-app:latest   "uvicorn main:app..."   Up 2 minutes (healthy)   0.0.0.0:8000->8000/tcp

Notice (healthy) in the STATUS column. This is how you quickly verify container health.

Clean up:

bash
docker stop health-test && docker rm health-test

Non-Root User Security

By default, Docker containers run as root. If an attacker exploits a vulnerability in your application, they have root access to the container—and potentially the host.

Running as a non-root user limits damage. Even if compromised, the attacker has limited privileges.

Creating a Non-Root User

Add a dedicated user in your Dockerfile:

dockerfile
# Create non-root user with specific UID
RUN adduser -D -u 1000 appuser

The flags:

  • -D: Don't assign a password (non-interactive)
  • -u 1000: Assign user ID 1000 (conventional for app users)
  • appuser: Username
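Note that these are BusyBox adduser flags, which is why they work on Alpine. On a Debian-based slim image the equivalent tool is useradd; a sketch of the corresponding instruction:

```dockerfile
# Debian/slim equivalent of Alpine's `adduser -D -u 1000 appuser`
RUN useradd --uid 1000 --create-home appuser
```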

Switching to Non-Root User

After creating the user, switch to it with USER:

dockerfile
# Create user
RUN adduser -D -u 1000 appuser

# ... copy files with ownership ...

# Switch to non-root user BEFORE CMD
USER appuser

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Copying Files with Correct Ownership

Files copied into the container are owned by root by default. Use --chown to set ownership:

dockerfile
# Copy with ownership set to appuser
COPY --chown=appuser:appuser main.py .

Without --chown, the non-root user can't read the files:

bash
# Without --chown:
$ docker exec my-container ls -la /app/main.py
-rw-r--r--    1 root     root     237 Dec 27 10:00 main.py   # Owned by root

# With --chown:
$ docker exec my-container ls -la /app/main.py
-rw-r--r--    1 appuser  appuser  237 Dec 27 10:00 main.py   # Owned by appuser

Verifying Non-Root Execution

Build and run, then check who's running the process:

bash
docker build -t secure-app:latest .
docker run -d --name secure-test secure-app:latest
docker exec secure-test whoami

Output:

text
$ docker exec secure-test whoami
appuser

Not root. The container runs with limited privileges.

Clean up:

bash
docker stop secure-test && docker rm secure-test

The Production Dockerfile Template

Combining all three pillars—configuration, health checks, and non-root user—creates a production-ready template:

dockerfile
# Stage 1: Build
FROM python:3.12-alpine AS builder

WORKDIR /app

# Install UV for fast dependency installation
RUN pip install uv

COPY requirements.txt .

# Install dependencies with UV
RUN uv pip install --system --no-cache -r requirements.txt

# Stage 2: Runtime
FROM python:3.12-alpine

# Create non-root user
RUN adduser -D -u 1000 appuser

WORKDIR /app

# Copy dependencies from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Copy application code with ownership
COPY --chown=appuser:appuser main.py .

# Environment configuration
ENV PYTHONUNBUFFERED=1
ENV LOG_LEVEL=INFO

# Switch to non-root user
USER appuser

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:8000/health || exit 1

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

This template includes:

  • Multi-stage build (from Chapter 6)
  • UV for fast dependency installation
  • Non-root user (appuser)
  • Correct file ownership (--chown)
  • Environment variable defaults
  • Health check with appropriate timing
  • Exposed port documentation

Building and Testing the Complete Template

Create the required files.

Create requirements.txt:

text
fastapi==0.115.0
uvicorn==0.30.0

Create main.py:

python
import os

from fastapi import FastAPI

app = FastAPI()

LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

@app.get("/")
def read_root():
    return {"message": "Production hardened!", "log_level": LOG_LEVEL}

@app.get("/health")
def health_check():
    return {"status": "healthy"}

Build and run:

bash
docker build -t production-app:latest .
docker run -d --name prod-test -p 8000:8000 production-app:latest

Output:

text
$ docker build -t production-app:latest .
[+] Building 12.3s (14/14) FINISHED
 => [builder 1/4] FROM python:3.12-alpine
 => [builder 2/4] WORKDIR /app
 => [builder 3/4] RUN pip install uv
 => [builder 4/4] RUN uv pip install --system --no-cache -r requirements.txt
 => [stage-1 1/6] FROM python:3.12-alpine
 => [stage-1 2/6] RUN adduser -D -u 1000 appuser
 => [stage-1 3/6] WORKDIR /app
 => [stage-1 4/6] COPY --from=builder /usr/local/lib/python3.12/site-packages...
 => [stage-1 5/6] COPY --from=builder /usr/local/bin /usr/local/bin
 => [stage-1 6/6] COPY --chown=appuser:appuser main.py .
 => exporting to image
Successfully tagged production-app:latest

Verify all three pillars:

bash
# 1. Configuration: Override LOG_LEVEL
docker run --rm -e LOG_LEVEL=DEBUG production-app:latest sh -c 'echo $LOG_LEVEL'

Output:

text
DEBUG
bash
# 2. Health check: Wait 30s, then check status
sleep 35
docker inspect --format='{{.State.Health.Status}}' prod-test

Output:

text
healthy
bash
# 3. Non-root user: Verify process owner
docker exec prod-test whoami

Output:

text
appuser

All three pillars verified. Clean up:

bash
docker stop prod-test && docker rm prod-test

Common Hardening Mistakes

Mistake 1: USER Before COPY

Switching to USER appuser before your COPY instructions doesn't block the copy itself (COPY always executes as root), but the files land owned by root, and later RUN steps execute as appuser without permission to modify them:

dockerfile
# RISKY: files copied after USER are still owned by root
USER appuser
COPY main.py .              # Succeeds, but main.py is owned by root
RUN chmod +x main.py        # Fails: appuser can't modify a root-owned file

# CORRECT: copy first with ownership, then switch user
COPY --chown=appuser:appuser main.py .
USER appuser

Mistake 2: Missing Health Endpoint

Adding HEALTHCHECK without implementing the endpoint:

dockerfile
# Docker checks /health, but your app doesn't have that route
HEALTHCHECK CMD wget --spider http://localhost:8000/health || exit 1

Result: The container is marked unhealthy as soon as the failed checks exhaust the retry count.

Fix: Always implement the health endpoint in your application code.

Mistake 3: Wrong Port in HEALTHCHECK

The health check runs INSIDE the container. Use the container's internal port:

dockerfile
# WRONG: 9000 is the host port, not the container port
HEALTHCHECK CMD wget --spider http://localhost:9000/health || exit 1

# CORRECT: 8000 is the port your app listens on inside the container
HEALTHCHECK CMD wget --spider http://localhost:8000/health || exit 1

Mistake 4: Secrets in ENV

Never put sensitive data in Dockerfile ENV instructions:

dockerfile
# WRONG: secret visible in image history and docker inspect
ENV API_KEY=sk-abc123secret

# CORRECT: pass at runtime, never store in the image
# docker run -e API_KEY=sk-abc123secret my-app:latest

Use -e flag at runtime or Docker secrets for sensitive configuration.
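Docker secrets (and Kubernetes mounted secrets) arrive as files rather than environment variables: Compose and Swarm mount each secret at /run/secrets/&lt;name&gt;. A hedged sketch of application code that prefers a mounted file and falls back to an environment variable (the helper name read_secret is illustrative):

```python
import os
from pathlib import Path
from typing import Optional

def read_secret(name: str, env_var: str,
                secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Prefer a mounted secret file (the Docker/Kubernetes convention)
    over an environment variable; return None if neither is present."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    return os.getenv(env_var)

# Compose/Swarm secrets appear under /run/secrets/<name>
API_KEY = read_secret("api_key", "API_KEY")
```

This keeps the secret out of the image and out of `docker inspect`, and the same code works unchanged whether the deployment injects secrets as files or as runtime environment variables.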


Production Hardening Checklist

Before deploying any container to production, verify:

| Check | Command | Expected |
| --- | --- | --- |
| Non-root user | docker exec <container> whoami | Not root |
| Health check exists | docker inspect --format='{{.Config.Healthcheck}}' <image> | Non-empty |
| Health status | docker inspect --format='{{.State.Health.Status}}' <container> | healthy |
| No hardcoded secrets | docker history <image> | No API keys visible |
| Config via ENV | docker run --rm -e LOG_LEVEL=DEBUG <image> env | Variable overridable |