USMAN’S INSIGHTS
AI ARCHITECT
  • Home
  • About
  • Thought Leadership
  • Book
Press / Contact
USMAN’S INSIGHTS
AI ARCHITECT
⌘F
HomeBook
HomeBookInside the Black Box: Solving Silent AI Container Crashes
Previous Chapter
Writing Your First Dockerfile
Next Chapter
Multi-Stage Builds Optimization
AI NOTICE: This is the table of contents for the SPECIFIC CHAPTER only. It is NOT the global sidebar. For all chapters, look at the main navigation.

On this page

38 sections

Progress0%
1 / 38

Muhammad Usman Akbar Entity Profile

Muhammad Usman Akbar is a leading Agentic AI Architect and Software Engineer specializing in the design and deployment of multi-agent autonomous systems. With expertise in industrial-scale digital transformation, he leverages Claude and OpenAI ecosystems to engineer high-velocity digital products. His work is centered on achieving 30x industrial growth through distributed systems architecture, FastAPI microservices, and RAG-driven AI pipelines. Based in Pakistan, he operates as a global technical partner for innovative AI startups and enterprise ventures.

USMAN’S INSIGHTS
AI ARCHITECT

Transforming businesses into autonomous AI ecosystems. Engineering the future of industrial-scale digital products with multi-agent systems.

30X Growth
AI-First
Innovation

Navigation

  • Home
  • Book
  • About
  • Contact
Let's Collaborate

Have a Project in Mind?

Let's build something extraordinary together. Transform your vision into autonomous AI reality.

Start Your Transformation

© 2026 Muhammad Usman Akbar. All rights reserved.

Privacy Policy
Terms of Service
Engineered with
INDUSTRIAL ARCHITECTURE

Container Lifecycle and Debugging

Your FastAPI service from Chapter 4 is running in a container. It responds to requests, returns JSON, and everything works. Then you deploy it to a server. It crashes. No error on your screen, no stack trace, nothing. The container simply stops.

This is where container debugging skills become essential. Unlike local development where errors appear in your terminal, containerized applications fail silently unless you know where to look. The container's logs, its internal state, its configuration, and its resource usage are all hidden behind Docker's abstraction layer.

In this lesson, you'll learn the debugging toolkit that every container developer needs: reading logs to understand what happened, executing commands inside containers to inspect their state, and using inspection tools to verify configuration. You'll practice these skills using the FastAPI application you built in Chapter 4, intentionally breaking it to develop debugging intuition.


Running Your FastAPI App in Detached Mode

Before debugging, let's run your Chapter 4 FastAPI container in the background. Navigate to your task-api directory from Chapter 4:

bash
cd task-api docker build -t task-api:v1 .

Output:

text
$ docker build -t task-api:v1 . [+] Building 2.1s (8/8) FINISHED => [1/5] FROM docker.io/library/python:3.12-slim => CACHED [2/5] WORKDIR /app => CACHED [3/5] COPY requirements.txt . => CACHED [4/5] RUN pip install --no-cache-dir -r requirements.txt => CACHED [5/5] COPY main.py . => exporting to image Successfully tagged task-api:v1

Now run it in detached mode (-d) so it runs in the background:

Specification
docker run -d -p 8000:8000 --name task-api task-api:v1

Output:

text
$ docker run -d -p 8000:8000 --name task-api task-api:v1 a7b8c9d0e1f2g3h4i5j6k7l8m9n0o1p2q3r4s5t6

The long string is the container ID. The -d flag means "detached"—the container runs in the background and you get your terminal back.

Verify the container is running:

Specification
docker ps

Output:

text
$ docker ps CONTAINER ID IMAGE COMMAND STATUS PORTS NAMES a7b8c9d0e1f2 task-api:v1 "uvicorn main:app ..." Up 5 seconds 0.0.0.0:8000->8000/tcp task-api

Test that it's responding:

Specification
curl http://localhost:8000/health

Output:

json
$ curl http://localhost:8000/health {"status":"healthy"}

Now you have a running container to debug. Let's explore the debugging tools.


Reading Container Logs

The most important debugging command is docker logs. It shows everything your application writes to stdout and stderr—print statements, uvicorn startup messages, errors, and stack traces.

View logs from your running container:

Specification
docker logs task-api

Output:

text
$ docker logs task-api INFO: Started server process [1] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

These logs show uvicorn started successfully. Now make a request and check logs again:

Specification
curl http://localhost:8000/docker logs task-api

Output:

text
$ curl http://localhost:8000/ {"message":"Hello from Docker!"} $ docker logs task-api INFO: Started server process [1] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) INFO: 172.17.0.1:54321 - "GET / HTTP/1.1" 200 OK

The new log line shows the request: the client IP, the endpoint accessed, and the HTTP response code (200 OK).

Following Logs in Real-Time

For live debugging, use the -f (follow) flag to stream logs continuously:

Specification
docker logs -f task-api

Output:

text
$ docker logs -f task-api INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) INFO: 172.17.0.1:54321 - "GET / HTTP/1.1" 200 OK [cursor waiting for new logs...]

Now in another terminal, make requests and watch them appear in real-time. Press Ctrl+C to stop following.

Viewing Recent Logs

For large log files, use --tail to see only the last N lines:

Specification
docker logs --tail 5 task-api

Output:

text
$ docker logs --tail 5 task-api INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) INFO: 172.17.0.1:54321 - "GET / HTTP/1.1" 200 OK INFO: 172.17.0.1:54322 - "GET /health HTTP/1.1" 200 OK

Debugging a Failed Container

Now let's create a container that fails on startup. Stop and remove the current container:

bash
docker stop task-api docker rm task-api

Output:

text
$ docker stop task-api task-api $ docker rm task-api task-api

Create a Python script that simulates a startup failure. Create broken_main.py:

python
import os import sys print("Task API starting...") print("Checking for required configuration...") # Simulate missing required environment variable api_key = os.environ.get("API_KEY") if not api_key: print("ERROR: API_KEY environment variable is required but not set") sys.exit(1) print(f"API_KEY configured: {api_key[:4]}****") print("Starting server...")

Create a Dockerfile for this broken app. Create Dockerfile.broken:

dockerfile
FROM python:3.12-slim WORKDIR /app COPY broken_main.py . CMD ["python", "broken_main.py"]

Build and run it:

bash
docker build -f Dockerfile.broken -t task-api-broken:v1 . docker run -d --name broken-api task-api-broken:v1

Output:

text
$ docker build -f Dockerfile.broken -t task-api-broken:v1 . [+] Building 1.2s (7/7) FINISHED Successfully tagged task-api-broken:v1 $ docker run -d --name broken-api task-api-broken:v1 b1c2d3e4f5g6h7i8j9k0l1m2n3o4p5q6r7s8t9u0

Check if it's running:

Specification
docker ps

Output:

text
$ docker ps CONTAINER ID IMAGE COMMAND STATUS PORTS NAMES

Empty! The container isn't running. Check all containers, including stopped ones:

Specification
docker ps -a

Output:

text
$ docker ps -a CONTAINER ID IMAGE COMMAND STATUS NAMES b1c2d3e4f5g6 task-api-broken:v1 "python broken_main..." Exited (1) 5 seconds ago broken-api

Status shows "Exited (1)" - the container crashed with exit code 1. Now use docker logs to find out why:

Specification
docker logs broken-api

Output:

text
$ docker logs broken-api Task API starting... Checking for required configuration... ERROR: API_KEY environment variable is required but not set

The logs tell you exactly what went wrong: the API_KEY environment variable is missing.

Fixing the Broken Container

Now that you know the problem, run it correctly with the environment variable:

bash
docker rm broken-api docker run -d --name broken-api -e API_KEY=sk-test-12345 task-api-broken:v1 docker logs broken-api

Output:

text
$ docker rm broken-api broken-api $ docker run -d --name broken-api -e API_KEY=sk-test-12345 task-api-broken:v1 c2d3e4f5g6h7i8j9k0l1m2n3o4p5q6r7s8t9u0v1 $ docker logs broken-api Task API starting... Checking for required configuration... API_KEY configured: sk-t**** Starting server...

The container now starts successfully because the required environment variable is set.


Executing Commands Inside Containers

Logs show what happened, but sometimes you need to inspect the container's current state. docker exec lets you run commands inside a running container.

First, restart your working FastAPI container:

bash
docker rm -f broken-api docker run -d -p 8000:8000 --name task-api task-api:v1

Output:

text
$ docker rm -f broken-api broken-api $ docker run -d -p 8000:8000 --name task-api task-api:v1 d3e4f5g6h7i8j9k0l1m2n3o4p5q6r7s8t9u0v1w2

Now execute commands inside the running container:

Specification
docker exec task-api pwd

Output:

text
$ docker exec task-api pwd /app

The container's working directory is /app, as set by WORKDIR in the Dockerfile.

List files in the container:

Specification
docker exec task-api ls -la

Output:

text
$ docker exec task-api ls -la total 16 drwxr-xr-x 1 root root 4096 Dec 22 10:30 . drwxr-xr-x 1 root root 4096 Dec 22 10:30 .. -rw-r--r-- 1 root root 237 Dec 22 10:30 main.py -rw-r--r-- 1 root root 42 Dec 22 10:30 requirements.txt

Check what user is running the process:

Specification
docker exec task-api whoami

Output:

text
$ docker exec task-api whoami root

Interactive Shell Access

For deeper debugging, launch an interactive shell inside the container:

Specification
docker exec -it task-api sh

Output:

Specification
$ docker exec -it task-api sh#

The -it flags mean "interactive" and "allocate a TTY" (terminal). You're now inside the container. Try some commands:

bash
# pwd /app # cat main.py from fastapi import FastAPI app = FastAPI() @app.get("/") def read_root(): return {"message": "Hello from Docker!"} @app.get("/health") def health_check(): return {"status": "healthy"} # exit

Output:

bash
# pwd /app # cat main.py from fastapi import FastAPI ... # exit $

The exit command returns you to your host machine.

Testing API Endpoints from Inside the Container

You can even test your API from inside the container:

Specification
docker exec task-api curl -s http://localhost:8000/health

Output:

json
$ docker exec task-api curl -s http://localhost:8000/health {"status":"healthy"}

This confirms the API is responding on port 8000 inside the container. If this works but external requests fail, you have a port mapping problem, not an application problem.


Inspecting Container Configuration

The docker inspect command shows complete configuration and runtime state in JSON format. This is essential for verifying that a container was started with the correct settings.

Inspect your running container:

Specification
docker inspect task-api

This outputs hundreds of lines. Let's extract specific information.

Check Container Status

Specification
docker inspect --format='{{.State.Status}}' task-api

Output:

text
$ docker inspect --format='{{.State.Status}}' task-api running

Check the Running Command

Specification
docker inspect --format='{{json .Config.Cmd}}' task-api

Output:

json
$ docker inspect --format='{{json .Config.Cmd}}' task-api ["uvicorn","main:app","--host","0.0.0.0","--port","8000"]

This confirms exactly what command the container is running.

Check Environment Variables

Specification
docker inspect --format='{{json .Config.Env}}' task-api

Output:

json
$ docker inspect --format='{{json .Config.Env}}' task-api ["PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","LANG=C.UTF-8","GPG_KEY=...","PYTHON_VERSION=3.12.7","PYTHON_PIP_VERSION=24.0"]

Check Port Mappings

Specification
docker inspect --format='{{json .Network Settings.Ports}}' task-api

Output:

json
$ docker inspect --format='{{json .Network Settings.Ports}}' task-api {"8000/tcp":[{"Host Ip":"0.0.0.0","Host Port":"8000"}]}

This shows port 8000 in the container is mapped to port 8000 on the host.

Practical Use: Verify Exit Codes

For crashed containers, inspect shows why they stopped:

Specification
docker inspect --format='{{.State.Exit Code}}' broken-api

Output:

text
$ docker inspect --format='{{.State.Exit Code}}' broken-api 1

Exit code 1 means the application exited with an error. Exit code 0 means success. Exit code 137 means the kernel killed the process (usually out of memory).


Resolving Port Conflicts

A common debugging scenario: you try to start a container and it fails because the port is already in use.

Try to start another container on port 8000 while task-api is running:

Specification
docker run -d -p 8000:8000 --name task-api-2 task-api:v1

Output:

text
$ docker run -d -p 8000:8000 --name task-api-2 task-api:v1 docker: Error response from daemon: driver failed programming external connectivity on endpoint task-api-2: Bind for 0.0.0.0:8000 failed: port is already allocated.

The error is clear: port 8000 is already allocated (by your first container).

Solution 1: Use a Different Host Port

Specification
docker run -d -p 8001:8000 --name task-api-2 task-api:v1

Output:

Specification
$ docker run -d -p 8001:8000 --name task-api-2 task-api:v1e4f5g6h7i8j9k0l1m2n3o4p5q6r7s8t9u0v1w2x3

Now you have two containers:

  • task-api on port 8000
  • task-api-2 on port 8001

Verify both work:

Specification
curl http://localhost:8000/healthcurl http://localhost:8001/health

Output:

json
$ curl http://localhost:8000/health {"status":"healthy"} $ curl http://localhost:8001/health {"status":"healthy"}

Solution 2: Find and Stop the Conflicting Container

If you need port 8000 specifically, find what's using it:

Specification
docker ps --format "table {{.Names}}\t{{.Ports}}"

Output:

text
$ docker ps --format "table {{.Names}}\t{{.Ports}}" NAMES PORTS task-api-2 0.0.0.0:8001->8000/tcp task-api 0.0.0.0:8000->8000/tcp

Stop the container using your desired port:

Specification
docker stop task-apidocker rm task-api

Now you can start a new container on port 8000.

Clean up for the next section:

Specification
docker stop task-api-2docker rm task-api-2

Restart Policies for Resilience

Containers can crash due to bugs, resource exhaustion, or temporary failures. Instead of manually restarting them, configure Docker to restart them automatically.

The --restart Flag

Docker supports several restart policies:

PolicyBehavior
noNever restart (default)
alwaysAlways restart, even after successful exit
unless-stoppedRestart unless manually stopped
on-failure:NRestart only on non-zero exit, up to N times

Testing Restart Policies

Create a container that crashes sometimes. Create flaky_main.py:

python
import random import sys import time print("Task API starting...") time.sleep(1) if random.random() < 0.3: print("ERROR: Random failure occurred!") sys.exit(1) print("Task API started successfully!") while True: time.sleep(10)

Build it:

Specification
docker build -f Dockerfile.broken -t flaky-api:v1 .

Wait, that Dockerfile won't work for this script. Create Dockerfile.flaky:

Specification
FROM python:3.12-slimWORKDIR /appCOPY flaky_main.py .CMD ["python", "-u", "flaky_main.py"]

The -u flag means unbuffered output so logs appear immediately.

Specification
docker build -f Dockerfile.flaky -t flaky-api:v1 .

Output:

Specification
$ docker build -f Dockerfile.flaky -t flaky-api:v1 .[+] Building 0.8s (7/7) FINISHEDSuccessfully tagged flaky-api:v1

Run without restart policy:

Specification
docker run -d --name flaky-no-restart flaky-api:v1sleep 3docker ps -a --filter name=flaky

Output:

text
$ docker ps -a --filter name=flaky CONTAINER ID IMAGE STATUS NAMES f5g6h7i8j9k0 flaky-api:v1 Exited (1) 2 seconds ago flaky-no-restart

If the container crashed (30% chance), it stays dead. Remove it:

Specification
docker rm flaky-no-restart

Now run with automatic restart:

Specification
docker run -d --restart=unless-stopped --name flaky-restart flaky-api:v1

Watch it recover from failures:

Specification
docker logs -f flaky-restart

Output (if it fails and restarts):

text
$ docker logs -f flaky-restart Task API starting... ERROR: Random failure occurred! Task API starting... Task API started successfully!

The container automatically restarted after the failure. Check restart count:

Specification
docker inspect --format='{{.Restart Count}}' flaky-restart

Output:

text
$ docker inspect --format='{{.Restart Count}}' flaky-restart 1

Production Recommendation

For production services, use --restart=unless-stopped. This ensures:

  • Containers restart after crashes
  • Containers restart after host reboots
  • You can still manually stop them with docker stop

Clean up:

Specification
docker stop flaky-restartdocker rm flaky-restart

Try With AI

Now that you understand container debugging fundamentals, practice these skills with increasingly complex scenarios.

Prompt 1: Diagnose a Startup Failure

Create a FastAPI application that requires a DATABASE_URL environment variable. Run it without the variable and use the debugging tools you learned to identify the problem:

Create a FastAPI app that:

  1. Checks for DATABASE_URL environment variable on startup
  2. Prints an error and exits with code 1 if missing
  3. Prints "Connected to: [masked URL]" if present

Help me create the Dockerfile and show me how to:

  • See the error when DATABASE_URL is missing
  • Verify the container exit code
  • Run it successfully with the environment variable

What you're learning: This reinforces the pattern of using docker logs and docker inspect --format='{{.State.ExitCode}}' to diagnose startup failures, and -e to provide environment variables.

Prompt 2: Debug a Port Mapping Issue

Sometimes your application seems to start but doesn't respond to requests. Practice debugging this:

My FastAPI container starts successfully (docker logs shows "Uvicorn running") but curl http://localhost:8000/ returns "Connection refused".

Help me debug this using:

  1. docker inspect to check port mappings
  2. docker exec to test the app from inside the container
  3. Common causes of this problem

What you're learning: This teaches you to systematically isolate whether the problem is the application, the port mapping, or network configuration using docker exec to test from inside the container.

Prompt 3: Configure a Resilient Service

Production services need to handle crashes gracefully:

I have a FastAPI service that occasionally crashes due to memory pressure. Help me configure it with:

  1. --restart=unless-stopped for automatic recovery
  2. Memory limits to prevent runaway usage
  3. How to monitor restart count

Show me the docker run command and how to verify the configuration.

What you're learning: This reinforces restart policies and introduces memory limits (--memory), which become critical when deploying AI services that can consume large amounts of RAM.

[!NOTE] Safety note: When debugging containers in production, use read-only commands (docker logs, docker inspect) before interactive commands (docker exec). Avoid running shells in production containers unless absolutely necessary, as it can affect running services.


Reflect on Your Skill

You built a docker-deployment skill in Chapter 1. Test and improve it based on what you learned.

Test Your Skill

Using my docker-deployment skill, diagnose a failed container startup. Does my skill include debugging commands like docker logs, docker exec, and docker inspect?

Identify Gaps

Ask yourself:

  • Did my skill include container lifecycle debugging techniques?
  • Did it handle restart policies and container forensics?

Improve Your Skill

If you found gaps:

My docker-deployment skill is missing debugging and troubleshooting capabilities. Update it to include docker logs, docker exec, docker inspect usage, and restart policy configuration.