Your Kubernetes cluster is running Pods. Everything works perfectly in development. Then you deploy to production.
Your Pod crashes immediately. Or it stays Pending forever. Or it consumes all memory and gets evicted. You don't know why; you see only error states and no explanation.
This lesson teaches you to read what the cluster is trying to tell you. Kubernetes provides signals about Pod failures: status fields, events, logs, and resource constraints. Learning to interpret these signals is the difference between a 5-minute fix and hours of frustration.
Before diving into debugging, you need to understand how Kubernetes allocates resources.
Think of resource management like renting an apartment:
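In Kubernetes, each container can declare two values per resource (CPU and memory): a request, which is the amount the scheduler reserves for it on a node, and a limit, which is the maximum it is allowed to use.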
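Key Principle: A Pod cannot be scheduled on a node unless that node has at least the REQUESTED amount of unreserved resources. The scheduler compares requests against the node's allocatable capacity minus what other Pods have already requested, not against live usage. Limits prevent a Pod from monopolizing node resources: a container over its CPU limit is throttled, and a container over its memory limit is OOM-killed.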
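The scheduling rule can be sketched in a few lines of Python. This is a simplified model, not the scheduler's actual code: the `Resources` type, the `fits` helper, and the node sizes are all hypothetical, and real scheduling also weighs taints, affinity, and other resources.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int  # 1000 millicores = 1 CPU
    memory_mib: int


def fits(allocatable: Resources, already_requested: Resources, pod_request: Resources) -> bool:
    """Return True if the Pod's requests fit in the node's unreserved capacity."""
    free_cpu = allocatable.cpu_millicores - already_requested.cpu_millicores
    free_mem = allocatable.memory_mib - already_requested.memory_mib
    return pod_request.cpu_millicores <= free_cpu and pod_request.memory_mib <= free_mem


# Hypothetical node: 4 CPUs and 8 GiB allocatable, most of it already requested.
node = Resources(cpu_millicores=4000, memory_mib=8192)
reserved = Resources(cpu_millicores=3500, memory_mib=6144)

small_pod_fits = fits(node, reserved, Resources(250, 512))   # needs 250m of the 500m left
large_pod_fits = fits(node, reserved, Resources(1000, 512))  # needs 1000m, only 500m left
print(small_pod_fits, large_pod_fits)
```

Note that the check uses requests only. A node whose Pods are idle can still reject a new Pod if their combined requests have reserved the capacity.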
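Always use Mi and Gi (binary, powers of 1024), not M and G (decimal, powers of 1000), in Kubernetes manifests. They're different: 1Gi is 1,073,741,824 bytes, while 1G is 1,000,000,000 bytes. MB and GB are not valid Kubernetes quantity suffixes at all.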
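To see how much the binary and decimal suffixes differ, here is a small Python sketch. The `to_bytes` helper is hypothetical and deliberately simplified: it handles only whole-number quantities with the suffixes listed, not the full Kubernetes quantity syntax.

```python
# Simplified converter for a subset of Kubernetes memory suffixes (an illustration only).
BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '512M' to bytes."""
    # Check the two-letter binary suffixes first, then the one-letter decimal ones.
    for table in (BINARY, DECIMAL):
        for suffix, factor in table.items():
            if quantity.endswith(suffix):
                return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


print(to_bytes("512Mi"))                 # 536870912
print(to_bytes("512M"))                  # 512000000
print(to_bytes("1Gi") - to_bytes("1G"))  # 73741824
```

The gap between 1Gi and 1G is roughly 70 MiB, enough to turn a container that fits under its limit into one that gets OOM-killed.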
Kubernetes prioritizes which Pods to evict when a node runs out of resources. This priority is determined by the Pod's QoS class.
Guaranteed (Highest Priority)
When every container in the Pod sets CPU and memory requests equal to its limits, the Pod is Guaranteed. Kubernetes evicts Guaranteed Pods LAST. Use this for critical workloads.
Burstable (Medium Priority)
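When at least one container sets a request or limit but the Pod does not meet the Guaranteed criteria (typically requests lower than limits), the Pod is Burstable. Kubernetes evicts Burstable Pods second. Use this for normal workloads (most agents).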
BestEffort (Lowest Priority)
When no container in a Pod sets any requests or limits, it's BestEffort. Kubernetes evicts these FIRST. Only use this for non-critical batch jobs.
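The three QoS rules can be written as a short Python function. This is a sketch under stated assumptions, not Kubernetes' implementation: `qos_class` and the container dictionaries are hypothetical, and quantities are compared as plain strings, so equivalent values written differently (for example "500m" and "0.5") would not match here.

```python
def qos_class(containers):
    """Classify a Pod from its containers' 'requests' and 'limits' dicts (keys: 'cpu', 'memory')."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = dict(container.get("requests", {}))
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            # When only a limit is set, Kubernetes defaults the request to that limit.
            if resource in limits:
                requests.setdefault(resource, limits[resource])
            # Guaranteed needs a limit for both resources, with request equal to limit.
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


equal = {"requests": {"cpu": "500m", "memory": "256Mi"}, "limits": {"cpu": "500m", "memory": "256Mi"}}
lower = {"requests": {"cpu": "250m", "memory": "128Mi"}, "limits": {"cpu": "500m", "memory": "256Mi"}}
unset = {}

print(qos_class([equal]))         # Guaranteed
print(qos_class([lower]))         # Burstable
print(qos_class([unset]))         # BestEffort
print(qos_class([equal, unset]))  # Burstable
```

The last case shows that the class belongs to the Pod as a whole: one container without requests or limits is enough to keep an otherwise Guaranteed Pod at Burstable.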
Output:
Output (relevant section):
Output:
Inside the Pod:
Manifest (crash-loop.yaml):
Troubleshooting Steps:
Fix: Add the environment variable to the manifest and re-apply.
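The same crash-loop signals are available in machine-readable form from `kubectl get pod <name> -o json`. Here is a minimal Python sketch of reading them. The `summarize` helper and the sample JSON are hypothetical, invented for illustration; only the field names (`containerStatuses`, `state.waiting.reason`, `lastState.terminated`, `restartCount`) come from the Pod status structure.

```python
import json


def summarize(pod: dict) -> list:
    """Return one line for the Pod phase plus one line per container status."""
    status = pod.get("status", {})
    lines = [f"phase={status.get('phase', 'Unknown')}"]
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting", {})
        last = cs.get("lastState", {}).get("terminated", {})
        parts = [cs["name"], f"restarts={cs.get('restartCount', 0)}"]
        if waiting:
            parts.append(f"waiting={waiting.get('reason')}")
        if last:
            parts.append(f"last_exit={last.get('exitCode')} ({last.get('reason')})")
        lines.append(" ".join(parts))
    return lines


# Hypothetical sample, trimmed to the fields the helper reads.
sample = json.loads("""
{
  "status": {
    "phase": "Running",
    "containerStatuses": [
      {
        "name": "app",
        "restartCount": 4,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"exitCode": 1, "reason": "Error"}}
      }
    ]
  }
}
""")

summary = summarize(sample)
for line in summary:
    print(line)
```

A non-zero `last_exit` alongside `waiting=CrashLoopBackOff` points at the application itself, which is why the container's logs from the previous run are usually the next thing to check.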
Manifest (pending-pod.yaml):
Troubleshooting Steps:
Fix: Reduce memory/CPU requests so they fit within the allocatable capacity of at least one node.
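For a Pending Pod, the scheduler records why it could not place the Pod in the `PodScheduled` condition of the Pod status. Here is a small Python sketch of extracting that reason. The `scheduling_failure` helper and the sample data, including the message text, are hypothetical illustrations rather than captured cluster output.

```python
def scheduling_failure(pod: dict):
    """Return (reason, message) if the Pod is unscheduled, else None."""
    for condition in pod.get("status", {}).get("conditions", []):
        if condition.get("type") == "PodScheduled" and condition.get("status") == "False":
            return condition.get("reason"), condition.get("message")
    return None


# Hypothetical Pending Pod, trimmed to the fields the helper reads.
pending_pod = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/1 nodes are available: 1 Insufficient memory.",
            }
        ],
    }
}

failure = scheduling_failure(pending_pod)
print(failure)
```

An `Unschedulable` reason with an "Insufficient" message is the cue to compare the Pod's requests with what the nodes can actually offer.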
Collaborate with AI to troubleshoot a complex scenario.
Step 1: Deploy a broken Pod
Step 2: Ask AI for Analysis
Prompt AI: "I've deployed a multi-container Pod. Here is the describe output: [paste output]. What QoS class does this Pod get? The sidecar has no requests or limits while the main container sets them. How should I set the sidecar's resources to reach the QoS class I want?"
You built a kubernetes-deployment skill in Chapter 1. Test and improve it based on what you learned.