Your Task API deployment requests 1 CPU and 2Gi of memory per pod. You set these values six months ago, guessing what the application might need. Now you're paying for resources the pods never use.
You check the metrics: average CPU usage is 150m (15% of requested). Average memory is 400Mi (about 20% of requested). You're paying for roughly 6.7x the CPU and 5x the memory your application actually needs. Across 10 replicas running 24/7, that waste adds up to hundreds of dollars per month.
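The arithmetic behind those ratios can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted above; the function name `overprovision_factor` is made up for illustration.

```python
def overprovision_factor(requested: float, used: float) -> float:
    """Return how many times larger the request is than average usage."""
    return requested / used


# Figures from the scenario above, converted to common units.
cpu_requested_m, cpu_used_m = 1000, 150      # 1 CPU = 1000 millicores
mem_requested_mi, mem_used_mi = 2048, 400    # 2Gi = 2048Mi

cpu_factor = overprovision_factor(cpu_requested_m, cpu_used_m)
mem_factor = overprovision_factor(mem_requested_mi, mem_used_mi)

print(f"CPU: {cpu_used_m / cpu_requested_m:.0%} of request used, "
      f"{cpu_factor:.1f}x over-provisioned")
print(f"Memory: {mem_used_mi / mem_requested_mi:.0%} of request used, "
      f"{mem_factor:.1f}x over-provisioned")
```

Averages hide peaks, so these factors describe the waste, not a safe target for new requests.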
Manual right-sizing is tedious and risky. You could lower the requests based on current usage, but what about traffic spikes? What about that batch job that runs Sunday nights and uses 3x normal resources? Guess wrong, and your pods get OOMKilled or CPU-throttled during peak load.
The Vertical Pod Autoscaler (VPA) solves this. It continuously monitors actual resource usage, generates recommendations based on real patterns, and can automatically adjust pod requests to match actual needs. This lesson teaches you how to install VPA, configure it in safe "recommendations-only" mode, interpret its output, and calculate savings before applying changes.
Over-provisioning is the default in Kubernetes. Developers set generous resource requests to avoid problems: "Better to request too much than get throttled." But this approach has significant costs.
VPA addresses this by recommending request values based on observed usage, accounting for peaks and patterns your manual observation would miss.
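VPA consists of three components that work together: the Recommender, which analyzes usage history and computes recommended requests; the Updater, which evicts pods whose requests have drifted from the recommendation; and the Admission Controller, which sets the recommended requests on pods as they are created.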
VPA is not part of a standard Kubernetes installation, so you install it separately. This lesson uses Helm.
Prerequisites:
Installation via Helm:
Verify installation:
Expected output:
VPA operates in different modes based on your update policy. Choose the mode that matches your risk tolerance.
Start with Off mode to observe recommendations without risk.
VPA Manifest (task-api-vpa.yaml):
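A manifest along these lines can also be generated programmatically. The sketch below builds an Off-mode `VerticalPodAutoscaler` object as a Python dict and prints it as JSON, which `kubectl` accepts alongside YAML. The Deployment name `task-api`, the `default` namespace, and the `minAllowed`/`maxAllowed` bounds are assumptions chosen for illustration, not values taken from a real cluster.

```python
import json


def build_vpa_manifest(deployment: str, namespace: str = "default") -> dict:
    """Build a recommendations-only VPA object targeting a Deployment."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
        "spec": {
            # The workload whose pods VPA observes.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            # "Off" means VPA only publishes recommendations; it never evicts pods.
            "updatePolicy": {"updateMode": "Off"},
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        "containerName": "*",  # apply to every container in the pod
                        # Hypothetical guard rails for the recommendation range.
                        "minAllowed": {"cpu": "50m", "memory": "64Mi"},
                        "maxAllowed": {"cpu": "2", "memory": "4Gi"},
                    }
                ]
            },
        },
    }


# "task-api" is an assumed Deployment name.
manifest = build_vpa_manifest("task-api")
print(json.dumps(manifest, indent=2))
```

Redirecting the output to a file such as `task-api-vpa.json` gives something that could be passed to `kubectl apply -f`.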
Apply the VPA:
After VPA collects enough metrics (minimum 15-30 minutes, ideally 8+ days), check the status:
Status Snippet:
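VPA publishes its recommendations under `status.recommendation.containerRecommendations`, with `lowerBound`, `target`, `uncappedTarget`, and `upperBound` for each container. As a rough sketch of reading those fields, the Python below parses a status document. The sample values and the container name `task-api` are hypothetical; a real cluster will report different numbers and often expresses memory in other units.

```python
import json

# Hypothetical status block, shaped like the `status` field of a VPA object.
SAMPLE_STATUS = json.loads("""
{
  "recommendation": {
    "containerRecommendations": [
      {
        "containerName": "task-api",
        "lowerBound": {"cpu": "100m", "memory": "256Mi"},
        "target": {"cpu": "250m", "memory": "512Mi"},
        "uncappedTarget": {"cpu": "250m", "memory": "512Mi"},
        "upperBound": {"cpu": "800m", "memory": "1Gi"}
      }
    ]
  }
}
""")


def summarize(status: dict) -> dict:
    """Map each container name to its lowerBound, target, and upperBound."""
    summary = {}
    recs = status.get("recommendation", {}).get("containerRecommendations", [])
    for rec in recs:
        summary[rec["containerName"]] = {
            field: rec[field]
            for field in ("lowerBound", "target", "upperBound")
            if field in rec
        }
    return summary


summary = summarize(SAMPLE_STATUS)
for name, bounds in summary.items():
    for field, resources in bounds.items():
        print(f"{name} {field}: cpu={resources['cpu']} memory={resources['memory']}")
```

`target` is the value VPA would set as the request; `lowerBound` and `upperBound` mark the range within which it considers the current request acceptable.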
Compare current requests to VPA targets (example rates):
Financial Impact Example:
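One way to estimate the savings is to price the current and recommended requests at a per-hour rate. The sketch below assumes hypothetical rates of $0.03 per vCPU-hour and $0.004 per GiB-hour, a hypothetical VPA target of 250m CPU and 512Mi memory, and 730 hours per month. Substitute your provider's actual pricing and your own VPA output.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def monthly_cost(cpu_cores: float, mem_gib: float, replicas: int,
                 cpu_rate: float, mem_rate: float) -> float:
    """Monthly cost of the requested resources across all replicas."""
    per_pod_hour = cpu_cores * cpu_rate + mem_gib * mem_rate
    return per_pod_hour * replicas * HOURS_PER_MONTH


# Hypothetical example rates in dollars; not any provider's real pricing.
CPU_RATE = 0.03    # per vCPU-hour
MEM_RATE = 0.004   # per GiB-hour
REPLICAS = 10

# Current requests from the scenario: 1 CPU, 2Gi per pod.
current = monthly_cost(1.0, 2.0, REPLICAS, CPU_RATE, MEM_RATE)
# Hypothetical VPA target: 250m CPU, 512Mi memory per pod.
recommended = monthly_cost(0.25, 0.5, REPLICAS, CPU_RATE, MEM_RATE)

savings = current - recommended
print(f"Current:     ${current:.2f}/month")
print(f"Recommended: ${recommended:.2f}/month")
print(f"Savings:     ${savings:.2f}/month ({savings / current:.0%})")
```

Under these assumed numbers the savings come to about $208 per month, or 75% of the current request cost. The real figure depends on whether the freed capacity lets the cluster run fewer or smaller nodes.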
The Conflict: If the Horizontal Pod Autoscaler (HPA) and VPA both react to the same metric (e.g., CPU), they fight. HPA adds pods while VPA increases pod size, potentially overwhelming the cluster nodes.
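A quick pre-flight check can flag this overlap before both autoscalers are applied. The following sketch compares the resource metrics in an HPA spec (`autoscaling/v2` shape) with the resources a VPA spec controls. The function name and the sample specs are hypothetical, and an unset `updateMode` is treated as `Auto` for the purposes of this sketch.

```python
def conflicting_resources(hpa_spec: dict, vpa_spec: dict) -> set:
    """Return the resources that both the HPA and the VPA would act on."""
    # In Off mode VPA only recommends, so it cannot fight the HPA.
    mode = vpa_spec.get("updatePolicy", {}).get("updateMode", "Auto")
    if mode == "Off":
        return set()

    # Resources the HPA scales on (only metrics of type "Resource" matter here).
    hpa_resources = {
        metric["resource"]["name"]
        for metric in hpa_spec.get("metrics", [])
        if metric.get("type") == "Resource"
    }

    # Resources the VPA adjusts; with no policy it manages both CPU and memory.
    policies = vpa_spec.get("resourcePolicy", {}).get("containerPolicies", [])
    if policies:
        vpa_resources = set()
        for policy in policies:
            vpa_resources.update(policy.get("controlledResources", ["cpu", "memory"]))
    else:
        vpa_resources = {"cpu", "memory"}

    return hpa_resources & vpa_resources


# Hypothetical specs: HPA scales on CPU, VPA manages everything.
hpa = {"metrics": [{"type": "Resource", "resource": {"name": "cpu"}}]}
vpa_all = {"updatePolicy": {"updateMode": "Auto"}}
vpa_memory_only = {
    "updatePolicy": {"updateMode": "Auto"},
    "resourcePolicy": {
        "containerPolicies": [
            {"containerName": "*", "controlledResources": ["memory"]}
        ]
    },
}

print(conflicting_resources(hpa, vpa_all))          # overlap on cpu
print(conflicting_resources(hpa, vpa_memory_only))  # no overlap
```

Restricting VPA to memory while HPA scales on CPU, as in the second spec, is one common way to let the two coexist.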
Test your ability to design and troubleshoot vertical pod scaling.
Prompt 1 (VPA Configuration Design):
Prompt 2 (Interpreting VPA Output):
Prompt 3 (Mode Selection):
VPA recommendations are based on historical usage. If your workload pattern changes (new features, increased traffic), previous recommendations may become invalid. Monitor continuously after applying changes.
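One possible way to monitor for drift is to compare the request you applied against VPA's latest `lowerBound` and `upperBound`, and flag it when it falls outside that range. The sketch below assumes quantities in the common `m`, `Ki`, `Mi`, and `Gi` forms only; the helper names and sample values are hypothetical.

```python
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '1' to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' to bytes.

    Only binary suffixes are handled in this sketch; a bare number is
    treated as bytes.
    """
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)


def request_is_stale(request: float, lower: float, upper: float) -> bool:
    """True when the applied request lies outside VPA's recommended range."""
    return not (lower <= request <= upper)


# Hypothetical values: the applied CPU request is now below the new lower bound.
cpu_stale = request_is_stale(parse_cpu("250m"), parse_cpu("300m"), parse_cpu("900m"))
mem_stale = request_is_stale(parse_memory("512Mi"), parse_memory("256Mi"), parse_memory("1Gi"))
print(f"CPU request stale: {cpu_stale}")
print(f"Memory request stale: {mem_stale}")
```

A request outside the range suggests the workload's pattern has shifted since the recommendation was applied and the values should be reviewed.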