USMAN’S INSIGHTS
AI ARCHITECT
The Right-Size Revolution: Vertical Pod Autoscaling

Muhammad Usman Akbar Entity Profile

Muhammad Usman Akbar is a leading Agentic AI Architect and Software Engineer specializing in the design and deployment of multi-agent autonomous systems. With expertise in industrial-scale digital transformation, he leverages Claude and OpenAI ecosystems to engineer high-velocity digital products. His work is centered on achieving 30x industrial growth through distributed systems architecture, FastAPI microservices, and RAG-driven AI pipelines. Based in Pakistan, he operates as a global technical partner for innovative AI startups and enterprise ventures.


© 2026 Muhammad Usman Akbar. All rights reserved.


Right-Sizing with VPA

Your Task API deployment requests 1 CPU and 2Gi of memory per pod. You set these values six months ago, guessing what the application might need. Now you're paying for resources the pods never use.

You check the metrics: average CPU usage is 150m (15% of requested) and average memory is 400Mi (about 20% of requested). You're paying for roughly 6.7x the CPU and 5x the memory your application actually needs. Across 10 replicas running 24/7, that waste adds up to hundreds of dollars per month.
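The ratios above can be checked directly. The following is a minimal sketch, assuming only the quantity suffixes that appear in this lesson (`m` for millicores, `Ki`/`Mi`/`Gi` for memory); the helper names are hypothetical and this is not a full Kubernetes quantity parser.

```python
# Hypothetical helpers: handle only the suffixes used in this lesson.
_MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '150m' or '1' to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> float:
    """Convert a memory quantity such as '400Mi' or '2Gi' to bytes."""
    for suffix, factor in _MEM_UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # no suffix: plain bytes


def utilization(used: float, requested: float) -> float:
    """Fraction of the request that is actually used."""
    return used / requested


cpu_util = utilization(parse_cpu("150m"), parse_cpu("1"))
mem_util = utilization(parse_memory("400Mi"), parse_memory("2Gi"))

print(f"CPU: {cpu_util:.0%} used, {1 / cpu_util:.1f}x over-provisioned")
print(f"Memory: {mem_util:.0%} used, {1 / mem_util:.1f}x over-provisioned")
# CPU: 15% used, 6.7x over-provisioned
# Memory: 20% used, 5.1x over-provisioned
```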

Manual right-sizing is tedious and risky. You could lower the requests based on current usage, but what about traffic spikes? What about that batch job that runs Sunday nights and uses 3x normal resources? Guess wrong, and your pods get OOMKilled or CPU-throttled during peak load.

The Vertical Pod Autoscaler (VPA) solves this. It continuously monitors actual resource usage, generates recommendations based on real patterns, and can automatically adjust pod requests to match actual needs. This lesson teaches you how to install VPA, configure it in safe "recommendations-only" mode, interpret its output, and calculate savings before applying changes.


Why Right-Sizing Matters

Over-provisioning is the default in Kubernetes. Developers set generous resource requests to avoid problems: "Better to request too much than get throttled." But this approach has significant costs.

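| Metric                 | Task API Current | Actual Usage | Waste       |
|------------------------|------------------|--------------|-------------|
| CPU Request            | 1000m            | 150m         | 850m (85%)  |
| Memory Request         | 2Gi              | 400Mi        | 1.6Gi (80%) |
| Monthly Cost (10 pods) | $500             | $100         | $400 wasted |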

VPA addresses this by recommending request values based on observed usage, accounting for peaks and patterns your manual observation would miss.


VPA Architecture

VPA consists of three components that work together:

text
                     ┌─────────────────────┐
                     │       VPA CRD       │
                     │    (Your Config)    │
                     └──────────┬──────────┘
                                │
         ┌──────────────────────┼─────────────────────────┐
         │                      │                         │
         ▼                      ▼                         ▼
┌─────────────────┐  ┌─────────────────────┐  ┌──────────────────────┐
│   Recommender   │  │       Updater       │  │ Admission Controller │
│                 │  │                     │  │                      │
│ Queries metrics │  │ Evicts pods when    │  │ Modifies pod specs   │
│ Calculates recs │  │ resources outdated  │  │ at creation time     │
└────────┬────────┘  └──────────┬──────────┘  └───────────┬──────────┘
         │                      │                         │
         ▼                      ▼                         ▼
  Recommendations         Pod Eviction              Pod Creation
 stored in status        (Recreate mode)          with new requests
  1. Recommender: Continuously queries the Metrics Server, analyzes historical usage (typically 8+ days), and computes recommended CPU and memory requests.
  2. Updater: When VPA is in Recreate mode, the Updater evicts pods whose current requests differ significantly from recommendations so they can be recreated.
  3. Admission Controller: When a pod is created, this component modifies the pod spec to use VPA-recommended resource values (if in Initial or Recreate mode).
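To make the Recommender's role concrete, here is a deliberately simplified sketch of turning usage samples into a lower bound, target, and upper bound with plain percentiles. It is an illustration only: the real Recommender works from weighted usage histories rather than a flat list, and the percentile choices (50/90/95), function names, and sample data below are all assumptions made for this example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def recommend(samples_millicores, lower_pct=50, target_pct=90, upper_pct=95):
    """Illustrative only: derive three request levels from raw usage samples."""
    return {
        "lowerBound": percentile(samples_millicores, lower_pct),
        "target": percentile(samples_millicores, target_pct),
        "upperBound": percentile(samples_millicores, upper_pct),
    }


# Hypothetical CPU samples in millicores, including two spikes.
cpu_samples = [120, 130, 140, 150, 150, 160, 170, 180, 300, 450]
rec = recommend(cpu_samples)
print(rec)  # {'lowerBound': 150, 'target': 300, 'upperBound': 450}
```

Note how the two spikes pull the target well above the average of the samples. That is the kind of peak a quick manual look at average usage tends to miss.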

Installing VPA

VPA is not part of a standard Kubernetes installation, so you install it separately. This lesson uses the Fairwinds Helm chart.

Prerequisites:

  • Kubernetes 1.21+
  • Metrics Server installed and running
  • Helm 3+

Installation via Helm:

bash
# Add the Fairwinds repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
# Install VPA in its own namespace
helm install vpa fairwinds-stable/vpa \
  --namespace vpa \
  --create-namespace

Verify installation:

bash
kubectl get pods -n vpa

Expected output:

text
NAME                                        READY   STATUS    RESTARTS   AGE
vpa-admission-controller-6b9b5d8c4b-x2j9k   1/1     Running   0          30s
vpa-recommender-7d6b8c5f9a-m4n8p            1/1     Running   0          30s
vpa-updater-5f7c9d8b6c-q9r2s                1/1     Running   0          30s

VPA Modes: Off, Initial, and Recreate

VPA operates in different modes based on your update policy. Choose the mode that matches your risk tolerance.

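| Mode     | Generates Recommendations | Applies to New Pods | Evicts Running Pods | Production Safety              |
|----------|---------------------------|---------------------|---------------------|--------------------------------|
| Off      | Yes                       | No                  | No                  | Safest - observe only          |
| Initial  | Yes                       | Yes                 | No                  | Conservative - new pods only   |
| Recreate | Yes                       | Yes                 | Yes                 | Active - existing pods evicted |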

When to Use Which?

  • Off Mode: First deploying VPA to any workload; production services where restarts are disruptive.
  • Initial Mode: After validating Off mode; workloads with frequent deployments; gradual rollout.
  • Recreate Mode: After validating recommendations; workloads that tolerate restarts; non-production testing.
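The mode table and the guidance above can be encoded as a small lookup, which is handy when reviewing many VPA objects at once. This is a sketch of this lesson's rules of thumb only; `ModeBehavior` and `suggest_mode` are hypothetical names, not part of any VPA tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    recommends: bool
    applies_to_new_pods: bool
    evicts_running_pods: bool


# Mirrors the mode table in this section.
MODES = {
    "Off": ModeBehavior(True, False, False),
    "Initial": ModeBehavior(True, True, False),
    "Recreate": ModeBehavior(True, True, True),
}


def suggest_mode(tolerates_restarts: bool, recommendations_validated: bool) -> str:
    """Apply the guidance above: observe first, then pick by restart tolerance."""
    if not recommendations_validated:
        return "Off"
    return "Recreate" if tolerates_restarts else "Initial"


print(suggest_mode(tolerates_restarts=False, recommendations_validated=False))  # Off
print(suggest_mode(tolerates_restarts=False, recommendations_validated=True))   # Initial
print(suggest_mode(tolerates_restarts=True, recommendations_validated=True))    # Recreate
```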

Creating a VPA for Task API

Start with Off mode to observe recommendations without risk.

VPA Manifest (task-api-vpa.yaml):

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: task-api-vpa
  namespace: production
  labels:
    app: task-api
    team: product-team
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: task-api
  updatePolicy:
    updateMode: "Off"  # Recommendations only - safest option
  resourcePolicy:
    containerPolicies:
      - containerName: task-api
        minAllowed:
          cpu: 100m  # Never recommend below this
          memory: 128Mi
        maxAllowed:
          cpu: 2000m  # Never recommend above this
          memory: 2Gi
        controlledResources: ["cpu", "memory"]

Apply the VPA:

bash
kubectl apply -f task-api-vpa.yaml
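The `minAllowed` and `maxAllowed` fields in the manifest act as a clamp on whatever the Recommender computes. A minimal sketch of that idea, using the bounds from the manifest expressed as plain numbers (millicores and Mi); the function names and the raw input values are hypothetical:

```python
def clamp(value, min_allowed, max_allowed):
    """Keep a value inside the [min_allowed, max_allowed] range."""
    return max(min_allowed, min(value, max_allowed))


# Bounds from the manifest above: CPU in millicores, memory in Mi.
POLICY = {"cpu": (100, 2000), "memory": (128, 2048)}


def apply_policy(raw_recommendation, policy=POLICY):
    """Clamp each resource in a raw recommendation to its allowed range."""
    return {
        resource: clamp(value, *policy[resource])
        for resource, value in raw_recommendation.items()
    }


# A hypothetical raw recommendation that violates both bounds.
print(apply_policy({"cpu": 40, "memory": 3000}))
# {'cpu': 100, 'memory': 2048}
```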

Reading VPA Recommendations

After VPA collects enough metrics (minimum 15-30 minutes, ideally 8+ days), check the status:

bash
kubectl describe vpa task-api-vpa -n production

Status Snippet:

text
Recommendation:
  containerRecommendations:
    - containerName: task-api
      lowerBound:
        cpu: 100m
        memory: 200Mi
      target:
        cpu: 180m
        memory: 450Mi
      upperBound:
        cpu: 500m
        memory: 900Mi

Understanding the Recommendation Fields

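| Field      | Meaning                                                                       | Use Case                              |
|------------|-------------------------------------------------------------------------------|---------------------------------------|
| target     | Recommended resource request based on observed usage patterns                 | Set your requests to this value       |
| lowerBound | Minimum estimate; requests below this are likely to hurt performance          | Never go below this                   |
| upperBound | Maximum expected value (high-percentile usage); more than this is likely waste | Your limit should be at or above this |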
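One way to read the three fields is to check where your current requests fall relative to the bounds. The sketch below uses the values from the status snippet above (CPU in millicores, memory in Mi, with 2Gi written as 2048) and a hypothetical `classify` helper:

```python
def classify(current_request, lower_bound, upper_bound):
    """Place a current request relative to the recommendation's bounds."""
    if current_request < lower_bound:
        return "under-provisioned"
    if current_request > upper_bound:
        return "over-provisioned"
    return "within bounds"


# Values from the status snippet: CPU in millicores, memory in Mi.
recommendation = {
    "cpu": {"lowerBound": 100, "target": 180, "upperBound": 500},
    "memory": {"lowerBound": 200, "target": 450, "upperBound": 900},
}
current_requests = {"cpu": 1000, "memory": 2048}

verdicts = {
    resource: classify(
        current_requests[resource], bounds["lowerBound"], bounds["upperBound"]
    )
    for resource, bounds in recommendation.items()
}
print(verdicts)  # {'cpu': 'over-provisioned', 'memory': 'over-provisioned'}
```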

Calculating Savings

Compare current requests to VPA targets (example rates):

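| Resource | Current Request | VPA Target | Reduction | Savings |
|----------|-----------------|------------|-----------|---------|
| CPU      | 1000m           | 180m       | 820m      | 82%     |
| Memory   | 2Gi             | 450Mi      | 1.6Gi     | 78%     |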

Financial Impact Example:

text
Current cost (per pod/month):      $40
Recommended cost (per pod/month):  $7.65
Savings per pod:                   $32.35 (81%)
With 10 replicas: $323.50 savings per month / $3,882 annually.
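The arithmetic behind this example can be reproduced in a few lines. The sketch takes the per-pod monthly costs from the example as given (it does not derive them from cloud pricing) and uses `Decimal` so the dollar amounts stay exact; `monthly_savings` is a hypothetical helper name.

```python
from decimal import Decimal


def monthly_savings(current_cost, recommended_cost, replicas):
    """Savings per pod, as a percentage, per month, and per year."""
    per_pod = current_cost - recommended_cost
    monthly = per_pod * replicas
    return {
        "per_pod": per_pod,
        "percent": per_pod / current_cost * 100,
        "monthly": monthly,
        "annual": monthly * 12,
    }


result = monthly_savings(Decimal("40"), Decimal("7.65"), replicas=10)
print(f"Savings per pod: ${result['per_pod']:.2f} ({result['percent']:.0f}%)")
print(f"Monthly: ${result['monthly']:.2f}, annual: ${result['annual']:.2f}")
# Savings per pod: $32.35 (81%)
# Monthly: $323.50, annual: $3882.00
```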

VPA and HPA: The Coexistence Rules

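The Conflict: If the Horizontal Pod Autoscaler (HPA) and VPA both react to the same metric (e.g., CPU), they fight. HPA measures CPU utilization as a percentage of each pod's request, so every time VPA changes the request, HPA's view of utilization shifts as well. HPA adds pods while VPA increases pod size, potentially overwhelming the cluster nodes.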
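A short worked example shows why the two controllers interfere. HPA's core rule is desired replicas = ceil(current replicas x current utilization / target utilization), where utilization is usage divided by the pod's request. The sketch below applies that rule to the Task API numbers; the 70% target is an assumed value, and real HPA behavior also involves a tolerance band and min/max replica limits that are left out here.

```python
import math


def hpa_desired_replicas(current_replicas, usage_m, request_m, target_utilization):
    """Core HPA scaling rule, ignoring tolerance and min/max replica limits."""
    utilization = usage_m / request_m  # HPA measures usage relative to the request
    return math.ceil(current_replicas * utilization / target_utilization)


# Same 150m of real usage per pod, before and after VPA lowers the request.
before = hpa_desired_replicas(10, usage_m=150, request_m=1000, target_utilization=0.70)
after = hpa_desired_replicas(10, usage_m=150, request_m=180, target_utilization=0.70)

print(before)  # 3
print(after)   # 12
```

Nothing about the workload changed between the two calls. Only the request moved, yet HPA's answer went from scaling in to scaling out.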

Safe Coexistence Patterns

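| Pattern               | VPA Mode          | HPA Metric   | Safe?                     |
|-----------------------|-------------------|--------------|---------------------------|
| Manual Recommendation | Off               | CPU          | Yes - VPA only recommends |
| Static Scale          | Recreate          | CPU          | No - Conflicting signals  |
| Custom Scaling        | Recreate          | requests/sec | Yes - Different triggers  |
| Resource Split        | Recreate (Memory) | CPU          | Yes - Different resources |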
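The four patterns in the table reduce to one rule: there is a conflict only when VPA is actively changing a resource that HPA also scales on. A minimal sketch of that rule, with a hypothetical `vpa_hpa_conflict` helper and metric names chosen for this example:

```python
def vpa_hpa_conflict(vpa_mode, vpa_controlled_resources, hpa_metrics):
    """True when VPA actively changes a resource that HPA also scales on."""
    if vpa_mode == "Off":
        return False  # VPA only recommends, so it never changes requests
    return bool(set(vpa_controlled_resources) & set(hpa_metrics))


# The four rows of the table, in order.
rows = [
    ("Off", ["cpu", "memory"], ["cpu"]),
    ("Recreate", ["cpu", "memory"], ["cpu"]),
    ("Recreate", ["cpu", "memory"], ["requests_per_second"]),
    ("Recreate", ["memory"], ["cpu"]),
]
results = [vpa_hpa_conflict(*row) for row in rows]
print(results)  # [False, True, False, False]
```

The last row corresponds to setting `controlledResources: ["memory"]` on the VPA so that CPU is left entirely to HPA.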

Applying VPA Recommendations Safely

  1. Option 1 (Manual): Update your Deployment manifest from VPA recommendations and deploy via CI/CD. This is GitOps-friendly.
  2. Option 2 (Initial Mode): Change VPA to Initial. New pods created during normal deployments get optimized resources.
  3. Option 3 (Recreate Mode): Full automation. Use only after validating recommendations and ensuring PodDisruptionBudgets are in place.
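For Option 1, the change you commit is just the container's `resources.requests` stanza set to the VPA target. The sketch below builds that structure from the target values in the earlier status snippet, shaped as a strategic merge patch for a Deployment. `requests_patch` is a hypothetical helper; in a GitOps flow you would normally edit the manifest in your repository rather than patch the live object.

```python
import json


def requests_patch(container_name, target):
    """Build a Deployment patch that sets one container's resource requests."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "resources": {"requests": dict(target)},
                        }
                    ]
                }
            }
        }
    }


# Target values from the VPA status snippet earlier in this lesson.
patch = requests_patch("task-api", {"cpu": "180m", "memory": "450Mi"})
print(json.dumps(patch, indent=2))
```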

Try With AI

Test your ability to design and troubleshoot vertical pod scaling.

Prompt 1 (VPA Configuration Design):

text
I have a FastAPI application with 500m CPU/1Gi requests. I'm using HPA based on CPU.
Design a safe VPA configuration that:
- Starts in recommendation-only mode
- Sets appropriate min/max bounds
- Avoids conflicts with my existing HPA
Explain your choices.

Prompt 2 (Interpreting VPA Output):

text
VPA output:
  lowerBound: cpu=50m, memory=100Mi
  target: cpu=120m, memory=256Mi
  upperBound: cpu=400m, memory=512Mi
Current requests: cpu=1000m, memory=2Gi.
1. What do these bounds mean?
2. How much can I save applying the target?
3. What should I set for limits?

Prompt 3 (Mode Selection):

text
Recommend a VPA mode for these 3 scenarios:
1. Production API (user-facing, no restarts)
2. Batch processing (nightly, tolerates restarts)
3. Development (frequent daily deployments)
Include observation time and safety measures.

Safety Note

VPA recommendations are based on historical usage. If your workload pattern changes (new features, increased traffic), previous recommendations may become invalid. Monitor continuously after applying changes.