Right-Sizing with VPA

Name: Digital FTEs: Engineering — Achieving 10× Productivity
Author: Muhammad Usman Akbar

Your Task API deployment requests 1 CPU and 2Gi of memory per pod. You set these values six months ago, guessing what the application might need. Now you're paying for resources the pods never use.

You check the metrics: average CPU usage is 150m (15% of requested). Average memory is 400Mi (about 20% of requested). You're paying for nearly 7x the CPU and about 5x the memory your application actually needs. Across 10 replicas running 24/7, that waste adds up to hundreds of dollars per month.

Manual right-sizing is tedious and risky. You could lower the requests based on current usage, but what about traffic spikes? What about that batch job that runs Sunday nights and uses 3x normal resources? Guess wrong, and your pods get OOMKilled or CPU-throttled during peak load.

The Vertical Pod Autoscaler (VPA) solves this. It continuously monitors actual resource usage, generates recommendations based on real patterns, and can automatically adjust pod requests to match actual needs. This lesson teaches you how to install VPA, configure it in safe "recommendations-only" mode, interpret its output, and calculate savings before applying changes.

Why Right-Sizing Matters

Over-provisioning is the default in Kubernetes. Developers set generous resource requests to avoid problems: "Better to request too much than get throttled." But this approach has significant costs.

Metric	Task API Current	Actual Usage	Waste
CPU Request	1000m	150m	850m (85%)
Memory Request	2Gi	400Mi	1.6Gi (80%)
Monthly Cost (10 pods)	$500	$100	$400 wasted
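
The percentages in the table can be reproduced with a few lines of shell. This is a minimal sketch that assumes quantities are written as millicores or whole cores for CPU and as Mi or Gi for memory; the helper names (cpu_to_millicores, mem_to_mib, utilization_pct) are made up for this illustration and are not part of kubectl or VPA.

```shell
# Convert a Kubernetes CPU quantity ("150m" or "1") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 }' ;;
  esac
}

# Convert a memory quantity ("400Mi" or "2Gi") to MiB.
# Only the Mi and Gi suffixes are handled in this sketch.
mem_to_mib() {
  case "$1" in
    *Gi) awk -v g="${1%Gi}" 'BEGIN { printf "%d\n", g * 1024 }' ;;
    *Mi) echo "${1%Mi}" ;;
    *)   echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}

# Print usage as a whole-number percentage of the request.
utilization_pct() {
  awk -v used="$1" -v req="$2" 'BEGIN { printf "%.0f\n", used / req * 100 }'
}

cpu_pct=$(utilization_pct "$(cpu_to_millicores 150m)" "$(cpu_to_millicores 1)")
mem_pct=$(utilization_pct "$(mem_to_mib 400Mi)" "$(mem_to_mib 2Gi)")
echo "CPU utilization: ${cpu_pct}% of request"     # 15%
echo "Memory utilization: ${mem_pct}% of request"  # 20%
```

Waste is simply 100 minus the utilization figure, which gives the 85% and 80% shown in the table.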

VPA addresses this by recommending request values based on observed usage, accounting for peaks and patterns your manual observation would miss.

VPA Architecture

VPA consists of three components that work together:

text

                        ┌─────────────────────┐
                        │      VPA CRD        │
                        │  (Your Config)      │
                        └──────────┬──────────┘
                                   │
         ┌─────────────────────────┼─────────────────────────┐
         │                         │                         │
         ▼                         ▼                         ▼
┌─────────────────┐    ┌─────────────────────┐    ┌────────────────────┐
│   Recommender   │    │       Updater       │    │Admission Controller│
│                 │    │                     │    │                    │
│ Queries metrics │    │ Evicts pods when    │    │ Modifies pod specs │
│ Calculates recs │    │ resources outdated  │    │ at creation time   │
└────────┬────────┘    └──────────┬──────────┘    └─────────┬──────────┘
         │                        │                         │
         ▼                        ▼                         ▼
    Recommendations           Pod Eviction              Pod Creation
    stored in status          (Recreate mode)           with new requests

Recommender: Continuously queries the Metrics Server, analyzes historical usage (typically 8+ days), and computes recommended CPU and memory requests.
Updater: When VPA is in Recreate mode, the Updater evicts pods whose current requests differ significantly from recommendations so they can be recreated.
Admission Controller: When a pod is created, this component modifies the pod spec to use VPA-recommended resource values (if in Initial or Recreate mode).

Installing VPA

VPA is not included in standard Kubernetes. You need to install it separately; this lesson uses the Fairwinds Helm chart.

Prerequisites:

Kubernetes 1.21+
Metrics Server installed and running
Helm 3+
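
Before installing, it can help to confirm each prerequisite from the command line. The sketch below wraps the checks in a function (check_vpa_prereqs is a name chosen for this example) and assumes kubectl and helm are on your PATH and pointed at the target cluster.

```shell
check_vpa_prereqs() {
  # The server version reported here should be 1.21 or newer.
  kubectl version || return 1
  # Metrics Server registers this API; the AVAILABLE column should read True.
  kubectl get apiservice v1beta1.metrics.k8s.io || return 1
  # If this prints CPU and memory figures, metrics are flowing.
  kubectl top nodes || return 1
  # The client version reported here should be v3 or newer.
  helm version --short
}

# Run it against the current kubectl context:
#   check_vpa_prereqs
```

If the metrics API check fails, fix Metrics Server first: the Recommender has nothing to analyze without it.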

Installation via Helm:

bash

# Add the Fairwinds repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

# Install VPA in its own namespace
helm install vpa fairwinds-stable/vpa \
  --namespace vpa \
  --create-namespace

Verify installation:

bash

kubectl get pods -n vpa

Expected output:

text

NAME                                        READY   STATUS    RESTARTS   AGE
vpa-admission-controller-6b9b5d8c4b-x2j9k   1/1     Running   0          30s
vpa-recommender-7d6b8c5f9a-m4n8p            1/1     Running   0          30s
vpa-updater-5f7c9d8b6c-q9r2s                1/1     Running   0          30s

VPA Modes: Off, Initial, and Recreate

VPA operates in different modes based on your update policy. Choose the mode that matches your risk tolerance.

Mode	Recommendations	New Pods	Evicts Running	Production Safety
Off	Yes	No	No	Safest - observe only
Initial	Yes	Yes	No	Conservative - new pods only
Recreate	Yes	Yes	Yes	Active - existing pods evicted

When to Use Which?

Off Mode: First deploying VPA to any workload; production services where restarts are disruptive.
Initial Mode: After validating Off mode; workloads with frequent deployments; gradual rollout.
Recreate Mode: After validating recommendations; workloads that tolerate restarts; non-production testing.

Creating a VPA for Task API

Start with Off mode to observe recommendations without risk.

VPA Manifest (task-api-vpa.yaml):

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: task-api-vpa
  namespace: production
  labels:
    app: task-api
    team: product-team
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: task-api
  updatePolicy:
    updateMode: "Off"  # Recommendations only - safest option
  resourcePolicy:
    containerPolicies:
    - containerName: task-api
      minAllowed:
        cpu: 100m       # Never recommend below this
        memory: 128Mi
      maxAllowed:
        cpu: 2000m      # Never recommend above this
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

Apply the VPA:

bash

kubectl apply -f task-api-vpa.yaml

Reading VPA Recommendations

After VPA collects enough metrics (minimum 15-30 minutes, ideally 8+ days), check the status:

bash

kubectl describe vpa task-api-vpa -n production

Status Snippet:

text

Recommendation:
  containerRecommendations:
  - containerName: task-api
    lowerBound:
      cpu: 100m
      memory: 200Mi
    target:
      cpu: 180m
      memory: 450Mi
    upperBound:
      cpu: 500m
      memory: 900Mi

Understanding the Recommendation Fields

Field	Meaning	Use Case
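target	Recommended request; by default derived from roughly the 90th percentile of observed usage	Set your requests to this value
lowerBound	Low end of the recommended range (by default around median usage, adjusted for how much history exists)	Never go below this
upperBound	High end of the recommended range (by default around P95 usage, widened when history is short)	Your limit should be at or above this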
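
To work with these fields in scripts, you can pull single values out of the VPA status with jsonpath and compare your current request against the bounds. The following is a sketch: vpa_target and classify_request are hypothetical helper names, the jsonpath assumes the container of interest is the first entry in containerRecommendations, and the comparison expects plain integers in the same unit (millicores or MiB).

```shell
# Print the target for one resource from a VPA's status.
# $1 = VPA name, $2 = namespace, $3 = resource (cpu or memory)
vpa_target() {
  kubectl get vpa "$1" -n "$2" \
    -o jsonpath="{.status.recommendation.containerRecommendations[0].target.$3}"
}

# Classify a current request against the recommended range.
# $1 = current request, $2 = lowerBound, $3 = upperBound
classify_request() {
  if [ "$1" -lt "$2" ]; then
    echo "under-provisioned"
  elif [ "$1" -gt "$3" ]; then
    echo "over-provisioned"
  else
    echo "within bounds"
  fi
}

# Task API values from the status snippet above.
classify_request 1000 100 500   # CPU in millicores  -> over-provisioned
classify_request 2048 200 900   # memory in MiB      -> over-provisioned
```

With a live cluster, `vpa_target task-api-vpa production cpu` would print the CPU target for the first container.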

Calculating Savings

Compare current requests to the VPA targets:

Resource	Current Request	VPA Target	Reduction	Savings
CPU	1000m	180m	820m	82%
Memory	2Gi	450Mi	1.6Gi	78%

Financial Impact Example (illustrative rates):

text

Current cost (per pod/month): $40
Recommended cost (per pod/month): $7.65
Savings per pod: $32.35 (81%)
With 10 replicas: $323.50 savings per month / $3,882 annually.
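
The same arithmetic can be scripted so you can rerun it with your own rates. This sketch uses the example figures above; reduction_pct and monthly_savings are names invented for this illustration, and the dollar amounts are inputs you supply, not something VPA reports.

```shell
# Percentage reduction from the current request to the VPA target.
# $1 = current request, $2 = target (same unit)
reduction_pct() {
  awk -v cur="$1" -v tgt="$2" 'BEGIN { printf "%.0f\n", (cur - tgt) / cur * 100 }'
}

# Monthly savings across all replicas.
# $1 = current cost per pod, $2 = recommended cost per pod, $3 = replicas
monthly_savings() {
  awk -v cur="$1" -v rec="$2" -v n="$3" 'BEGIN { printf "%.2f\n", (cur - rec) * n }'
}

echo "CPU reduction: $(reduction_pct 1000 180)%"       # 1000m -> 180m: 82%
echo "Memory reduction: $(reduction_pct 2048 450)%"    # 2Gi -> 450Mi: 78%

monthly=$(monthly_savings 40 7.65 10)
annual=$(awk -v m="$monthly" 'BEGIN { printf "%.2f\n", m * 12 }')
echo "Monthly savings: \$${monthly}"                   # $323.50
echo "Annual savings: \$${annual}"                     # $3882.00
```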

VPA and HPA: The Coexistence Rules

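The Conflict: If both react to the same metric (e.g., CPU), they fight. HPA measures CPU utilization as a percentage of the pod's request, so every time VPA changes the request, HPA's utilization reading shifts and it rescales. HPA can end up adding pods while VPA increases pod size, potentially overwhelming the cluster nodes.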

Safe Coexistence Patterns

Pattern	VPA Mode	HPA Metric	Safe?
Manual Recommendation	Off	CPU	Yes - VPA only recommends
Static Scale	Recreate	CPU	No - Conflicting signals
Custom Scaling	Recreate	requests/sec	Yes - Different triggers
Resource Split	Recreate (Memory)	CPU	Yes - Different resources
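
As an illustration of the Resource Split row, the sketch below writes a VPA manifest that manages memory only, leaving CPU to an HPA. It reuses the task-api names from this lesson; the file name task-api-vpa-memory-only.yaml is an assumption, and the script only writes the file so you can review it before applying.

```shell
# Write a memory-only VPA manifest for review (nothing is applied here).
cat > task-api-vpa-memory-only.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: task-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: task-api
  updatePolicy:
    updateMode: "Recreate"
  resourcePolicy:
    containerPolicies:
    - containerName: task-api
      controlledResources: ["memory"]   # CPU is left to the HPA
EOF

echo "Wrote task-api-vpa-memory-only.yaml"
```

Because controlledResources lists only memory, VPA leaves the CPU request alone, so the HPA's CPU utilization signal is not shifted underneath it.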

Applying VPA Recommendations Safely

Option 1 (Manual): Copy the VPA target values into your Deployment manifest and deploy via CI/CD. This is GitOps-friendly.
Option 2 (Initial Mode): Change VPA to Initial. New pods created during normal deployments get optimized resources.
Option 3 (Recreate Mode): Full automation. Use only after validating recommendations and ensuring PodDisruptionBudgets are in place.
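
For Options 2 and 3, the mode can be switched in place with a JSON merge patch rather than by editing and reapplying the manifest. This is a sketch under the assumption that the VPA is the task-api-vpa object created earlier; vpa_mode_patch and set_vpa_mode are hypothetical helper names, and only the three modes covered in this lesson are accepted.

```shell
# Build the merge patch for a given update mode.
vpa_mode_patch() {
  case "$1" in
    Off|Initial|Recreate)
      printf '{"spec":{"updatePolicy":{"updateMode":"%s"}}}\n' "$1" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Apply the patch to a VPA object.
# $1 = VPA name, $2 = namespace, $3 = mode
set_vpa_mode() {
  patch=$(vpa_mode_patch "$3") || return 1
  kubectl patch vpa "$1" -n "$2" --type merge -p "$patch"
}

# Before moving to Recreate, confirm a PodDisruptionBudget covers the workload:
#   kubectl get pdb -n production
# Then switch modes:
#   set_vpa_mode task-api-vpa production Initial
```

If you manage manifests through GitOps, prefer changing updateMode in the repository instead, since an in-cluster patch will be overwritten on the next sync.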

Try With AI

Test your ability to design and troubleshoot vertical pod scaling.

Prompt 1 (VPA Configuration Design):

text

I have a FastAPI application with 500m CPU/1Gi requests. I'm using HPA based on CPU. Design a safe VPA configuration that:
- Starts in recommendation-only mode
- Sets appropriate min/max bounds
- Avoids conflicts with my existing HPA
Explain your choices.

Prompt 2 (Interpreting VPA Output):

text

VPA output: lowerBound: cpu=50m, memory=100Mi; target: cpu=120m, memory=256Mi; upperBound: cpu=400m, memory=512Mi.
Current requests: cpu=1000m, memory=2Gi.

1. What do these bounds mean?

2. How much can I save applying the target?

3. What should I set for limits?

Prompt 3 (Mode Selection):

text

Recommend a VPA mode for these 3 scenarios:
1. Production API (user-facing, no restarts)

2. Batch processing (nightly, tolerates restarts)

3. Development (frequent daily deployments)
Include observation time and safety measures.

Safety Note

VPA recommendations are based on historical usage. If your workload pattern changes (new features, increased traffic), previous recommendations may become invalid. Monitor continuously after applying changes.
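
One simple way to monitor for this is to compare the request you applied against the latest VPA target on a schedule and flag large gaps. The sketch below shows only the comparison; needs_review is a made-up helper, the 20% tolerance is an arbitrary choice for illustration, and the 260m figure is a hypothetical later target, not a value from this lesson.

```shell
# Succeed (exit 0) when the applied request differs from the latest
# target by more than the tolerance.
# $1 = applied request, $2 = latest VPA target, $3 = tolerance in percent
needs_review() {
  awk -v req="$1" -v tgt="$2" -v tol="$3" 'BEGIN {
    diff = req - tgt
    if (diff < 0) diff = -diff
    exit !(diff / tgt * 100 > tol)
  }'
}

# Applied 180m earlier; suppose the target has since moved to 260m.
if needs_review 180 260 20; then
  echo "CPU request has drifted from the VPA target: re-check recommendations"
fi
```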