Your observability stack is running. Prometheus collects metrics, Grafana renders dashboards, and your SLO alerts fire when error budgets burn too fast. But there is one question your current stack cannot answer: How much does this cost?
You receive the monthly cloud bill. $47,000. Your finance team asks: "Which team is responsible for this spend? Why did costs jump 35% from last month? Are we paying for resources nobody uses?" Without cost visibility, you are debugging finances the same way you once debugged production — in the dark, hoping patterns emerge from spreadsheets.
This lesson teaches you to answer those questions with the same precision you now apply to latency percentiles. You will install OpenCost to monitor Kubernetes spending in real time, apply cost allocation labels so every dollar traces to an owner, identify idle resources wasting budget, use VPA recommendations to right-size workloads, and schedule non-production environments to run only when needed.
FinOps is not about cutting costs. FinOps is about building discipline and transparency so engineering speed and financial control work together instead of against each other.
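The FinOps Foundation defines three lifecycle phases: Inform, Optimize, and Operate. In this lesson they map to cost visibility, right-sizing, and budget alerts with scheduling.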
Organizations that implement FinOps practices often discover significant untapped savings. The goal is not to spend less but to spend intentionally.
FinOps practices are built on six governing principles:
These principles guide every practice in this lesson.
OpenCost is a vendor-neutral open source project for measuring and allocating cloud infrastructure costs in real time. It is a CNCF Incubating project supported by AWS, Microsoft, Google, and the broader cloud-native community.
OpenCost requires Prometheus for metrics storage. If you followed Lesson 2, you already have the kube-prometheus-stack installed:
Output:
Output:
Output:
Forward the port to access the OpenCost dashboard:
Open http://localhost:9090 in your browser. You will see real-time cost breakdowns by namespace, deployment, and pod.
Without labels, your cost data shows spending by namespace — but namespaces do not pay bills. Teams do. Products do. Business units do.
Cost allocation labels map Kubernetes resources to organizational structures.
Add these labels to every Deployment, StatefulSet, and DaemonSet:
For simpler allocation, label namespaces instead. Cost reports can then attribute every pod in a namespace to that namespace's cost center:
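One way to keep labeling consistent is to check manifests before they are applied. The sketch below is a minimal illustration and not part of OpenCost: the label keys team and cost-center and the workload name task-api are hypothetical stand-ins, and the manifest is a plain dictionary such as one loaded from YAML.

```python
# Hypothetical label keys; substitute the keys your organization agrees on.
REQUIRED_LABELS = ("team", "cost-center")


def missing_cost_labels(manifest: dict, required=REQUIRED_LABELS) -> list:
    """Return the required label keys that a manifest does not set.

    For Deployments, StatefulSets, and DaemonSets the pod labels live under
    spec.template.metadata.labels. Namespaces and other objects carry their
    labels directly under metadata.labels.
    """
    template = manifest.get("spec", {}).get("template")
    if template is not None:
        labels = template.get("metadata", {}).get("labels") or {}
    else:
        labels = manifest.get("metadata", {}).get("labels") or {}
    return [key for key in required if not labels.get(key)]


# Illustrative Deployment: it has a team label but no cost-center label.
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "task-api"},
    "spec": {
        "template": {"metadata": {"labels": {"app": "task-api", "team": "agents"}}}
    },
}
print(missing_cost_labels(deployment))  # ['cost-center']
```

A check like this could run in CI so that unlabeled workloads are caught before they start generating unattributed cost.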
OpenCost exposes cost data as Prometheus metrics. Query daily cost by namespace:
Output:
This shows daily cost in dollars for each namespace.
If pods have team labels, query cost by team:
Output:
Now you can answer: "The agents team spent $15.60 yesterday on Kubernetes resources."
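Query results like this can also be post-processed outside Grafana. The sketch below assumes a response in the standard Prometheus HTTP API instant-vector format and a team label on each series. The sample payload is invented for illustration; only the agents figure echoes the example above.

```python
import json

# Sample payload in the shape returned by Prometheus's /api/v1/query endpoint.
# The teams and dollar values are illustrative, not real measurements.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"team": "agents"},   "value": [1700000000, "15.60"]},
      {"metric": {"team": "platform"}, "value": [1700000000, "9.25"]},
      {"metric": {"team": "data"},     "value": [1700000000, "4.10"]},
      {"metric": {},                   "value": [1700000000, "2.05"]}
    ]
  }
}
""")


def cost_by_team(response: dict, top: int = 5) -> list:
    """Return (team, daily_cost) pairs sorted by cost, highest first.

    Series without a team label are grouped under "unlabeled" so that
    unowned spend stays visible instead of silently disappearing.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    totals = {}
    for series in response["data"]["result"]:
        team = series["metric"].get("team", "unlabeled")
        # Prometheus encodes sample values as strings.
        totals[team] = totals.get(team, 0.0) + float(series["value"][1])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


for team, cost in cost_by_team(SAMPLE_RESPONSE):
    print(f"{team}: ${cost:.2f}/day")
```

Keeping an explicit "unlabeled" bucket is a design choice: it turns missing labels into a visible line item that someone has to claim.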
Cost visibility reveals where money goes. Waste identification reveals where it should not go.
Find CPU requested but not used:
Output:
The dev namespace requests 2.8 cores more than it uses. That is wasted capacity.
Calculate efficiency to identify the worst offenders:
Output:
The dev namespace has 85% waste — only 15% of requested CPU is used.
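The relationship between idle cores, efficiency, and waste is simple arithmetic. The sketch below works it through with hypothetical inputs (3.3 cores requested, 0.5 cores used) chosen to be consistent with the dev namespace figures above; the real values come from your own query results.

```python
def cpu_efficiency(requested_cores: float, used_cores: float) -> dict:
    """Summarize how much of a namespace's requested CPU is actually used."""
    if requested_cores <= 0:
        raise ValueError("requested_cores must be positive")
    idle = requested_cores - used_cores
    efficiency = used_cores / requested_cores
    return {
        "idle_cores": round(idle, 2),
        "efficiency_pct": round(efficiency * 100, 1),
        "waste_pct": round((1 - efficiency) * 100, 1),
    }


# Hypothetical figures consistent with the dev namespace described above.
print(cpu_efficiency(requested_cores=3.3, used_cores=0.5))
# {'idle_cores': 2.8, 'efficiency_pct': 15.2, 'waste_pct': 84.8}
```

Rounded to whole percentages, this is the 15% efficiency and 85% waste quoted above.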
Find pods where actual usage is less than 50% of requests:
This returns pods that are good candidates for right-sizing.
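The same 50% rule can be applied to exported data. This is a minimal sketch of the filter, assuming you already have per-pod requested and used CPU; the pod names and measurements are hypothetical.

```python
from typing import NamedTuple


class PodCpu(NamedTuple):
    name: str
    requested_cores: float
    used_cores: float


def right_sizing_candidates(pods, threshold: float = 0.5) -> list:
    """Return names of pods using less than `threshold` of their CPU request."""
    candidates = []
    for pod in pods:
        if pod.requested_cores <= 0:
            continue  # no request set, so a utilization ratio is undefined
        if pod.used_cores / pod.requested_cores < threshold:
            candidates.append(pod.name)
    return candidates


# Hypothetical pods and measurements, for illustration only.
pods = [
    PodCpu("task-api-7d9f", requested_cores=0.5, used_cores=0.1),  # 20% used
    PodCpu("worker-5c2a", requested_cores=1.0, used_cores=0.8),    # 80% used
    PodCpu("batch-91be", requested_cores=2.0, used_cores=0.9),     # 45% used
]
print(right_sizing_candidates(pods))  # ['task-api-7d9f', 'batch-91be']
```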
Guessing resource requests leads to either waste (over-provisioning) or instability (under-provisioning). The Vertical Pod Autoscaler removes guesswork by analyzing actual usage and recommending optimal values.
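VPA consists of three parts: the Recommender, which analyzes usage and computes recommended requests; the Updater, which evicts pods whose requests need to change; and the Admission Controller, which applies the recommended requests when pods are created.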
Output:
Start in recommendation-only mode (updateMode: "Off") to see suggestions without automatic changes:
Apply the VPA:
Output:
After VPA collects usage data (typically 24-48 hours), check recommendations:
Output:
VPA recommends:
If your current requests are 500m CPU and 512Mi memory, VPA just identified 75% CPU over-provisioning and 50% memory over-provisioning.
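The over-provisioning percentages follow directly from the gap between the current request and the recommendation. In the sketch below, the current requests come from the example above, while the targets of 125m CPU and 256Mi memory are the values implied by the stated 75% and 50% figures rather than output copied from a real cluster.

```python
def over_provisioned_pct(current: float, recommended: float) -> float:
    """Percentage of the current request that exceeds the recommendation."""
    if current <= 0:
        raise ValueError("current request must be positive")
    return round((current - recommended) / current * 100, 1)


# Targets are implied by the percentages in the text, not measured values.
cpu_pct = over_provisioned_pct(current=500, recommended=125)     # millicores
memory_pct = over_provisioned_pct(current=512, recommended=256)  # MiB
print(cpu_pct, memory_pct)  # 75.0 50.0
```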
Update your Deployment to match VPA recommendations:
Visibility is not useful if nobody looks at it. Dashboards and alerts make cost data actionable.
Create a dashboard showing total cost, cost by namespace, and efficiency score:
The efficiency gauge shows red below 40%, yellow 40–70%, and green above 70%.
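The threshold logic behind the gauge can be written out explicitly. This sketch reads the boundaries as inclusive for yellow (exactly 40% and exactly 70% are both yellow), which is one interpretation of the ranges above; a dashboard's own threshold steps may treat the boundary values differently.

```python
def efficiency_color(efficiency_pct: float) -> str:
    """Map an efficiency percentage to the gauge colors described above."""
    if efficiency_pct < 40:
        return "red"
    if efficiency_pct <= 70:
        return "yellow"
    return "green"


for value in (15, 40, 70, 85):
    print(value, efficiency_color(value))
```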
Create alerts when spending exceeds thresholds:
Apply the rules:
Output:
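Development and staging environments often run 24/7 but are used only during business hours. Running workloads for 40 hours instead of 168 hours per week cuts their running time by about 76% (128 of 168 hours), and their cost by roughly the same share when the freed capacity is actually released.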
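To see where the 40-hour figure comes from, you can count the hours a schedule covers in one week. The sketch below assumes a Monday to Friday, 09:00 to 17:00 schedule, which is a hypothetical choice that happens to total 40 hours; adjust the constants to match your team's working hours.

```python
from datetime import datetime, timedelta

# Assumed schedule: Monday to Friday, 09:00 to 17:00 (8 hours x 5 days = 40).
WORK_DAYS = range(0, 5)  # datetime.weekday(): Monday is 0
START_HOUR, END_HOUR = 9, 17


def is_business_hour(moment: datetime) -> bool:
    """True if the hour starting at `moment` falls inside the assumed schedule."""
    return moment.weekday() in WORK_DAYS and START_HOUR <= moment.hour < END_HOUR


def weekly_running_hours() -> int:
    """Count scheduled hours by walking every hour of one full week."""
    monday = datetime(2024, 1, 1)  # any Monday works; 2024-01-01 is a Monday
    return sum(is_business_hour(monday + timedelta(hours=h)) for h in range(168))


hours = weekly_running_hours()
reduction = 1 - hours / 168
print(f"{hours} h/week running, {reduction:.1%} fewer hours than 24/7")
# 40 h/week running, 76.2% fewer hours than 24/7
```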
Create CronJobs to scale down dev namespaces outside business hours:
The CronJob needs permission to scale deployments:
Apply both files:
Output:
If dev namespace costs $30/day running 24/7:
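A minimal sketch of that arithmetic follows. It assumes cost is proportional to running hours, which holds only when the freed capacity is actually released (for example, by node autoscaling), and it uses the 40-hour week from earlier in this section.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_savings(daily_cost_24x7: float, running_hours_per_week: float) -> dict:
    """Estimate weekly cost before and after scheduling.

    Assumes cost scales linearly with the number of hours workloads run.
    """
    always_on = float(daily_cost_24x7) * 7
    scheduled = always_on * running_hours_per_week / HOURS_PER_WEEK
    return {
        "always_on": round(always_on, 2),
        "scheduled": round(scheduled, 2),
        "saved": round(always_on - scheduled, 2),
    }


print(weekly_savings(daily_cost_24x7=30, running_hours_per_week=40))
# {'always_on': 210.0, 'scheduled': 50.0, 'saved': 160.0}
```

Under these assumptions, a namespace that costs $210 per week running continuously would cost about $50 per week on a 40-hour schedule.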
Your observability skill now needs cost engineering patterns. Review what you have learned:
Ask your observability skill:
If the skill returns a correct aggregation query with topk(5, ...), it handles cost queries. If not, it needs the OpenCost patterns.
Review your skill for these capabilities:
Add missing patterns. For example, if VPA patterns are missing:
Your skill should now cover the full FinOps lifecycle: Inform (cost visibility), Optimize (right-sizing), and Operate (budget alerts and scheduling).
What you're learning: Cost analysis starts with understanding organizational structure. The queries that matter depend on how your organization allocates responsibility. This dialogue helps you map your organization to cost queries.
What you're learning: Not all waste should be eliminated. Production databases might intentionally over-provision for safety margins. Stateless web services are safer to right-size. This conversation teaches you to prioritize based on risk.
What you're learning: Cost governance is organizational, not just technical. Labels must reflect actual accountability structures. This exercise forces you to think about who pays for shared infrastructure, how to handle platform costs, and what thresholds trigger action.
Safety note: Cost data reveals business information — which products receive investment, which teams are growing, and organizational priorities. Treat OpenCost dashboards with the same access controls as financial reports.