Your Kubernetes cluster runs three namespaces: production (customer-facing APIs), staging (testing), and data-science (ML training jobs). At month-end, your cloud bill shows $15,000. Your CFO asks: "Which team is responsible for what portion of that cost?"
Without cost visibility, you're guessing. Maybe production is expensive because it runs 24/7. Maybe data-science is expensive because GPU nodes cost $3/hour. Maybe staging is wasting money running unused replicas. You could estimate based on node count, but that ignores the reality: pods share nodes, some pods are memory-heavy while others are CPU-heavy, and resource requests rarely match actual usage.
OpenCost solves this. It watches your cluster, tracks resource consumption per pod, and attributes cost to each workload. When your CFO asks "where's the money going?", you query the API: "Show me cost breakdown by namespace for the last 30 days." Seconds later: production ($9,200), data-science ($4,800), staging ($1,000). Now you can have a real conversation about whether that data-science spend is generating value.
This lesson teaches you to install OpenCost, understand its architecture, and query the allocation API to answer cost visibility questions.
Before diving into implementation, understand the landscape. Two tools, OpenCost and Kubecost, dominate Kubernetes cost visibility, and they share the same core allocation engine.
For this course: We use OpenCost because it's the CNCF standard and free to use. Everything you learn applies directly to Kubecost if your organization needs enterprise features.
OpenCost runs as a deployment in your cluster. It integrates with Prometheus to collect resource metrics and calculate costs.
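A typical installation uses the community Helm chart; the release and namespace names below are assumptions, not requirements:

```shell
# Add the OpenCost Helm repository and install into a dedicated
# namespace (release/namespace names are examples)
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace
```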
How it works: OpenCost queries Prometheus for each container's resource requests and actual usage, charges each pod for the greater of the two, multiplies by per-resource prices (pulled from your cloud provider or a custom on-prem pricing configuration), and exposes the results through its allocation API.
OpenCost requires Prometheus. If you followed Chapter 85, you already have kube-prometheus-stack installed.
Verify Prometheus is running:
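Assuming kube-prometheus-stack landed in the conventional monitoring namespace:

```shell
# Adjust the namespace if your Prometheus lives elsewhere
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus
```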
Check that OpenCost is running:
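Assuming OpenCost was installed into a namespace named opencost:

```shell
# The opencost pod should show STATUS Running
kubectl get pods -n opencost
```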
Port-forward to access the API:
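OpenCost's API listens on port 9003 by default (the optional UI on 9090); the service name here assumes a default install:

```shell
# Forward the API port to localhost; leave this running
kubectl port-forward -n opencost svc/opencost 9003:9003
```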
Test the API is responding:
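With the port-forward in place, a quick smoke test (on some releases the path is /allocation/compute rather than /allocation):

```shell
# Yesterday's cost allocation, aggregated by namespace
curl -s 'http://localhost:9003/allocation?window=1d&aggregate=namespace'
```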
The /allocation API is your primary interface for cost visibility. It answers: "How much did X cost during time window Y?"
If your pods carry a team: product-team label, aggregate by that label:
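A sketch of that query against the port-forwarded API; the team label key is illustrative:

```shell
# Aggregate the last 7 days of cost by the "team" pod label
curl -s 'http://localhost:9003/allocation?window=7d&aggregate=label:team'
```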
Idle cost is the money you're paying for resources that no one is using. It's the gap between what you're paying for and what workloads are actually consuming.
If you provision a 32GB RAM node but pods only request 16GB, you're paying for 16GB of idle memory. At $0.01/GB/hour, that's roughly $115 wasted over a 30-day month (720 hours).
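The arithmetic checks out:

```shell
# 16 GB idle x $0.01/GB/hour x 720 hours (30 days)
awk 'BEGIN { printf "$%.2f\n", 16 * 0.01 * 720 }'
# prints $115.20
```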
Use shareIdle=true to distribute idle costs across namespaces proportionally:
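For example, against the port-forwarded API:

```shell
# 30-day namespace breakdown with idle cost shared proportionally
curl -s 'http://localhost:9003/allocation?window=30d&aggregate=namespace&shareIdle=true'
```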
With shareIdle=true, each namespace's cost includes its proportional share of cluster-wide idle costs, giving an accurate picture of true cost.
OpenCost can only report costs by dimensions it knows about. Labeling is essential for visibility.
Add these labels to all workloads:
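One way to apply labels to an existing workload is a merge patch on the pod template. Note that OpenCost allocates by pod labels, so labeling only the Deployment object is not enough; the deployment name and label values below are placeholders:

```shell
# Patch the pod template so new pods carry cost-attribution labels
# (api-server, product-team, cc-1234 are example values)
kubectl patch deployment api-server -n production --type merge -p \
  '{"spec":{"template":{"metadata":{"labels":{"team":"product-team","env":"production","cost-center":"cc-1234"}}}}}'
```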
OpenCost provides visibility. What you do with it follows the FinOps progression:

1. Showback: report costs to teams without charging them, to build trust.
2. Allocation: map costs to business entities and connect them to budgets.
3. Chargeback: formally bill internal teams for their usage.
Test your ability to formulate queries and design labeling strategies.
Prompt 1 (Cost Query Design):
Prompt 2 (Label Strategy):
Prompt 3 (Cost Report Generation):
Cost data can reveal business information (which products get investment, team sizes, scale). Restrict access to cost dashboards and APIs. In multi-tenant clusters, ensure teams only see their own cost data.