Your Kubernetes cluster runs three namespaces: production (customer-facing APIs), staging (testing), and data-science (ML training jobs). At month-end, your cloud bill shows $15,000. Your CFO asks: "Which team is responsible for what portion of that cost?"
Without cost visibility, you're guessing. Maybe production is expensive because it runs 24/7. Maybe data-science is expensive because GPU nodes cost $3/hour. Maybe staging is wasting money running unused replicas. You could estimate based on node count, but that ignores the reality: pods share nodes, some pods are memory-heavy while others are CPU-heavy, and resource requests rarely match actual usage.
OpenCost solves this. It watches your cluster, tracks resource consumption per pod, and attributes cost to each workload. When your CFO asks "where's the money going?", you query the API: "Show me cost breakdown by namespace for the last 30 days." Seconds later: production ($9,200), data-science ($4,800), staging ($1,000). Now you can have a real conversation about whether that data-science spend is generating value.
This lesson teaches you to install OpenCost, understand its architecture, and query the allocation API to answer cost visibility questions.
Before diving into implementation, understand the landscape. Two tools, OpenCost and Kubecost, dominate Kubernetes cost visibility, and they share the same core allocation engine.
For this course: We use OpenCost because it's the CNCF standard and free to use. Everything you learn applies directly to Kubecost if your organization needs enterprise features.
OpenCost runs as a deployment in your cluster. It integrates with Prometheus to collect resource metrics and calculate costs.
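A typical installation uses the community Helm chart; the release and namespace names below are assumptions, not requirements:

```shell
# Add the OpenCost Helm repository and install into a dedicated
# namespace (release/namespace names are examples)
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace
```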
How it works: OpenCost queries Prometheus for each container's resource requests and actual usage, charges each pod for the greater of the two, multiplies by per-resource prices (pulled from your cloud provider or a custom on-prem pricing configuration), and exposes the results through its allocation API.
OpenCost requires Prometheus. If you followed Chapter 85, you already have kube-prometheus-stack installed.
Verify Prometheus is running:
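Assuming kube-prometheus-stack landed in the conventional monitoring namespace:

```shell
# Adjust the namespace if your Prometheus lives elsewhere
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus
```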
Check that OpenCost is running:
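Assuming OpenCost was installed into a namespace named opencost:

```shell
# The opencost pod should show STATUS Running
kubectl get pods -n opencost
```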
Port-forward to access the API:
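OpenCost's API listens on port 9003 by default (the optional UI on 9090); the service name here assumes a default install:

```shell
# Forward the API port to localhost; leave this running
kubectl port-forward -n opencost svc/opencost 9003:9003
```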
Test the API is responding:
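With the port-forward in place, a quick smoke test (on some releases the path is /allocation/compute rather than /allocation):

```shell
# Yesterday's cost allocation, aggregated by namespace
curl -s 'http://localhost:9003/allocation?window=1d&aggregate=namespace'
```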
The /allocation API is your primary interface for cost visibility. It answers: "How much did X cost during time window Y?"
If your pods carry a team: product-team label, aggregate by that label:
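A sketch of that query against the port-forwarded API; the team label key is illustrative:

```shell
# Aggregate the last 7 days of cost by the "team" pod label
curl -s 'http://localhost:9003/allocation?window=7d&aggregate=label:team'
```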
Idle cost is the money you're paying for resources that no one is using. It's the gap between what you're paying for and what workloads are actually consuming.
If you provision a 32GB RAM node but pods only request 16GB, you're paying for 16GB of idle memory. At $0.01/GB/hour, that's roughly $115 wasted over a 30-day month (720 hours).
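The arithmetic checks out:

```shell
# 16 GB idle x $0.01/GB/hour x 720 hours (30 days)
awk 'BEGIN { printf "$%.2f\n", 16 * 0.01 * 720 }'
# prints $115.20
```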
Use shareIdle=true to distribute idle costs across namespaces proportionally:
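For example, against the port-forwarded API:

```shell
# 30-day namespace breakdown with idle cost shared proportionally
curl -s 'http://localhost:9003/allocation?window=30d&aggregate=namespace&shareIdle=true'
```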
With shareIdle=true, each namespace's cost includes its proportional share of cluster-wide idle costs, giving an accurate picture of true cost.
OpenCost can only report costs by dimensions it knows about. Labeling is essential for visibility.
Add these labels to all workloads:
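One way to apply labels to an existing workload is a merge patch on the pod template. Note that OpenCost allocates by pod labels, so labeling only the Deployment object is not enough; the deployment name and label values below are placeholders:

```shell
# Patch the pod template so new pods carry cost-attribution labels
# (api-server, product-team, cc-1234 are example values)
kubectl patch deployment api-server -n production --type merge -p \
  '{"spec":{"template":{"metadata":{"labels":{"team":"product-team","env":"production","cost-center":"cc-1234"}}}}}'
```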
OpenCost provides visibility. What you do with it follows the FinOps progression:

1. Showback: report costs to teams without charging them, to build trust.
2. Allocation: map costs to business entities and connect them to budgets.
3. Chargeback: formally bill internal teams for their usage.
Test your ability to formulate queries and design labeling strategies.
Prompt 1 (Cost Query Design):
Prompt 2 (Label Strategy):
Prompt 3 (Cost Report Generation):
Cost data can reveal business information (which products get investment, team sizes, scale). Restrict access to cost dashboards and APIs. In multi-tenant clusters, ensure teams only see their own cost data.