Your Task API is running in production. Users are creating tasks, completing them, and occasionally hitting errors. But when the CEO asks "How's the system performing?" you have no answer. Your logs show individual requests, but you cannot answer basic questions: How many requests per second? What's the average response time? What percentage of requests fail?
This is the gap metrics fill. Where logs tell you what happened to individual requests, metrics tell you how your system behaves over time. Prometheus has become the standard for Kubernetes metrics because it was designed for exactly this environment: dynamic, containerized, ephemeral workloads.
In this lesson, you will deploy Prometheus to your cluster, learn to query it using PromQL, and instrument your Task API to expose custom metrics. By the end, you will have answers to those performance questions.
Before deploying anything, you need to understand how Prometheus works. Unlike traditional monitoring where applications push data to a central server, Prometheus pulls metrics from applications at regular intervals.
Key components:

- **Prometheus server**: scrapes targets on a schedule and stores the samples as time series.
- **Exporters and instrumented applications**: expose current metric values at an HTTP endpoint (conventionally `/metrics`).
- **ServiceMonitors**: Kubernetes custom resources that tell Prometheus which Services to scrape.
- **Alertmanager**: routes, groups, and deduplicates alerts fired by Prometheus rules.
- **PromQL**: the query language for aggregating and analyzing stored time series.
The pull model means your applications do not need to know where Prometheus lives. They expose metrics; Prometheus discovers and scrapes them through ServiceMonitors.
The kube-prometheus-stack Helm chart bundles Prometheus, Grafana, Alertmanager, and pre-configured dashboards for Kubernetes monitoring. This is the standard way to deploy Prometheus in production Kubernetes environments.
Add the Helm repository:
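A minimal sketch of the repository setup, run from your workstation:

```shell
# Register the community chart repository and refresh the local index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```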
Create the monitoring namespace and install:
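A sketch of the install, assuming `prometheus` as the Helm release name (any name works, but resource names below change with it):

```shell
# Dedicated namespace for the monitoring stack
kubectl create namespace monitoring
# prints: namespace/monitoring created

# Install the chart; the --set values make Prometheus discover
# ServiceMonitors and PodMonitors regardless of Helm release labels
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
```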
The critical flags:

- `--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false`
- `--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false`

By default, the chart configures Prometheus to select only ServiceMonitors labeled with its own Helm release. Setting these values to `false` tells Prometheus to discover ServiceMonitors across the cluster.
Without these flags, your custom ServiceMonitors would be ignored.
Verify the installation:
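Check that the stack's pods reached the Running state (exact pod names depend on your release name):

```shell
# List everything the chart deployed into the monitoring namespace;
# expect the operator, Prometheus, Alertmanager, Grafana,
# kube-state-metrics, and one node-exporter per node
kubectl get pods -n monitoring
```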
Access Prometheus UI via port-forward:
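With the release name `prometheus`, the chart creates a Service named as below (adjust if you chose a different release name):

```shell
# Forward local port 9090 to the Prometheus Service inside the cluster
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
```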
Open http://localhost:9090 in your browser. You now have a working Prometheus instance scraping Kubernetes cluster metrics.
PromQL is how you extract meaning from metric data. Every query starts with a metric name and optionally filters by labels.
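For example, assuming a counter named `http_requests_total` with `method` and `status` labels:

```promql
# All time series for the metric
http_requests_total

# Only series matching these label values
http_requests_total{method="GET", status="200"}
```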
Raw counters always increase. To see the request rate, use rate() over a time window:
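A typical rate query, again assuming an `http_requests_total` counter:

```promql
# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# Total rate across all instances, broken down by endpoint
sum(rate(http_requests_total[5m])) by (endpoint)
```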
Histograms store request durations in buckets. Use histogram_quantile() to calculate percentiles:
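Assuming a histogram named `http_request_duration_seconds`, the 95th percentile latency looks like this:

```promql
# p95 request latency over the last 5 minutes
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)
```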
The le label ("less than or equal") is the histogram bucket boundary. The by (le) preserves these boundaries for percentile calculation.
Google's SRE book defines four golden signals every service should monitor: latency, traffic, errors, and saturation. Here's how to query each:
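Example queries for each signal, assuming the `http_requests_total` and `http_request_duration_seconds` metrics above (the saturation query uses cAdvisor's container metrics, which kube-prometheus-stack scrapes by default):

```promql
# Latency: p99 request duration
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Traffic: total requests per second
sum(rate(http_requests_total[5m]))

# Errors: fraction of requests returning 5xx
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# Saturation: CPU seconds consumed by your pods
sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m]))
```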
Prometheus can only scrape metrics your application exposes. The prometheus_client library makes this straightforward in Python.
Install the library:
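```shell
pip install prometheus-client
```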
Add metrics to your FastAPI application:
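A framework-agnostic sketch of the instrumentation using `prometheus_client`. The metric and function names here are illustrative; in FastAPI you would call `observe_request()` from an `@app.middleware("http")` handler and serve `metrics_endpoint()` at `GET /metrics`:

```python
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    Counter,
    Histogram,
    generate_latest,
)

# Counter: total requests, labeled by method, endpoint, and status code.
# Keep labels low-cardinality -- no user_id or request_id.
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
)

# Histogram: request duration in seconds, bucketed so percentiles
# can be computed later with histogram_quantile()
REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["method", "endpoint"],
)


def observe_request(method: str, endpoint: str, status: int, duration_s: float) -> None:
    """Record one handled request; call this from your framework's middleware."""
    REQUEST_COUNT.labels(method=method, endpoint=endpoint, status=str(status)).inc()
    REQUEST_DURATION.labels(method=method, endpoint=endpoint).observe(duration_s)


def metrics_endpoint() -> tuple[bytes, str]:
    """Return the Prometheus exposition payload; serve this at GET /metrics."""
    return generate_latest(), CONTENT_TYPE_LATEST
```

The key design choice is recording metrics in middleware rather than in each route handler, so every endpoint is covered automatically with the same label set.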
Test the metrics endpoint:
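Assuming the app listens locally on port 8000:

```shell
# Fetch the exposition-format text Prometheus will scrape
curl http://localhost:8000/metrics
```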
Key points:

- The /metrics endpoint returns plain text in the Prometheus exposition format; Prometheus scrapes it on a schedule.
- Counters only go up; query them with rate() rather than reading raw values.
- Histograms record observations into buckets, which is what makes histogram_quantile() percentile queries possible.
- Keep label values low-cardinality: method, endpoint, and status code are fine; user IDs are not.
With your Task API exposing metrics, Prometheus needs to be told where to find them. ServiceMonitors are Kubernetes CRDs that configure this discovery.
First, ensure your Task API has a Service:
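A sketch of the Service, assuming the app runs in the `default` namespace with an `app: task-api` label and listens on port 8000:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: task-api
  labels:
    app: task-api       # the ServiceMonitor selects on this label
spec:
  selector:
    app: task-api
  ports:
    - name: http        # the ServiceMonitor references the port by name
      port: 8000
      targetPort: 8000
```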
Create the ServiceMonitor:
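A sketch of the ServiceMonitor, placed in the monitoring namespace and pointed at the Service above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: task-api
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - default          # namespace where the task-api Service lives
  selector:
    matchLabels:
      app: task-api      # must match the Service's labels
  endpoints:
    - port: http         # the Service's named port
      path: /metrics
      interval: 30s
```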
Apply the ServiceMonitor:
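Assuming you saved the manifest as `servicemonitor.yaml`:

```shell
kubectl apply -f servicemonitor.yaml
# prints: servicemonitor.monitoring.coreos.com/task-api created
```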
Verify Prometheus discovered the target:
Navigate to http://localhost:9090/targets. You should see serviceMonitor/monitoring/task-api with status UP.
Complex PromQL queries can be slow to execute repeatedly. Recording rules pre-compute results and store them as new time series.
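A sketch of a PrometheusRule producing the `task_api:requests:rate5m` series; the `job="task-api"` matcher is an assumption based on the Service name, so check your actual `job` label on the targets page:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: task-api-recording-rules
  namespace: monitoring
spec:
  groups:
    - name: task-api.rules
      interval: 30s
      rules:
        # Pre-computed request rate, evaluated every 30s and
        # stored as a regular time series
        - record: task_api:requests:rate5m
          expr: sum(rate(http_requests_total{job="task-api"}[5m])) by (endpoint)
```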
Apply the recording rules:
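Assuming the manifest is saved as `recording-rules.yaml`:

```shell
kubectl apply -f recording-rules.yaml
# prints: prometheusrule.monitoring.coreos.com/task-api-recording-rules created
```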
Now instead of computing complex aggregations on every dashboard refresh, you query the pre-computed task_api:requests:rate5m series directly. This becomes critical at scale when dashboards are loaded frequently.
Now that you understand Prometheus fundamentals, test your observability skill:
Ask your skill to generate PromQL for the 4 golden signals for your specific application:
Verify your skill produces queries similar to what you learned in this lesson. If the queries use different functions or patterns, compare them—your skill may suggest optimizations you haven't learned yet, or it may need correction based on your specific label names.
Ask AI to help you build a complex PromQL query:
What you're learning: PromQL query composition. AI can suggest functions like increase() vs rate() and explain when each is appropriate. You will likely need to refine the query based on your specific time windows and SLO targets.
Share your instrumentation plan with AI:
What you're learning: Metric design principles. AI will likely flag the user_id label as high cardinality (creating too many time series). You provide domain context about what matters; AI suggests patterns from observability best practices.
Simulate a problem scenario:
What you're learning: Kubernetes troubleshooting methodology. Work through the issue together - AI might suggest checking namespaceSelector configuration (a common mistake), while you verify the actual resources in your cluster.
When instrumenting production applications, start with low-cardinality labels (method, endpoint, status code). Adding labels like user_id or request_id creates a new time series for each unique value, which can exhaust Prometheus memory and cause outages. Always review metric cardinality before deploying to production.
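One way to audit cardinality before it becomes a problem is to ask Prometheus which metrics own the most time series:

```promql
# Top 10 metric names by number of active time series
topk(10, count by (__name__)({__name__=~".+"}))
```

If one of your own metrics appears near the top with an unexpectedly large count, a high-cardinality label is the usual culprit.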