Your Task API is running in production. Prometheus tells you the error rate spiked at 3am. Jaeger shows a slow request took 2.3 seconds. But what actually happened? What error message did the user see? What was the request payload that triggered the failure?
Metrics show you THAT something went wrong. Traces show you WHERE in the system it happened. Logs show you WHAT specifically occurred.
This is the needle-in-a-haystack problem of distributed systems. Without centralized logging, you're shelling into individual pods, grepping through files, and hoping the container hasn't restarted and taken the evidence with it. With hundreds of pods across multiple nodes, this approach becomes impossible.
Grafana Loki solves this by aggregating logs from all your containers into a single queryable store. But unlike Elasticsearch (which indexes every word of every log), Loki takes a radically different approach: it indexes only the labels (metadata), not the log content itself. This makes Loki orders of magnitude cheaper to operate while still enabling fast queries when you know what you're looking for.
By the end of this lesson, you'll query logs across your entire cluster with LogQL, implement structured logging that plays well with Loki's architecture, and correlate logs with traces for full-stack debugging.
Traditional log aggregation tools like Elasticsearch create full-text indexes of every log line. This enables powerful searches ("find all logs containing 'timeout'") but comes with significant costs:

- **Storage**: the inverted index can rival or exceed the size of the raw logs themselves.
- **Memory and CPU**: every ingested line must be tokenized and indexed, which demands substantial resources at high log volume.
- **Operational complexity**: managing shards, index lifecycles, and cluster health becomes a job in itself.
Loki's insight: in practice, you rarely need to search "all logs everywhere." You search logs from:

- a specific service or application,
- a specific namespace or environment,
- a specific pod or node,
- a specific time window.
By forcing you to narrow with labels first, then filter content, Loki achieves 10x lower storage costs while remaining fast for real debugging scenarios.
Components:

- **Loki**: the log aggregation server that stores compressed chunks and serves LogQL queries.
- **Promtail**: an agent that runs on every node, discovers containers, attaches Kubernetes labels, and ships logs to Loki.
- **Grafana**: the UI where you explore logs with LogQL and correlate them with metrics and traces.
Install Loki with Promtail using Helm:
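A minimal install might look like the following, assuming the `grafana/loki-stack` chart (chart names and default values change between releases, so check `helm show values` for your version):

```bash
# Add the Grafana chart repository and refresh the local index
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki plus the Promtail agent into a dedicated namespace
helm install loki grafana/loki-stack \
  --namespace loki --create-namespace \
  --set promtail.enabled=true
```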
Output:
Add Loki as a data source in your existing Grafana instance:
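One way to provision the data source declaratively is a ConfigMap that Grafana's sidecar provisioner picks up. This sketch assumes a kube-prometheus-stack-style sidecar watching for the `grafana_datasource` label; the namespace and Loki service URL are assumptions for your cluster:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: monitoring        # adjust to wherever Grafana runs
  labels:
    grafana_datasource: "1"    # picked up by Grafana's sidecar provisioner
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.loki.svc.cluster.local:3100
        isDefault: false
```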
Apply the configuration:
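Assuming the data source ConfigMap above was saved as `loki-datasource.yaml` (the filename is illustrative):

```bash
kubectl apply -f loki-datasource.yaml
```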
Output:
Check that Promtail is collecting logs:
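A couple of sanity checks, sketched with kubectl (the namespace and resource names depend on how you installed the chart):

```bash
# Promtail runs as a DaemonSet, so expect one pod per node
kubectl get pods -n loki

# Tail a Promtail pod's own logs to confirm it is discovering targets
kubectl logs -n loki daemonset/loki-promtail --tail=20
```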
Output:
LogQL is Loki's query language, designed to feel familiar if you know PromQL. Every query starts with a stream selector (the labels) and optionally adds filters and parsers.
Stream selectors filter logs by label. These are the "index" in Loki:
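A few illustrative selectors (the `app` and `namespace` label names are assumptions; your actual labels depend on your Promtail relabeling):

```logql
{app="task-api"}                          # exact match
{namespace="production", app="task-api"}  # multiple labels narrow the streams
{app=~"task-.*"}                          # regex match with =~ (also != and !~)
```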
After selecting streams, filter log content:
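Line filters scan the selected streams for matching content, for example (label names assumed as above):

```logql
{app="task-api"} |= "timeout"        # keep lines containing "timeout"
{app="task-api"} != "healthcheck"    # drop noisy lines
{app="task-api"} |~ "error|failed"   # keep lines matching a regex
```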
Parse structured logs to extract fields:
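With JSON or logfmt logs, a parser stage turns fields into labels you can filter on. The field names below (`status_code`, `duration`, `level`) are assumptions about your log schema:

```logql
{app="task-api"} | json | status_code >= 500   # extract fields, then filter
{app="task-api"} | json | duration > 500ms     # duration-typed values compare too
{app="task-api"} | logfmt | level="error"      # same idea for logfmt logs
```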
Count and aggregate log data:
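Metric queries wrap a log query in a range function, PromQL-style:

```logql
# Log throughput: lines per second over 5-minute windows
rate({app="task-api"}[5m])

# Error count per pod
sum by (pod) (count_over_time({app="task-api"} | json | level="error" [5m]))
```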
Here are queries you'll use daily:
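A few everyday patterns, sketched with assumed labels and an assumed JSON log schema:

```logql
# Everything that mentions an error in production
{namespace="production"} |= "error"

# 5xx responses from the Task API
{app="task-api"} | json | status_code >= 500

# Which service is erroring the most right now?
sum by (app) (rate({namespace="production"} | json | level="error" [5m]))
```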
Output (in Grafana Explore):
Output:
Output:
For Loki's label-based architecture to work well, your application should produce structured logs. JSON is the standard format.
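A minimal sketch using only Python's standard library (a production service would more likely use a library such as structlog or python-json-logger; the field names mirror the assumed schema from the LogQL examples):

```python
import json
import logging
import sys
import time

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, which Loki parses with `| json`."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed via `extra={"fields": {...}}`
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)   # containers log to stdout
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("task-api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("task created", extra={"fields": {"task_id": "42", "status_code": 201}})
```

One JSON object per line is the contract: Promtail ships the line verbatim, and LogQL's `| json` parser does the rest at query time.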
Output (when logging):
The key to debugging distributed systems is connecting logs to traces. When you see a slow span in Jaeger, you want to jump directly to the corresponding logs:
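One sketch of the idea: a logging filter that stamps every record with the active trace and span ids. The trace context here comes from a placeholder callable; in a real service you would read it from your tracing library (e.g. OpenTelemetry's current span context):

```python
import logging
import sys

class TraceContextFilter(logging.Filter):
    """Attach the current trace/span ids to every log record."""
    def __init__(self, get_trace_context):
        super().__init__()
        self.get_trace_context = get_trace_context

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = self.get_trace_context()
        record.trace_id = ctx.get("trace_id", "")
        record.span_id = ctx.get("span_id", "")
        return True

# Stand-in for a tracer lookup (assumption: your tracer exposes the current ids)
current_context = {"trace_id": "4bf92f3577b34da6", "span_id": "00f067aa0ba902b7"}

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter(
    '{"level": "%(levelname)s", "message": "%(message)s", '
    '"trace_id": "%(trace_id)s", "span_id": "%(span_id)s"}'
))
logger = logging.getLogger("task-api.traced")
logger.addFilter(TraceContextFilter(lambda: current_context))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("slow query detected")
```

With `trace_id` present in every JSON log line, jumping from a Jaeger span to its logs is a single LogQL filter on that field.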
Output (JSON logs with trace correlation):
Now in Grafana, you can:

- filter logs on the `trace_id` field to see every line a single request produced,
- configure a derived field on the Loki data source so the `trace_id` in a log line becomes a clickable link to the trace in Jaeger,
- split Explore to view a trace and its logs side by side.
Loki stores logs in chunks, compressed with gzip. Configure retention to balance storage costs with debugging needs:
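A sketch of the relevant configuration keys (names follow Loki 2.x; verify against the docs for your version, and note that retention is enforced by the compactor):

```yaml
limits_config:
  retention_period: 720h          # keep 30 days of logs

compactor:
  working_directory: /loki/compactor
  retention_enabled: true         # the compactor applies retention_period
  retention_delete_delay: 2h      # grace period before chunks are deleted
```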
Promtail automatically discovers pods and adds Kubernetes labels. Customize for your needs:
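For example, a relabeling block that promotes common Kubernetes metadata into Loki labels (a sketch; Promtail's default scrape config already does much of this):

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
```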
Enable logging for your pods with an annotation:
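Promtail collects all pod logs by default, so annotation-based opt-in is a convention you build with relabeling rather than a built-in feature. The annotation name below is hypothetical:

```yaml
# On the pod: a hypothetical opt-in annotation
metadata:
  annotations:
    promtail.io/scrape: "true"
---
# In Promtail's relabel_configs: keep only annotated pods
- source_labels: [__meta_kubernetes_pod_annotation_promtail_io_scrape]
  action: keep
  regex: "true"
```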
Your observability-cost-engineer skill now needs logging capabilities. Consider:
Test your skill: Ask it to write a LogQL query for a specific debugging scenario. Does it generate correct syntax? Does it know when to use parsers vs line filters?
Identify gaps:
Improve your skill: Add these patterns:
Use your AI companion to deepen your logging expertise.
What you're learning: Translating a debugging scenario into LogQL syntax. You're practicing the stream selector → filter → parser pattern with real constraints.
What you're learning: How to teach AI your constraints so it produces a logging schema that matches your specific needs rather than a generic template.
What you're learning: Cost-aware observability design. AI helps you understand the math behind log volume and teaches you optimization techniques you can apply to future systems.
When querying logs, be mindful of sensitive data. Logs may contain PII, tokens, or secrets that were accidentally logged. Use LogQL filters to avoid displaying sensitive content, and configure Promtail pipelines to redact sensitive patterns before ingestion.