You've built your observability stack. Prometheus collects metrics. Jaeger visualizes traces. Loki aggregates logs. Your Task API endpoints are instrumented, and you can answer questions like "What's our P95 latency?" and "Why did that request fail?"
But something is invisible. Every request to your Dapr-enabled services goes through a sidecar. That sidecar calls Redis for state, Kafka for pub/sub, and other services for invocations. When a request is slow, is it your application code or the Dapr sidecar? When an actor method fails, did the method throw an error or did the state store time out? When a workflow step takes too long, which activity is the bottleneck?
Without Dapr observability integration, you see your application and you see your infrastructure, but the bridge between them is a black box. You're debugging half the story.
This lesson integrates Dapr's native observability into your existing stack. You'll configure sidecars to export metrics to Prometheus and traces to Jaeger. You'll learn the Dapr-specific metrics that reveal actor and workflow behavior. And you'll connect the dots between your application traces and Dapr's internal operations.
When you deployed Dapr, you gained powerful abstractions: state management, pub/sub, service invocation, actors, workflows. But every abstraction hides complexity, and hidden complexity is hard to debug.
Consider this trace from your Task API:
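A sketch of what that trace might look like (span names and timings are illustrative, not from a real system):

```
POST /api/tasks                        48ms
└── create_task                        45ms   ← one opaque span; Dapr's work is invisible
```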
What happened inside that 45ms? Did your application spend 40ms and Dapr 5ms? Or did your application spend 5ms and Dapr 40ms waiting for Redis? Without Dapr observability, you can't answer this.
With Dapr observability integrated:
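The same request, now with Dapr's sidecar spans in the trace (again illustrative):

```
POST /api/tasks                                      48ms
└── create_task                                      45ms
    └── /v1.0/state/statestore (Dapr sidecar)        41ms
        └── redis SET                                38ms   ← the actual bottleneck
```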
Now you know: the bottleneck is Redis, not your code. You can optimize in the right place.
Dapr sidecars expose Prometheus metrics on port 9090 by default. But you need to configure this explicitly and tell Prometheus where to scrape them.
The Configuration CRD controls observability for all sidecars that reference it:
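A minimal sketch of such a Configuration. The name dapr-observability matches the annotation used later; the Jaeger endpoint address is an assumption — adjust it to your namespace, and note that the Jaeger collector only accepts Zipkin-format spans on 9411 if that receiver is enabled:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: dapr-observability
  namespace: default
spec:
  metric:
    enabled: true          # expose Prometheus metrics on the sidecar (port 9090)
  tracing:
    samplingRate: "1"      # "1" = trace every request; lower this in production
    zipkin:
      endpointAddress: "http://jaeger-collector.observability.svc.cluster.local:9411/api/v2/spans"
```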
Apply it:
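Assuming you saved the manifest as dapr-observability.yaml:

```shell
kubectl apply -f dapr-observability.yaml
```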
Output:
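If the apply succeeds, you'll see something like:

```
configuration.dapr.io/dapr-observability created
```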
Each field serves a specific purpose: metric.enabled turns on the sidecar's Prometheus endpoint, tracing.samplingRate sets the fraction of requests that generate spans ("1" means all of them), and the tracing endpoint address tells the sidecar where to ship completed spans.
Your applications must reference this Configuration via annotation:
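A sketch of the pod-template annotations on your Deployment (the app id and port are assumptions; keep whatever your service already uses):

```yaml
  template:
    metadata:
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "task-api"            # assumed app id
        dapr.io/app-port: "8000"              # assumed app port
        dapr.io/config: "dapr-observability"  # references the Configuration by name
```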
The critical annotation is dapr.io/config: "dapr-observability". Without it, the sidecar won't export metrics or traces.
Dapr sidecars don't have their own Service objects — they run inside pods alongside your application. A ServiceMonitor won't find them. Use a PodMonitor to scrape pods directly:
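A PodMonitor sketch. The release label must match your Prometheus Operator's selector, and the pod selector should match whatever labels your Dapr-enabled pods carry — both are assumptions here. The injector names the sidecar's metrics port dapr-metrics:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: dapr-sidecars
  namespace: observability
  labels:
    release: prometheus          # match your Prometheus Operator's selector
spec:
  namespaceSelector:
    any: true                    # scrape Dapr-enabled pods in all namespaces
  selector:
    matchLabels:
      app: task-api              # or any label selecting your Dapr-enabled pods
  podMetricsEndpoints:
    - port: dapr-metrics         # the sidecar's named metrics port (9090)
      interval: 15s
```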
Apply and verify:
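Assuming the manifest is saved as dapr-podmonitor.yaml:

```shell
kubectl apply -f dapr-podmonitor.yaml
kubectl get podmonitor -n observability
```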
Output:
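Something along these lines:

```
NAME            AGE
dapr-sidecars   10s
```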
Check Prometheus targets:
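One way to check, assuming a prometheus-operated Service in the observability namespace (the Service name varies by install):

```shell
kubectl port-forward -n observability svc/prometheus-operated 9090:9090 &
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.labels.job | contains("dapr")) | {pod: .labels.pod, health: .health}'
```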
Output:
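Illustrative output — each Dapr-enabled pod should report healthy:

```
{"pod": "task-api-7d9f8b6c4-xk2lp", "health": "up"}
{"pod": "chat-agent-5b8d7f9c6-mq4rt", "health": "up"}
```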
The Configuration we created sends traces directly to Jaeger. But in production, you often want traces to flow through an OpenTelemetry Collector for processing, filtering, and routing.
Update your Configuration to point to the collector:
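A sketch of the tracing section using Dapr's native OTLP support; the collector Service name is an assumption:

```yaml
spec:
  tracing:
    samplingRate: "1"
    otel:
      endpointAddress: "otel-collector.observability.svc.cluster.local:4317"  # assumed collector Service
      isSecure: false
      protocol: grpc
```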
The collector then routes to Jaeger (or any backend). This lets you change backends without touching Dapr configuration.
Dapr Actors have their own metrics that reveal activation patterns, method durations, and pending call queues.
Request rate by actor type and method:
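A PromQL sketch. The metric and label names in this and the following actor queries are assumptions modeled on Dapr's dapr_runtime_actor_* family and vary across Dapr versions — list what your sidecars actually export with a query like {__name__=~"dapr_runtime_actor.*"}:

```promql
sum by (app_id, actor_type, method) (
  rate(dapr_runtime_actor_method_request_total[5m])
)
```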
Output:
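Illustrative output (requests per second):

```
{app_id="task-api", actor_type="ChatAgent", method="ProcessMessage"}  12.4
{app_id="task-api", actor_type="ChatAgent", method="GetHistory"}       3.1
```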
95th percentile method duration:
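A sketch using a hypothetical duration histogram (again, verify the exact metric name in your Prometheus):

```promql
histogram_quantile(0.95,
  sum by (le, actor_type, method) (
    rate(dapr_runtime_actor_method_duration_seconds_bucket[5m])
  )
)
```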
Output:
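Illustrative output (seconds):

```
{actor_type="ChatAgent", method="ProcessMessage"}  0.045
{actor_type="ChatAgent", method="GetHistory"}      0.012
```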
ChatAgent.ProcessMessage is at 45ms P95; GetHistory is 12ms. If ProcessMessage suddenly jumps to 500ms, you know where to investigate.
Pending calls (turn-based concurrency backlog):
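Dapr exposes a pending-calls gauge for actors; the label filter here is an assumption:

```promql
dapr_runtime_actor_pending_actor_calls{actor_type="ChatAgent"}
```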
Output:
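Illustrative output:

```
dapr_runtime_actor_pending_actor_calls{actor_type="ChatAgent"}  3
```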
Three calls are waiting. If this number grows continuously, the actor can't keep up with demand.
In Jaeger, search for traces from your Dapr-enabled service. Actor method calls appear as spans:
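A sketch of such a trace (span names and timings are illustrative):

```
PUT /v1.0/actors/ChatAgent/agent-42/method/ProcessMessage   45ms
└── actor: ChatAgent.ProcessMessage                         43ms
    ├── get_state  (Redis)                                  18ms
    └── save_state (Redis)                                  20ms
```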
The trace shows the full flow: HTTP request to actor invocation to state operations. You can see that state operations account for most of the time.
Dapr Workflows orchestrate multi-step processes. Observability reveals which steps are slow, which fail, and how long workflows take end-to-end.
Workflow execution rate by workflow type:
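Dapr ships workflow metrics in recent releases under the dapr_runtime_workflow_* family; exact names and labels vary by version, so treat these queries as sketches to verify against your install:

```promql
sum by (workflow_name) (
  rate(dapr_runtime_workflow_execution_count[5m])
)
```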
Output:
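Illustrative output (executions per second; the second workflow name is hypothetical):

```
{workflow_name="OrderProcessingWorkflow"}  0.8
{workflow_name="TaskReminderWorkflow"}     0.2
```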
Activity step duration (identify slow steps):
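A sketch against an assumed activity-latency histogram:

```promql
histogram_quantile(0.95,
  sum by (le, activity_name) (
    rate(dapr_runtime_workflow_activity_execution_latency_bucket[5m])
  )
)
```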
Output:
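Illustrative output (seconds; UpdateTaskState is a hypothetical activity name):

```
{activity_name="CallExternalAPI"}  1.2
{activity_name="UpdateTaskState"}  0.03
```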
CallExternalAPI takes 1.2 seconds at P95. That's your bottleneck.
Workflow failure rate:
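A sketch assuming the execution counter carries a status label — failed executions divided by all executions:

```promql
  sum by (workflow_name) (rate(dapr_runtime_workflow_execution_count{status="failed"}[1h]))
/
  sum by (workflow_name) (rate(dapr_runtime_workflow_execution_count[1h]))
```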
Output:
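Illustrative output:

```
{workflow_name="OrderProcessingWorkflow"}  0.02
```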
OrderProcessingWorkflow has a 2% failure rate. Drill into traces to find the failing step.
Workflow traces show the full orchestration:
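A sketch of such a trace (activity names other than ProcessPayment, and all timings, are illustrative):

```
workflow: OrderProcessingWorkflow          2.1s
├── activity: ValidateOrder                0.1s
├── activity: ProcessPayment               1.6s   ← dominates the workflow
└── activity: SendConfirmation             0.3s
```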
The trace reveals that ProcessPayment dominates workflow duration. Optimize there first.
Your application might already emit its own traces using OpenTelemetry. How do you connect them with Dapr's traces?
Dapr automatically propagates trace context (W3C Trace Context headers) through sidecars. When your app makes an HTTP call to localhost:3500, Dapr extracts the trace context and includes it in downstream operations.
For full correlation, instrument your FastAPI app with OpenTelemetry and export to the same Jaeger instance:
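A minimal sketch using the OpenTelemetry Python SDK. It assumes the opentelemetry-sdk, opentelemetry-exporter-otlp, and opentelemetry-instrumentation-fastapi packages are installed, and the OTLP endpoint (Jaeger accepts OTLP natively) is an assumption:

```python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Tag all spans with the service name so Jaeger groups them with Dapr's spans.
provider = TracerProvider(resource=Resource.create({"service.name": "task-api"}))
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="jaeger-collector.observability:4317",  # assumed endpoint
            insecure=True,
        )
    )
)
trace.set_tracer_provider(provider)

app = FastAPI()
# Creates one span per request and honors the incoming W3C traceparent header,
# so the app's spans join the trace Dapr is already propagating.
FastAPIInstrumentor.instrument_app(app)
```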
Now your app's spans and Dapr's spans share the same trace ID. In Jaeger, you see the complete picture:
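A sketch of the merged trace (timings are illustrative):

```
POST /api/tasks                                  49ms
├── task-api: validate_input                      2ms
├── dapr: /v1.0/state/statestore (save)          40ms
├── dapr: /v1.0/publish/task-events               5ms
├── task-api: build_response                      1ms
└── task-api: serialize                           1ms
```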
Your code (2ms + 1ms + 1ms = 4ms) versus Dapr (40ms + 5ms = 45ms). Crystal clear.
The Dapr control plane components (dapr-operator, dapr-placement, dapr-sentry) also expose metrics. Monitor them to ensure platform health:
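A PodMonitor sketch for the control plane. The app label values follow the Dapr Helm chart's conventions and the metrics port name is an assumption — verify both against your install (metrics are served on 9090):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: dapr-control-plane
  namespace: dapr-system
  labels:
    release: prometheus          # match your Prometheus Operator's selector
spec:
  selector:
    matchExpressions:
      - key: app
        operator: In
        values: [dapr-operator, dapr-placement-server, dapr-sentry]
  podMetricsEndpoints:
    - port: metrics              # verify the port name in your install
```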
Key system metrics:
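The names below are illustrative of the dapr_sentry_* / dapr_injector_* / dapr_placement_* families — confirm the exact names in your Prometheus before alerting on them:

```
dapr_sentry_cert_sign_request_received_total    # certificate-signing load on Sentry
dapr_injector_sidecar_injection_requests_total  # sidecar injection activity
dapr_placement_actor_heartbeat_timestamp        # last heartbeat per actor host
```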
Your observability-cost-engineer skill should now include Dapr integration patterns. Test it:
Does your skill produce:
Ask yourself:
If gaps exist:
What you're learning: The complete flow from Dapr configuration to Prometheus/Jaeger integration. The AI helps you understand why sidecars require PodMonitor (no dedicated Service) rather than ServiceMonitor.
What you're learning: Using Dapr-specific metrics and traces to diagnose actor performance. The AI guides you through metrics-then-traces workflow for root cause analysis.
What you're learning: Workflow-specific observability patterns. The AI helps you translate workflow concepts (steps, activities, execution) into PromQL queries and tracing strategies.
Safety note: Dapr observability adds overhead. With samplingRate: "1" (100% tracing), every request generates trace data. In high-throughput production: reduce sampling to 10% or 1%, set resource limits on sidecars via annotations (dapr.io/sidecar-cpu-limit, dapr.io/sidecar-memory-limit), and monitor the observability pipeline itself.