Your Task API now publishes events when tasks are created. The notification service sends emails. The reminder service schedules follow-ups. The audit service logs everything. Each service works independently, consuming events and doing its job.
Then a user creates a high-priority task that requires team assignment. The workflow becomes: create task, assign to user, send notification, schedule reminder. Each step depends on the previous one. What happens if the notification fails after the user is already assigned? You can't leave the system half-done, because the user expects an assignment notification. But you also can't use a distributed transaction across independent services.
This is where the saga pattern saves you. Instead of trying to make multiple services act as one atomic transaction, you embrace eventual consistency and design explicit compensation steps that undo work when things go wrong. In this chapter, you'll implement a choreography-based saga where services coordinate through events, each knowing how to reverse its own actions.
Traditional databases give you ACID transactions: either all changes commit, or none do. When you try to extend this across multiple services, you hit fundamental problems.
The two-phase commit trap:
The coordinator must lock resources across all services, wait for all to be ready, then commit. If any service is slow or fails, everything blocks. In microservices with network partitions and varying latencies, this approach creates cascading failures.
What we actually need:
The saga pattern trades atomicity for availability. You accept that the system may be temporarily inconsistent, but you guarantee it will converge to a consistent state: either every step completes, or every completed step is compensated.
A saga is a sequence of local transactions where each step publishes an event that triggers the next. If a step fails, the saga executes compensation events in reverse order to undo completed work.
Key principle: Every forward action must have a corresponding compensation that can undo it. Not all actions are reversible the same way—you can't "unsend" an email—but you can send a correction or update status to reflect the failure.
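The reverse-order rule can be illustrated with a minimal single-process sketch. The `run_saga` helper and the step names are hypothetical, and in a choreographed saga this logic is spread across services rather than living in one function; the sketch only shows the ordering of forward actions and compensations.

```python
from typing import Callable, List, Tuple

# (name, forward action, compensation) -- all names here are illustrative
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo only what already succeeded, newest first.
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return log
        completed.append((name, action, compensate))
        log.append(f"done:{name}")
    return log


def noop() -> None:
    pass


def fail() -> None:
    raise RuntimeError("notification provider unavailable")


saga_log = run_saga([
    ("create_task", noop, noop),
    ("assign_user", noop, noop),
    ("send_notification", fail, noop),
    ("schedule_reminder", noop, noop),
])
print(saga_log)
```

The reminder step never runs, and the two completed steps are compensated in the opposite order from which they ran.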
There are two ways to coordinate a saga:
Choreography (decentralized): Each service knows only which events it consumes and which it publishes, and reacts to events as they arrive. No central coordinator.
Orchestration (centralized): A saga coordinator tells each service what to do and tracks progress.
This chapter focuses on choreography because it's more aligned with Kafka's event-driven model and keeps services truly independent. Orchestration has its place for complex workflows with many conditional branches.
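To make the choreography idea concrete, here is a small sketch that uses an in-memory bus as a stand-in for Kafka topics. The `InMemoryBus` class and the topic names (`task.created`, `user.assigned`, `notification.sent`) are assumptions made for illustration, not a real Kafka client API.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBus:
    """Hypothetical stand-in for Kafka: maps a topic name to its handlers."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.history.append(topic)
        for handler in list(self.handlers[topic]):
            handler(event)


bus = InMemoryBus()


def user_service(event: dict) -> None:
    # Reacts to task.created and publishes its own result; knows nothing else.
    bus.publish("user.assigned", {**event, "assignee": "alice"})


def notification_service(event: dict) -> None:
    # Reacts to user.assigned; unaware of who produced it.
    bus.publish("notification.sent", event)


bus.subscribe("task.created", user_service)
bus.subscribe("user.assigned", notification_service)

bus.publish("task.created", {"saga_id": "s1", "task_id": "t1"})
print(bus.history)
```

No component holds the whole workflow: the sequence emerges from each service's subscriptions.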
Let's implement the task assignment saga with four services. Each service:
First, establish the events that drive the saga:
Output:
The event classes ensure every saga event carries:
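As a hedged sketch of what such an event class might look like, the following assumes each event carries a saga ID for correlation, an event type, a payload, a unique event ID, and a timestamp. The field names and the `follow_up` helper are hypothetical and may differ from the chapter's own classes.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SagaEvent:
    saga_id: str      # correlates every event belonging to one saga
    event_type: str   # e.g. "task.created" (illustrative name)
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize for use as a message value."""
        return json.dumps(asdict(self))

    def follow_up(self, event_type: str, **extra) -> "SagaEvent":
        """Build the next event in the same saga, carrying the payload forward."""
        return SagaEvent(self.saga_id, event_type, {**self.payload, **extra})


created = SagaEvent("saga-1", "task.created", {"task_id": "t-1"})
assigned = created.follow_up("user.assigned", assignee="alice")
print(assigned.to_json())
```

Carrying the same `saga_id` through every follow-up event is what lets downstream services and monitoring tools tie a compensation back to the action it undoes.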
The task service starts the saga by creating a task and publishing the initiating event:
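A minimal sketch of this step, assuming an in-memory task store and a `publish(topic, event)` callable in place of a real Kafka producer. The function names, status values, and the `user.assignment.failed` handler are illustrative assumptions.

```python
import uuid
from typing import Callable, Dict, List, Tuple

tasks: Dict[str, dict] = {}  # hypothetical in-memory task store


def create_task(title: str, priority: str,
                publish: Callable[[str, dict], None]) -> str:
    """Local transaction: store the task, then publish the initiating event."""
    task_id = str(uuid.uuid4())
    saga_id = str(uuid.uuid4())
    tasks[task_id] = {"title": title, "priority": priority,
                      "status": "pending_assignment"}
    publish("task.created",
            {"saga_id": saga_id, "task_id": task_id, "priority": priority})
    return task_id


def on_assignment_failed(event: dict) -> None:
    """Compensation for task creation: mark the task cancelled, don't delete it."""
    tasks[event["task_id"]]["status"] = "cancelled"


published: List[Tuple[str, dict]] = []
task_id = create_task("Prepare release notes", "high",
                      lambda topic, event: published.append((topic, event)))
print(published[0][0], tasks[task_id]["status"])
```

Marking the task cancelled rather than deleting it keeps a record that the saga ran and was compensated.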
Output:
The user service assigns users when tasks are created, or publishes a failure event if assignment fails:
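One way to sketch this handler is as a function that takes the incoming event and returns the event to publish next. The capacity rule, the in-memory stores, and the topic names below are assumptions for illustration only.

```python
from typing import Dict, Tuple

Event = Tuple[str, dict]  # (topic, payload)

open_tasks: Dict[str, int] = {"alice": 0, "bob": 3}  # user -> open task count
MAX_OPEN_TASKS = 3                                    # hypothetical capacity rule
assignments: Dict[str, str] = {}                      # task_id -> user


def on_task_created(event: dict) -> Event:
    """Forward action: assign the least-loaded user, or report failure."""
    candidates = [u for u, n in open_tasks.items() if n < MAX_OPEN_TASKS]
    if not candidates:
        return ("user.assignment.failed",
                {**event, "reason": "no user has capacity"})
    user = min(candidates, key=lambda u: open_tasks[u])
    open_tasks[user] += 1
    assignments[event["task_id"]] = user
    return ("user.assigned", {**event, "assignee": user})


def on_notification_failed(event: dict) -> Event:
    """Compensation: release the assignment. Safe to call more than once."""
    user = assignments.pop(event["task_id"], None)
    if user is not None:
        open_tasks[user] -= 1
    return ("user.unassigned", event)


topic, payload = on_task_created({"saga_id": "s1", "task_id": "t1"})
print(topic, payload["assignee"])
```

Returning the outgoing event instead of publishing inside the handler keeps the business logic testable without a broker.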
Output (success path):
Output (compensation path):
The notification service sends notifications and, when sending fails, publishes a failure event that triggers compensation in the upstream services:
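A hedged sketch of such a handler follows. The `send_email` callable is injected so the example runs without a mail server; the sender functions and topic names are hypothetical.

```python
from typing import Callable, Tuple

Event = Tuple[str, dict]  # (topic, payload)


def on_user_assigned(event: dict,
                     send_email: Callable[[str, str], None]) -> Event:
    """Send the assignment email; on error, emit an event that starts compensation."""
    try:
        send_email(event["assignee"],
                   f"You were assigned task {event['task_id']}")
    except Exception as exc:
        return ("notification.failed", {**event, "reason": str(exc)})
    return ("notification.sent", event)


def working_sender(to: str, body: str) -> None:
    pass  # pretend the email went out


def broken_sender(to: str, body: str) -> None:
    raise ConnectionError("SMTP server unreachable")


event = {"saga_id": "s1", "task_id": "t1", "assignee": "alice"}
ok_topic, _ = on_user_assigned(event, working_sender)
fail_topic, fail_payload = on_user_assigned(event, broken_sender)
print(ok_topic, fail_topic, fail_payload["reason"])
```

The failure event keeps the original payload, so upstream services have the `task_id` and `assignee` they need to undo their own work.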
Output (success):
Output (failure - triggers compensation):
In choreography, each service sees only the events it subscribes to, so no single service knows how far a saga has progressed. To monitor saga progress, implement a saga state tracker:
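A tracker of this kind could be sketched as a consumer that subscribes to every saga topic and folds events into a per-saga status. The topic names and status values below are assumptions, not the chapter's actual definitions.

```python
from typing import Dict

# Hypothetical mapping from event topics to saga status transitions
FAILURES = {"user.assignment.failed", "notification.failed"}
TERMINAL = {"reminder.scheduled": "completed", "task.cancelled": "compensated"}


class SagaTracker:
    """Read-only observer: records events, never tells services what to do."""

    def __init__(self) -> None:
        self.sagas: Dict[str, dict] = {}

    def record(self, topic: str, event: dict) -> None:
        saga = self.sagas.setdefault(
            event["saga_id"], {"status": "in_progress", "events": []}
        )
        saga["events"].append(topic)
        if topic in FAILURES:
            saga["status"] = "compensating"
        elif topic in TERMINAL:
            saga["status"] = TERMINAL[topic]

    def status(self, saga_id: str) -> str:
        return self.sagas[saga_id]["status"]


tracker = SagaTracker()
for topic in ["task.created", "user.assigned", "notification.failed"]:
    tracker.record(topic, {"saga_id": "s1"})
mid_status = tracker.status("s1")
for topic in ["user.unassigned", "task.cancelled"]:
    tracker.record(topic, {"saga_id": "s1"})
print(mid_status, tracker.status("s1"), tracker.sagas["s1"]["events"])
```

Because the tracker only observes, it adds visibility without turning the design into orchestration.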
Output:
Output (failure scenario):
Not all actions have symmetric rollbacks. Here's how to design compensations for common scenarios:
Compensation design principles:
Idempotency: Compensations may run multiple times if failures occur during compensation. Design them to be safe to repeat.
Store what you need: If reverting requires previous state, store it before the forward action.
Accept semantic compensation: You can't truly undo "sent email"—but you can send a follow-up explaining the situation.
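The first two principles can be sketched together: snapshot the previous state before the forward action, and remember which sagas have already been compensated so a repeat call is a no-op. The in-memory stores and function names are hypothetical.

```python
from typing import Dict, Set

tasks: Dict[str, dict] = {"t1": {"assignee": None, "status": "open"}}
previous_state: Dict[str, dict] = {}  # saga_id -> snapshot taken before the change
compensated: Set[str] = set()         # saga ids whose compensation already ran


def assign(saga_id: str, task_id: str, user: str) -> None:
    """Forward action: store what you need, then apply the change."""
    previous_state[saga_id] = dict(tasks[task_id])
    tasks[task_id].update(assignee=user, status="assigned")


def compensate_assignment(saga_id: str, task_id: str) -> str:
    """Idempotent compensation: restores the snapshot at most once."""
    if saga_id in compensated:
        return "already compensated"
    snapshot = previous_state.pop(saga_id, None)
    if snapshot is not None:
        tasks[task_id] = snapshot
    compensated.add(saga_id)
    return "reverted"


assign("s1", "t1", "alice")
first = compensate_assignment("s1", "t1")
second = compensate_assignment("s1", "t1")
print(first, second, tasks["t1"])
```

In a real service both stores would need to be durable, since a compensation may arrive after a restart.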
Output (first call):
Output (second call - idempotent):
What happens if compensation itself fails? You need a fallback strategy:
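One common fallback, sketched here under assumptions, is to retry the compensation a bounded number of times and then park the event for manual review. The `dead_letters` list stands in for a dead-letter topic, and the helper names are hypothetical; a production version would also wait between attempts.

```python
from typing import Callable, Dict, List

dead_letters: List[dict] = []  # hypothetical stand-in for a dead-letter topic


def compensate_with_fallback(event: dict,
                             compensate: Callable[[dict], None],
                             max_attempts: int = 3) -> bool:
    """Retry a compensation; if every attempt fails, park it for a human."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            compensate(event)
            return True
        except Exception as exc:
            last_error = f"attempt {attempt}: {exc}"
    dead_letters.append(
        {**event, "error": last_error, "needs_manual_review": True}
    )
    return False


calls: Dict[str, int] = {"n": 0}


def flaky(event: dict) -> None:
    # Fails twice, then succeeds: a transient outage.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("user service timeout")


def always_fails(event: dict) -> None:
    raise TimeoutError("user service timeout")


recovered = compensate_with_fallback({"saga_id": "s1"}, flaky)
parked = compensate_with_fallback({"saga_id": "s2"}, always_fails)
print(recovered, parked, dead_letters)
```

Retrying is only safe because the compensation itself is idempotent; the dead-letter record keeps the saga ID so an operator can finish the rollback by hand.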
Output:
Here's the complete event flow for the task assignment saga:
You built a kafka-events skill in Chapter 1. Test and improve it based on what you learned.
Ask yourself:
If you found gaps:
Setup: You're designing a saga for an order processing workflow in an e-commerce system.
Prompt 1: Design compensation events
What you're learning: AI helps you think through asymmetric compensations—releasing inventory is straightforward, but refunding a payment has different timing and fee implications than never charging in the first place.
Prompt 2: Handle a tricky compensation scenario
What you're learning: AI collaborates on semantic compensation strategies—you can't reverse email, but you can adjust messaging, send follow-ups, or design the saga to make email the last step so failures earlier don't leave users confused.
Prompt 3: Apply to your domain
What you're learning: AI helps you map the saga pattern to your specific domain, identifying which steps have clean reversals and which need semantic compensation.
Safety note: When testing sagas, use isolated topic names (e.g., test.task.created) so compensation events don't interfere with production data. Always test the compensation path as thoroughly as the success path.