Your task processing system handles thousands of operations daily. Most complete successfully. But what happens when step 3 of a 5-step workflow fails? Do you leave steps 1 and 2 in an inconsistent state? What about a health monitoring job that needs to run forever, checking service status every 5 minutes? Can a workflow really run for months without running out of memory?
These are the problems that saga and monitor patterns solve. The saga pattern ensures transactional consistency across distributed operations without traditional database transactions. The monitor pattern creates eternal workflows that can run indefinitely without accumulating unbounded history. Together with human interaction patterns, they handle the complex, long-running scenarios that real agent systems encounter.
Traditional database transactions follow ACID properties: if any step fails, everything rolls back automatically. But in distributed systems, each step might touch a different service, each with its own database. There's no global transaction coordinator.
The saga pattern solves this by recording compensating actions as you go. For each step that succeeds, you remember how to undo it. If a later step fails, you execute those compensations in reverse order.
Consider an order processing workflow with three steps: reserve inventory, charge payment, and ship the order.
If shipping fails, you must undo in reverse: first cancel shipment (nothing to cancel, it failed), then refund payment, then release inventory. If you compensated in forward order, you'd release inventory before refunding payment, potentially allowing someone else to buy inventory before the refund completes.
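The reverse-order rule can be sketched in plain Python, independent of any workflow engine. The helper names (`run_saga`, `build_order_steps`) and the log strings are illustrative assumptions, not part of any library; real steps would call services instead of appending to a list.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run each action in order; on failure, undo completed steps newest first."""
    completed: List[Step] = []
    for step in steps:
        _, action, _ = step
        try:
            action()
        except Exception:
            # The failed step never completed, so only earlier steps are undone.
            for _, _, undo in reversed(completed):
                undo()
            return False
        completed.append(step)
    return True


def build_order_steps(log: List[str], fail_shipping: bool) -> List[Step]:
    """Hypothetical order workflow: inventory, then payment, then shipping."""
    def ship() -> None:
        if fail_shipping:
            raise RuntimeError("shipping failed")
        log.append("ship_order")

    return [
        ("inventory", lambda: log.append("reserve_inventory"),
         lambda: log.append("release_inventory")),
        ("payment", lambda: log.append("charge_payment"),
         lambda: log.append("refund_payment")),
        ("shipping", ship, lambda: log.append("cancel_shipment")),
    ]
```

When shipping fails, the log shows the payment refunded before the inventory is released, which is the ordering the paragraph above argues for.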
Here's a task processing saga that handles failures gracefully:
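A minimal sketch of such a saga, written in the generator style that Dapr's Python workflow SDK uses: the workflow yields each task it wants to run, and an activity failure surfaces as an exception at the `yield`. To keep the sketch runnable without a Dapr sidecar, `FakeContext` and `drive` are hypothetical stand-ins for the runtime, and the activity names are invented for illustration. Real Dapr activities also receive an activity context as their first argument.

```python
from typing import Any, Callable, Generator, List, Tuple

events: List[str] = []  # stands in for side effects on real services


def assign_task(task_id: str) -> None:
    events.append(f"assign:{task_id}")


def unassign_task(task_id: str) -> None:
    events.append(f"unassign:{task_id}")


def start_processing(task_id: str) -> None:
    events.append(f"start:{task_id}")


def cancel_processing(task_id: str) -> None:
    events.append(f"cancel:{task_id}")


def notify_owner(task_id: str) -> None:
    raise RuntimeError("notification service unavailable")  # simulated failure


class FakeContext:
    """Hypothetical stand-in for a workflow context: schedules an activity lazily."""

    def call_activity(self, activity: Callable, *, input: Any = None) -> Callable[[], Any]:
        return lambda: activity(input)


def task_saga(ctx: FakeContext, task_id: str) -> Generator:
    compensations: List[Tuple[Callable, str]] = []
    try:
        yield ctx.call_activity(assign_task, input=task_id)
        compensations.append((unassign_task, task_id))
        yield ctx.call_activity(start_processing, input=task_id)
        compensations.append((cancel_processing, task_id))
        yield ctx.call_activity(notify_owner, input=task_id)
        return "completed"
    except Exception as err:
        # Undo only the steps that succeeded, newest first.
        for undo, arg in reversed(compensations):
            yield ctx.call_activity(undo, input=arg)
        return f"rolled back: {err}"


def drive(workflow: Generator) -> Any:
    """Hypothetical runtime loop: run each yielded task, feed back its result or error."""
    try:
        pending = next(workflow)
        while True:
            try:
                result = pending()
            except Exception as err:
                pending = workflow.throw(err)
            else:
                pending = workflow.send(result)
    except StopIteration as done:
        return done.value
```

Registering each compensation only after its step succeeds means a failed step is never "undone", and the compensations themselves run as activities so the runtime can retry them.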
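Some workflows need to run forever: health monitors, SLA checkers, quota enforcers. Wrapping the workflow body in a while True: loop is an anti-pattern because every iteration appends events to the workflow history, so the history grows without bound for as long as the workflow lives.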
The continue_as_new method solves this. It restarts the workflow from the beginning with new state, discarding the accumulated history.
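The effect on history size can be illustrated with a small simulation. In Dapr's Python SDK, `continue_as_new` is a method on the workflow context and `create_timer` accepts a `timedelta`; the `FakeContext` and `run_generations` below only mimic that shape and are hypothetical, as are the `check_health` activity and the state fields.

```python
from datetime import timedelta
from typing import Any, Callable, Dict, Generator, List, Optional, Tuple


def check_health(service: str) -> str:
    """Illustrative activity; a real one would probe the service."""
    return "healthy"


class FakeContext:
    """Hypothetical stand-in for a workflow context that records its own history."""

    def __init__(self) -> None:
        self.history: List[str] = []
        self.next_input: Optional[Dict[str, Any]] = None

    def call_activity(self, activity: Callable, *, input: Any = None) -> Callable[[], Any]:
        self.history.append(f"activity:{activity.__name__}")
        return lambda: activity(input)

    def create_timer(self, fire_at: timedelta) -> Callable[[], None]:
        self.history.append(f"timer:{fire_at}")
        return lambda: None  # the fake timer fires immediately

    def continue_as_new(self, new_input: Dict[str, Any]) -> None:
        self.next_input = new_input


def health_monitor(ctx: FakeContext, state: Dict[str, Any]) -> Generator:
    """One check per run; the loop comes from restarting, not from `while True`."""
    status = yield ctx.call_activity(check_health, input=state["service"])
    failures = 0 if status == "healthy" else state["failures"] + 1
    yield ctx.create_timer(timedelta(minutes=5))
    # Carry forward only the state the next run needs, then end this run.
    ctx.continue_as_new({**state, "failures": failures, "checks": state["checks"] + 1})


def run_generations(state: Dict[str, Any], generations: int) -> Tuple[Dict[str, Any], int]:
    """Hypothetical runtime: each restart gets a fresh context, so a fresh history."""
    longest = 0
    for _ in range(generations):
        ctx = FakeContext()
        workflow = health_monitor(ctx, state)
        try:
            task = next(workflow)
            while True:
                task = workflow.send(task())
        except StopIteration:
            pass
        longest = max(longest, len(ctx.history))
        if ctx.next_input is None:
            break  # the workflow chose to stop
        state = ctx.next_input
    return state, longest
```

However many generations run, each history holds only one activity and one timer, while the counters in the carried state keep advancing.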
Real workflows often need human input: approvals, reviews, decisions. Your workflow pauses, waiting for an external event, ideally with a timeout.
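The wait-with-timeout shape can be sketched with plain `asyncio` as an engine-agnostic analogy. This is not the Dapr API: a workflow engine makes the wait durable across restarts, whereas this in-memory version does not. The function name, the queue-as-inbox, and the outcome strings are assumptions made for illustration.

```python
import asyncio
from typing import Tuple


async def wait_for_approval(inbox: asyncio.Queue, timeout_seconds: float) -> str:
    """Pause until a decision arrives in the inbox, or give up after the timeout."""
    try:
        decision = await asyncio.wait_for(inbox.get(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return "timed_out"  # the caller can escalate or auto-reject here
    return "approved" if decision else "rejected"


async def demo() -> Tuple[str, str]:
    answered: asyncio.Queue = asyncio.Queue()
    answered.put_nowait(True)  # a reviewer has already approved
    silent: asyncio.Queue = asyncio.Queue()  # nobody ever responds
    first = await wait_for_approval(answered, timeout_seconds=0.05)
    second = await wait_for_approval(silent, timeout_seconds=0.05)
    return first, second
```

Treating the timeout as an explicit outcome, instead of an error, keeps the decision about what to do with an unanswered request in the workflow itself.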
Does your dapr-deployment skill understand saga and monitor patterns?
If your skill covers reverse-order compensation and using continue_as_new to keep workflow history bounded, it's working correctly.
Prompt 2: Implement an Eternal Monitor
Prompt 3: Build an Approval Workflow
Safety Note: Compensation logic is critical for data consistency. Test that every compensation is idempotent, because a retry can run the same compensation more than once. Eternal monitors also accumulate operational cost with every iteration, so choose a check interval no shorter than your alerting needs require.