Your Task API is working beautifully. Users create tasks, update them, mark them complete. Everything runs smoothly in development. Then your team adds requirements that seem straightforward: notify the team in Slack when a task is created, write an audit record for compliance, and schedule a reminder before the due date.
You implement these the obvious way: when the Task API creates a task, it calls the Notification Service, waits for a response, then calls the Audit Service, waits for a response, then calls the Reminder Service. Clean, synchronous, easy to understand.
Until the Notification Service has a bad day. Slack rate-limits your API calls. Now every task creation takes 3 seconds instead of 50 milliseconds. Users complain. Your Task API's response-time charts look like an EKG during a heart attack. And here's the worst part: the Task API did nothing wrong. It's just waiting for a service it depends on.
This is the coupling problem. And it's not a code quality issue you can refactor away. It's an architectural constraint baked into how request-response communication works.
Consider what happens when your Task API creates a task using direct HTTP calls:
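A simplified model makes the waiting visible. All latencies below are illustrative assumptions chosen to sum to the 2.55-second total discussed next; only the first entry is the Task API's own work.

```python
# One synchronous task-creation request, modeled as the sum of the
# latencies the caller must sit through, in sequence.
LATENCIES_MS = {
    "task_api (create the task)": 50,       # the actual work
    "notification_service (Slack)": 1200,   # assumed
    "audit_service (audit record)": 800,    # assumed
    "reminder_service (schedule)": 500,     # assumed
}

def user_visible_latency_ms() -> int:
    """The caller waits for every hop in the chain."""
    return sum(LATENCIES_MS.values())

print(user_visible_latency_ms())  # 2550
```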
The user asked to create a task. The Task API did its job in 50ms. But the user waited 2.55 seconds because of services that have nothing to do with task creation.
Now imagine the Notification Service crashes entirely. What happens to your Task API?
When services are chained through synchronous calls, a failure anywhere becomes a failure everywhere:
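A minimal sketch of the cascade, with stand-in functions instead of real clients: the Task API's own work succeeds, but an exception from any downstream call aborts the entire request.

```python
class ServiceDown(Exception):
    pass

def send_notification(task):
    raise ServiceDown("Notification Service unreachable")

def write_audit_record(task):
    pass  # never reached once the notification call fails

def create_task(title):
    task = {"id": 1, "title": title}  # the part the user actually asked for
    send_notification(task)           # raises -> the whole request fails
    write_audit_record(task)
    return task

try:
    create_task("write report")
except ServiceDown as exc:
    print(f"task creation failed: {exc}")  # user sees a 500 for a notification outage
```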
This is the cascading failure problem. The Notification Service going down should mean "notifications don't work." Instead, it means "the entire task creation system doesn't work."
Your users don't care about Slack notifications. They care about creating tasks. But your architecture doesn't let them have one without the other.
The request-response pattern creates three distinct coupling problems. Understanding them separately helps you see why the event-driven solution addresses each one.
Definition: Both services must be running at the same moment for communication to succeed.
Business Impact: If your Notification Service deploys a new version (even a 30-second rolling update), task creation fails during that window. You can't independently deploy services.
Example: Your Task API creates a task at 2:00:03 PM. Your Notification Service restart takes from 2:00:00 PM to 2:00:30 PM. That task creation fails, even though both services work fine 99.99% of the time.
Definition: Service A's availability becomes dependent on Service B's availability.
The math is brutal. If each service has 99.9% uptime (about 8.8 hours of downtime per year):
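The compounding can be computed directly. This sketch uses the chapter's assumptions: four services in the synchronous chain, 99.9% uptime each.

```python
# Chained synchronous calls multiply availabilities: the request
# succeeds only if every service in the chain is up at once.
per_service = 0.999   # 99.9% uptime each
chain_length = 4      # Task API + Notification + Audit + Reminder

effective = per_service ** chain_length
hours_per_year = 365 * 24

print(f"effective availability: {effective:.3%}")   # ~99.601%
print(f"downtime, one service:  {(1 - per_service) * hours_per_year:.1f} h/yr")
print(f"downtime, full chain:   {(1 - effective) * hours_per_year:.1f} h/yr")
```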
Business Impact: Adding the Notification, Audit, and Reminder services dropped your Task API's effective availability from 99.9% to 99.6%. That's four times more downtime, and you didn't change a single line in your Task API code.
Example: You promise customers 99.9% uptime for task creation. But your architecture makes that promise impossible to keep unless every dependent service also achieves 99.9%.
Definition: The calling service must know the interface details of the called service.
Your Task API code looks like this:
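What follows is a sketch of the kind of call-site the chapter describes; every endpoint, payload shape, and field name is a hypothetical illustration of behavioral coupling, not a real API.

```python
def create_task(task, post):
    """The Task API's handler. `post` is an HTTP POST callable such as
    requests.post. One line of real work, then three hand-written calls
    whose URLs and payload shapes the Task API is forced to know."""
    # ... save the task to the database: the 50 ms of actual work ...

    post("http://notification-svc/api/v2/messages",
         json={"channel": "#tasks", "text": f"New task: {task['title']}"})
    post("http://audit-svc/records",
         json={"action": "task.created", "entity_id": task["id"]})
    post("http://reminder-svc/schedule",
         json={"task_id": task["id"], "remind_at": task["due_date"]})
```

Three services, three interfaces, three payload schemas, all memorized by a service whose only real job is saving a row.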
Business Impact: Any change to a called service's interface forces coordinated changes and deployments across every caller. Teams can't evolve their APIs independently.
Example: The Audit team decides to add a source_ip field to audit records. Now the Task API team, Notification team, Reminder team, and every other service that writes audit records must coordinate their deployments. One team's API change cascades to every caller.
What if the Task API didn't call other services at all? What if it simply announced what happened and let interested services react independently?
User Response: 50ms (Task API's actual time)
The Task API responds immediately. It doesn't wait for notifications, audits, or reminders. It doesn't know those services exist. It just publishes a fact: "Task X was created by User Y at Time Z."
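In code, the publish step might look like the sketch below. The event's field names are an assumed schema, and the kafka-python client, broker address, and topic name are assumptions, not part of the chapter's code.

```python
from datetime import datetime, timezone

def task_created_event(task: dict) -> dict:
    """Build the published fact: 'Task X was created by User Y at Time Z'."""
    return {
        "event_type": "TaskCreated",
        "task_id": task["id"],
        "created_by": task["user_id"],
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "data": {"title": task["title"], "due_date": task["due_date"]},
    }

# Publishing with kafka-python (package and broker address assumed):
#
#   import json
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers="localhost:9092",
#       value_serializer=lambda v: json.dumps(v).encode("utf-8"),
#   )
#   producer.send("task-events", task_created_event(task))
#
# send() hands the event to the stream; the Task API responds to the
# user without waiting for any consumer to react.
```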
Each downstream service subscribes to the event stream, reads events at its own pace, and decides independently how to react.
Temporal decoupling: If the Notification Service is down when a task is created, the event waits in the stream. When the Notification Service recovers, it reads the event and sends the notification. No failure, just delay.
Availability decoupling: Task API's availability is now independent. It only depends on the event stream (Kafka), which is designed for 99.99%+ availability through replication.
Behavioral decoupling: Task API publishes a TaskCreated event with the task data. It doesn't know or care that Notification Service exists. The Notification Service decides what notifications to send based on the event. If Audit Service needs a new field, it either uses data already in the event or ignores that event—no Task API changes required.
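A consumer-side sketch shows the other half of that independence. The event shape, message format, and kafka-python consumer configuration below are all assumptions.

```python
def handle(event: dict):
    """The Notification Service's reaction, decided entirely on its own.
    The Task API has no idea this function exists."""
    if event.get("event_type") != "TaskCreated":
        return None                                   # not this service's concern
    return f"New task: {event['data']['title']}"      # message to post to Slack

# Consuming with kafka-python (package, broker, topic, and group assumed):
#
#   import json
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer(
#       "task-events",
#       bootstrap_servers="localhost:9092",
#       group_id="notification-service",  # offsets tracked per group, so a
#       auto_offset_reset="earliest",     # restarted consumer resumes where it left off
#       value_deserializer=lambda b: json.loads(b.decode("utf-8")),
#   )
#   for record in consumer:
#       message = handle(record.value)
#       if message:
#           post_to_slack(message)  # post_to_slack is hypothetical
```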
This is the mental model shift: events are not requests. They're facts about things that happened.
A request says "do this thing." An event says "this thing happened." The difference is fundamental: a request is imperative, addressed to a specific recipient, and the caller waits on the outcome; an event is a past-tense fact, addressed to no one in particular, and the publisher neither knows nor cares who reacts to it.
When you shift from "Task API tells Notification Service to send a message" to "Task API records that a task was created," you fundamentally change the relationship between services. The Task API is no longer the boss giving orders. It's a journalist reporting facts. Consumers decide what to do with those facts.
Event-driven architecture isn't universally better. Some interactions genuinely require synchronous request-response: authentication checks, payment authorization, and any query where the caller needs the answer before it can proceed.
The key question: Does the caller need to wait for the result?
You built a kafka-events skill in Chapter 1. Test and improve it based on what you learned.
Ask yourself:
If you found gaps:
Open your AI companion (Claude, ChatGPT, Gemini) and explore these scenarios.
What you're learning: Applying the three coupling types to your own domain. The AI helps you see your architecture through the lens of coupling analysis.
What you're learning: Converting request-response patterns to events. The AI pushes back when event-driven might not be the right choice, helping you develop judgment about when to apply each pattern.
What you're learning: Quantifying the business impact of coupling. Availability calculations make abstract coupling problems concrete, helping you justify architectural decisions.
As you explore event-driven patterns with AI, remember that architecture decisions have trade-offs. Event-driven systems solve coupling problems but introduce complexity around eventual consistency, event ordering, and debugging. Always validate AI suggestions against your specific business requirements and constraints.