Your Task API is working beautifully. Users create tasks, update them, mark them complete. Everything runs smoothly in development. Then your team adds requirements that seem straightforward: notify the team in Slack when a task is created, write an audit record for compliance, and schedule a reminder before the due date.
You implement these the obvious way: when the Task API creates a task, it calls the Notification Service, waits for a response, then calls the Audit Service, waits for a response, then calls the Reminder Service. Clean, synchronous, easy to understand.
Until the Notification Service has a bad day. Slack rate-limits your API calls. Now every task creation takes 3 seconds instead of 50 milliseconds. Users complain. Your Task API's response-time charts look like an EKG during a heart attack. And here's the worst part: the Task API did nothing wrong. It's just waiting for a service it depends on.
This is the coupling problem. And it's not a code quality issue you can refactor away. It's an architectural constraint baked into how request-response communication works.
Consider what happens when your Task API creates a task using direct HTTP calls:
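A simplified model makes the waiting visible. All latencies below are illustrative assumptions chosen to sum to the 2.55-second total discussed next; only the first entry is the Task API's own work.

```python
# One synchronous task-creation request, modeled as the sum of the
# latencies the caller must sit through, in sequence.
LATENCIES_MS = {
    "task_api (create the task)": 50,       # the actual work
    "notification_service (Slack)": 1200,   # assumed
    "audit_service (audit record)": 800,    # assumed
    "reminder_service (schedule)": 500,     # assumed
}

def user_visible_latency_ms() -> int:
    """The caller waits for every hop in the chain."""
    return sum(LATENCIES_MS.values())

print(user_visible_latency_ms())  # 2550
```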
The user asked to create a task. The Task API did its job in 50ms. But the user waited 2.55 seconds because of services that have nothing to do with task creation.
Now imagine the Notification Service crashes entirely. What happens to your Task API?
When services are chained through synchronous calls, a failure anywhere becomes a failure everywhere:
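A minimal sketch of the cascade, with stand-in functions instead of real clients: the Task API's own work succeeds, but an exception from any downstream call aborts the entire request.

```python
class ServiceDown(Exception):
    pass

def send_notification(task):
    raise ServiceDown("Notification Service unreachable")

def write_audit_record(task):
    pass  # never reached once the notification call fails

def create_task(title):
    task = {"id": 1, "title": title}  # the part the user actually asked for
    send_notification(task)           # raises -> the whole request fails
    write_audit_record(task)
    return task

try:
    create_task("write report")
except ServiceDown as exc:
    print(f"task creation failed: {exc}")  # user sees a 500 for a notification outage
```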
This is the cascading failure problem. The Notification Service going down should mean "notifications don't work." Instead, it means "the entire task creation system doesn't work."
Your users don't care about Slack notifications. They care about creating tasks. But your architecture doesn't let them have one without the other.
The request-response pattern creates three distinct coupling problems. Understanding them separately helps you see why the event-driven solution addresses each one.
Definition: Both services must be running at the same moment for communication to succeed.
Business Impact: If your Notification Service deploys a new version (even a 30-second rolling update), task creation fails during that window. You can't independently deploy services.
Example: Your Task API creates a task at 2:00:03 PM. Your Notification Service restart takes from 2:00:00 PM to 2:00:30 PM. That task creation fails, even though both services work fine 99.99% of the time.
Definition: Service A's availability becomes dependent on Service B's availability.
The math is brutal. If each service has 99.9% uptime (about 8.8 hours of downtime per year):
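The compounding can be computed directly. This sketch uses the chapter's assumptions: four services in the synchronous chain, 99.9% uptime each.

```python
# Chained synchronous calls multiply availabilities: the request
# succeeds only if every service in the chain is up at once.
per_service = 0.999   # 99.9% uptime each
chain_length = 4      # Task API + Notification + Audit + Reminder

effective = per_service ** chain_length
hours_per_year = 365 * 24

print(f"effective availability: {effective:.3%}")   # ~99.601%
print(f"downtime, one service:  {(1 - per_service) * hours_per_year:.1f} h/yr")
print(f"downtime, full chain:   {(1 - effective) * hours_per_year:.1f} h/yr")
```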
Business Impact: Adding the Notification, Audit, and Reminder services dropped your Task API's effective availability from 99.9% to 99.6%. That's four times more downtime, and you didn't change a single line in your Task API code.
Example: You promise customers 99.9% uptime for task creation. But your architecture makes that promise impossible to keep unless every dependent service also achieves 99.9%.
Definition: The calling service must know the interface details of the called service.
Your Task API code looks like this:
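What follows is a sketch of the kind of call-site the chapter describes; every endpoint, payload shape, and field name is a hypothetical illustration of behavioral coupling, not a real API.

```python
def create_task(task, post):
    """The Task API's handler. `post` is an HTTP POST callable such as
    requests.post. One line of real work, then three hand-written calls
    whose URLs and payload shapes the Task API is forced to know."""
    # ... save the task to the database: the 50 ms of actual work ...

    post("http://notification-svc/api/v2/messages",
         json={"channel": "#tasks", "text": f"New task: {task['title']}"})
    post("http://audit-svc/records",
         json={"action": "task.created", "entity_id": task["id"]})
    post("http://reminder-svc/schedule",
         json={"task_id": task["id"], "remind_at": task["due_date"]})
```

Three services, three interfaces, three payload schemas, all memorized by a service whose only real job is saving a row.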
Business Impact: Any change to a called service's interface forces coordinated changes and deployments across every caller. Teams can't evolve their APIs independently.
Example: The Audit team decides to add a source_ip field to audit records. Now the Task API team, Notification team, Reminder team, and every other service that writes audit records must coordinate their deployments. One team's API change cascades to every caller.
What if the Task API didn't call other services at all? What if it simply announced what happened and let interested services react independently?
User Response: 50ms (Task API's actual time)
The Task API responds immediately. It doesn't wait for notifications, audits, or reminders. It doesn't know those services exist. It just publishes a fact: "Task X was created by User Y at Time Z."
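In code, the publish step might look like the sketch below. The event's field names are an assumed schema, and the kafka-python client, broker address, and topic name are assumptions, not part of the chapter's code.

```python
from datetime import datetime, timezone

def task_created_event(task: dict) -> dict:
    """Build the published fact: 'Task X was created by User Y at Time Z'."""
    return {
        "event_type": "TaskCreated",
        "task_id": task["id"],
        "created_by": task["user_id"],
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "data": {"title": task["title"], "due_date": task["due_date"]},
    }

# Publishing with kafka-python (package and broker address assumed):
#
#   import json
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers="localhost:9092",
#       value_serializer=lambda v: json.dumps(v).encode("utf-8"),
#   )
#   producer.send("task-events", task_created_event(task))
#
# send() hands the event to the stream; the Task API responds to the
# user without waiting for any consumer to react.
```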
Each downstream service subscribes to the event stream, reads events at its own pace, and decides independently how to react.
Temporal decoupling: If the Notification Service is down when a task is created, the event waits in the stream. When the Notification Service recovers, it reads the event and sends the notification. No failure, just delay.
Availability decoupling: Task API's availability is now independent. It only depends on the event stream (Kafka), which is designed for 99.99%+ availability through replication.
Behavioral decoupling: Task API publishes a TaskCreated event with the task data. It doesn't know or care that Notification Service exists. The Notification Service decides what notifications to send based on the event. If Audit Service needs a new field, it either uses data already in the event or ignores that event—no Task API changes required.
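A consumer-side sketch shows the other half of that independence. The event shape, message format, and kafka-python consumer configuration below are all assumptions.

```python
def handle(event: dict):
    """The Notification Service's reaction, decided entirely on its own.
    The Task API has no idea this function exists."""
    if event.get("event_type") != "TaskCreated":
        return None                                   # not this service's concern
    return f"New task: {event['data']['title']}"      # message to post to Slack

# Consuming with kafka-python (package, broker, topic, and group assumed):
#
#   import json
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer(
#       "task-events",
#       bootstrap_servers="localhost:9092",
#       group_id="notification-service",  # offsets tracked per group, so a
#       auto_offset_reset="earliest",     # restarted consumer resumes where it left off
#       value_deserializer=lambda b: json.loads(b.decode("utf-8")),
#   )
#   for record in consumer:
#       message = handle(record.value)
#       if message:
#           post_to_slack(message)  # post_to_slack is hypothetical
```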
This is the mental model shift: events are not requests. They're facts about things that happened.
A request says "do this thing." An event says "this thing happened." The difference is fundamental: a request is imperative, addressed to a specific recipient, and the caller waits on the outcome; an event is a past-tense fact, addressed to no one in particular, and the publisher neither knows nor cares who reacts to it.
When you shift from "Task API tells Notification Service to send a message" to "Task API records that a task was created," you fundamentally change the relationship between services. The Task API is no longer the boss giving orders. It's a journalist reporting facts. Consumers decide what to do with those facts.
Event-driven architecture isn't universally better. Some interactions genuinely require synchronous request-response: authentication checks, payment authorization, and any query where the caller needs the answer before it can proceed.
The key question: Does the caller need to wait for the result?
You built a kafka-events skill in Chapter 1. Test and improve it based on what you learned.
Ask yourself:
If you found gaps:
Open your AI companion (Claude, ChatGPT, Gemini) and explore these scenarios.
What you're learning: Applying the three coupling types to your own domain. The AI helps you see your architecture through the lens of coupling analysis.
What you're learning: Converting request-response patterns to events. The AI pushes back when event-driven might not be the right choice, helping you develop judgment about when to apply each pattern.
What you're learning: Quantifying the business impact of coupling. Availability calculations make abstract coupling problems concrete, helping you justify architectural decisions.
As you explore event-driven patterns with AI, remember that architecture decisions have trade-offs. Event-driven systems solve coupling problems but introduce complexity around eventual consistency, event ordering, and debugging. Always validate AI suggestions against your specific business requirements and constraints.