In this chapter, you will make your agents delegate to each other and discover that the model, not the infrastructure, is the weakest link.
By the end, you should be able to use sessions_spawn (blocking: wait for the result) and sessions_yield (async: "I will get back to you"), calculate queue behavior using the two-layer concurrency model (a session lane per customer, a shared global lane), and explain why free-tier models ignore orchestration tools entirely. You will also control Claude Code from WhatsApp using ACP.
James had a task that needed three things: research a market trend, analyze competitors, and draft a summary. He typed all three into one message and sent it. His agent produced a single paragraph that mentioned no sources, skipped the competitor analysis entirely, and ended with "Let me know if you need more detail."
"At the warehouse," James said, "when a large order came in with three product lines, I never asked one supplier to handle all three. I split it. One per supplier. Tracked who delivered first." He looked at his WhatsApp. "This agent tried to do all three in one pass and did none of them well."
"What tools does it have?" Emma asked.
James checked the dashboard. "Something called sessions_spawn. It can spawn a subagent to handle a subtask."
"Did it use it?"
"No. It answered directly."
Emma pulled up a chair. "Two problems. First: the model. Free-tier models see tools but do not use them. They answer badly instead of delegating well. Second: you have not told it how to work. At the warehouse, who decided to split orders across suppliers?"
"I did. The supplier did not decide that on their own."
"Same here. The orchestrator pattern is not automatic. You design the delegation. The agent executes it." She opened a new session. "Clear the context. Try again. Tell it exactly what to delegate and to whom."
You are doing exactly what James is doing: two agents share your WhatsApp number, with routing that sends each message to the right one. But they work in isolation. Now you learn to make agents delegate to each other. By the end of this chapter, your main agent will spawn subagents for complex tasks, and you will understand the concurrency model that determines how many customers can be served simultaneously.
Before you try any orchestration commands, you need to know this: the model must be capable enough to use the tools.
Free-tier and lightweight models exhibit a specific failure pattern with orchestration tools. The model does not simply ignore sessions_spawn. It actively fabricates reasons not to use it: permission errors, authorization limits, scope restrictions.
This is worse than ignoring the tool. The model generates plausible-sounding refusals that make you think the platform is broken when the problem is the model's reluctance to delegate. The same pattern from Module 9.1, Chapter 8 (hallucinated cron flags) applies here: free-tier models treat tools as suggestions they can work around, not capabilities they should use.
Two steps: clear the session so the failed attempt is out of context, then tell the model explicitly to use sessions_spawn for the task. If the model claims permission issues or limitations, those are hallucinations. The tool works. Insist.
With a capable model (Claude Sonnet, GPT-4 class), this workaround is unnecessary. The model recognizes when delegation is appropriate and uses the tool unprompted. The $50-100/month model cost from Module 9.1, Chapter 14's deployment budget is not optional for production orchestration.
Model Quality and Orchestration
If your model fabricates reasons not to use sessions_spawn (permission errors, authorization limits, scope restrictions), those are hallucinations, not real platform constraints. Test with persistent prompting. If the tool fires after you insist but not on natural requests, your model is not reliable enough for autonomous orchestration. Upgrade before deploying to customers.
Your agent has a tool called sessions_spawn. It creates a subagent that runs a task independently and "announces" the result back to your chat. You will try this in Exercise 1.
Subagents run in the background by default. Your main agent remains responsive while the subagent works. When the subagent finishes, its result appears as an announcement in your chat. You do not wait for it; you can send other messages while it runs. This is the same pattern as running a background job at a warehouse: you dispatch the order, continue with other work, and get notified when it ships.
Two commands cover what you need right now:
/subagents list shows each subagent's model, runtime, status, and token usage. You used this in your test above. /subagents kill stops a subagent that is taking too long or producing unwanted results.
By default, subagents cannot spawn their own subagents. To enable orchestrator patterns (where a coordinator spawns multiple workers), increase maxSpawnDepth:
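A sketch of what that setting might look like, assuming a JSON config file. Only maxSpawnDepth and its meaning come from this chapter; the file location and surrounding key names are assumptions:

```json
{
  "subagents": {
    "maxSpawnDepth": 2
  }
}
```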
With depth 2: the main agent (depth 0) spawns an orchestrator (depth 1), which spawns workers (depth 2). Results cascade upward: workers announce to orchestrators, orchestrators announce to you.
Tracking Nested Flows
When subagents spawn other subagents, the work forms a tree. Use openclaw tasks flow list to see active flows and openclaw tasks flow cancel <id> to stop an entire tree at once. You will not need this until you increase maxSpawnDepth above 1.
With orchestration working, the natural question is: what happens when multiple customers message simultaneously? The answer is a two-layer queueing system.
Every customer gets their own session lane. Within a session lane, maxConcurrent is 1. Messages from the same customer are processed sequentially.
Why sequential? Because conversation context matters. If "Book 2pm" and "Change to 3pm" run in parallel, you get a race condition: the booking might happen before the correction is processed. Sequential processing within a session guarantees message order.
The global lane is shared across all session lanes. Its maxConcurrent defaults to 4.
When a session lane's message is ready, it enqueues into the global lane. The global lane allows up to 4 concurrent executions. Four different customers can have their messages processed simultaneously.
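The two-layer model can be sketched as an asyncio program: one sequential coroutine per customer (the session lane) competing for a shared semaphore of size 4 (the global lane). This is a toy model of the behavior described above, not OpenClaw's implementation:

```python
import asyncio

async def handle(global_lane: asyncio.Semaphore, customer: str,
                 msg: str, log: list) -> None:
    # A message executes only while holding a global-lane slot.
    async with global_lane:
        await asyncio.sleep(0.001)  # stand-in for model inference
        log.append((customer, msg))

async def session_lane(global_lane: asyncio.Semaphore, customer: str,
                       messages: list, log: list) -> None:
    # Per-customer lane: maxConcurrent=1, so strictly sequential.
    for msg in messages:
        await handle(global_lane, customer, msg, log)

async def run_demo() -> list:
    global_lane = asyncio.Semaphore(4)  # shared lane: maxConcurrent=4
    log: list = []
    # Five customers message at the same moment; lanes run independently.
    await asyncio.gather(*(
        session_lane(global_lane, c, [f"{c}{i}" for i in (1, 2, 3)], log)
        for c in "ABCDE"
    ))
    return log

log = asyncio.run(run_demo())
# Per-customer order is preserved even though lanes interleave globally.
for c in "ABCDE":
    assert [m for cust, m in log if cust == c] == [f"{c}1", f"{c}2", f"{c}3"]
```

The semaphore caps how many turns run at once; the per-customer for-loop is what guarantees message order within a session.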
Five customers message at the same second: customers 1 through 4 each take a global-lane slot and start immediately; customer 5 queues until the first slot frees.
Customer 5 waits approximately 3-8 seconds: the processing time of the fastest of the first four. Not minutes. Seconds.
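The queue math can be checked in a few lines, assuming uniform turn times (the 3-8 second range simplified to a flat 5 seconds):

```python
import math

def wait_seconds(arrival_index: int, max_concurrent: int = 4,
                 turn_seconds: float = 5.0) -> float:
    # 0-based arrival order; everyone in the same "wave" of four starts
    # together, and each wave lasts one (uniform) turn.
    wave = arrival_index // max_concurrent
    return wave * turn_seconds

assert wait_seconds(4) == 5.0   # 5th simultaneous customer waits one wave
assert wait_seconds(6) == 5.0   # 7th customer also waits about 5 seconds
# A 20-response blast clears in ceil(20/4) = 5 waves, about 25 seconds.
assert math.ceil(20 / 4) * 5.0 == 25.0
```

Under this simplification, wait time grows in steps of one turn per wave, which is why a burst clears in seconds rather than minutes.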
Real-world messaging is bursty, not uniform.
With maxConcurrent=4, even the worst case (a blast notification triggers 20 simultaneous responses) clears in waves of 4. Four served immediately, four more after the first wave finishes (~5 seconds), and so on. The full queue clears in roughly 25 seconds. Acceptable for WhatsApp, where the human expectation is a response within a minute.
Session lanes are completely independent. Customer A's conversation history, context, and memory are never visible during Customer B's agent turn. The global lane controls concurrency (how many run at once), not isolation (what each sees). Isolation is handled by the session system from Module 9.1, Chapter 2.
The global lane's default of 4 is configurable. On a more powerful server, increase it:
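A sketch of the change, again assuming a JSON config file. Only maxConcurrent comes from this chapter; the key path and the value 8 are assumptions for illustration:

```json
{
  "queue": {
    "global": {
      "maxConcurrent": 8
    }
  }
}
```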
One constraint: your model provider must handle the parallel request volume. At maxConcurrent=4, that is 4 simultaneous API calls. Free-tier providers with 15 requests per minute will hit their rate limit within seconds. This is another reason production orchestration requires a paid model provider.
ACP (Agent Client Protocol) is how OpenClaw controls external coding agents. Not a theoretical API: a working bridge to Claude Code, Codex, Cursor, Copilot, Gemini CLI, and other supported harnesses. From WhatsApp, you can spawn a Claude Code session that reads your codebase, runs commands, and reports back.
The acpx plugin ships bundled with OpenClaw but is not enabled by default. Run /acp doctor in your WhatsApp chat to check whether it is installed and healthy.
If the output shows ACP_BACKEND_MISSING, follow the installation steps that /acp doctor prints. The exact install command depends on your system and how you installed OpenClaw. After installing, enable it:
The permissionMode line is critical. ACP sessions are non-interactive: Claude Code cannot prompt you for permission through WhatsApp. Without a permission mode set, every file read or command execution is denied and you get ACP_TURN_FAILED: Permission denied. There are three permission modes; start with approve-reads, and move to approve-all only when you trust the task and understand the risk.
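A minimal sketch of the setting, assuming a JSON config file. The permissionMode key and the approve-reads value come from this chapter; the nesting under an acp section is an assumption:

```json
{
  "acp": {
    "permissionMode": "approve-reads"
  }
}
```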
Run /acp doctor again. The output should show healthy: yes before you continue.
From WhatsApp:
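The spawn command looks like this (sent as a chat message, not in a terminal):

```text
/acp spawn claude --bind here
```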
The --bind here flag binds the Claude Code session to your current conversation so that /acp steer commands reach it. Without --bind here, the session spawns but is unbound and you cannot send it instructions.
ACP sessions are persistent by default. The session stays alive after completing a task, and you can send multiple /acp steer commands to the same session. This is a continuous conversation with Claude Code, not a one-shot task.
Wait for the spawn confirmation before sending any commands. The session takes a few seconds to initialize. If you send /acp steer too early, it can fail with ACP_SESSION_INIT_FAILED or land on your main WhatsApp agent instead of the spawned Claude Code session.
Once the spawn confirms, send it work:
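For example (the file path is a placeholder, not from the chapter):

```text
/acp steer Review tests/test_auth.py and summarize any issues you find.
```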
Claude Code may take 10-30 seconds to respond depending on the task. Check /acp status to confirm the session is processing. When done, close the session:
Thread mode on Discord and Slack
On channels that support threads (Discord, Slack), use --thread auto to place the ACP session in its own thread for continuous back-and-forth work. WhatsApp does not support threads, so --bind here is the only option.
You used /acp spawn to manually start a Claude Code session. Your agent can do this on its own. The sessions_spawn tool supports runtime="acp", which means the agent can programmatically delegate a coding task to Claude Code without you typing any slash command.
Ask your agent:
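For instance (the phrasing and file path are illustrative):

```text
Spawn a Claude Code session and have it review tests/test_auth.py.
Summarize the issues it finds and report back here.
```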
The agent spawns a Claude Code session, sends it the task, and announces the result back to your chat. This is how your personal AI employee hands off technical work to a coding specialist: you describe what you need, and the agent decides whether to handle it directly or delegate to Claude Code.
ACP sessions run on your machine
ACP sessions are NOT sandboxed. Claude Code spawned via /acp spawn claude has the same filesystem access as Claude Code running in your terminal. The permissionMode you configured in the setup section applies to all ACP sessions, whether you spawn them manually or the agent spawns them programmatically.
In Module 9.1, Chapter 8, you learned the Agent OS mental model: gateway as kernel, workspace files as firmware, heartbeats as cron daemon, plugins as device drivers. The concurrency model adds a new layer:
The process scheduler analogy is exact. An operating system multiplexes CPU time across processes. OpenClaw multiplexes model inference across customer sessions. Session lanes are per-process queues. The global lane is the CPU scheduler. maxConcurrent is the number of cores.
Three details worth knowing for debugging:
Send your agent a delegation request on WhatsApp:
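An explicit delegation prompt in the spirit of the chapter's advice (the research topic is a placeholder):

```text
Use sessions_spawn to create a subagent. Have it research current
trends in small-business CRM tools and announce a short summary
back to this chat.
```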
Watch for the announcement when the subagent finishes. While it runs, check its status:
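The status check uses the command introduced earlier:

```text
/subagents list
```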
If the agent answered directly without spawning, your model ignored the tool. Start a new session and try with more forceful prompting, or upgrade your model.
What you are learning: sessions_spawn delegates work to a subagent. The subagent runs independently and announces results back. /subagents list shows running subagents. Model quality determines whether delegation works.
If you have Claude Code installed, try spawning it from WhatsApp:
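Check health first, then spawn bound to the current conversation:

```text
/acp doctor
/acp spawn claude --bind here
```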
Wait for the spawn confirmation, then send it a task:
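A sample first task (the wording is a placeholder):

```text
/acp steer List the files in this project and describe what each one does.
```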
Check session status and close when done:
If /acp doctor shows errors, follow the setup steps in the ACP section above before retrying.
What you are learning: ACP turns external coding agents into OpenClaw-managed sessions. You control Claude Code, Codex, or Gemini CLI from the same WhatsApp chat you use for everything else. This is how your personal AI employee delegates technical work to coding specialists.
Ask your agent on WhatsApp:
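A question that exercises the queue math:

```text
If 7 customers message at the same second and maxConcurrent is 4,
how long does customer 7 wait before the agent starts on their
message? Assume each turn takes about 5 seconds.
```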
Compare the agent's answer with your own reasoning. Customers 1-4 start immediately. Customers 5-7 queue. After 5 seconds, the first 4 finish and 5-7 start. Customer 7 waits approximately 5 seconds, not 15.
What you are learning: The concurrency model processes in parallel waves, not sequentially. Understanding this lets you predict latency and tune maxConcurrent for your workload.
sessions_spawn makes the main agent wait for the subagent's full result before replying (use when the reply needs the complete answer). sessions_yield responds immediately with "I will get back to you" and announces the result later (use when blocking feels unnatural in chat). Yield matches WhatsApp's conversational cadence better than 14-second pauses.
Session lane (per-customer): messages from the same customer are processed sequentially to maintain conversation order. Global lane (shared): up to maxConcurrent customers are processed in parallel. With maxConcurrent=4 and 7 simultaneous customers, customers 1-4 start immediately; 5-7 queue for ~5 seconds.
Free-tier models ignore orchestration tools and answer directly with poor results. Production orchestration requires a capable model. If sessions_spawn does not fire, clear the session and use explicit instructions.
/acp spawn claude controls Claude Code from WhatsApp. ACP (Agent Client Protocol) is the bridge that turns external coding agents (Claude Code, Codex, Gemini CLI, and others) into OpenClaw-managed sessions.
When Emma came back, James had a calculation written on a sticky note. "Four parallel, one queues. Five seconds worst case for number five." He held up the note. "At 55 customers, peak hour is maybe 3 simultaneous. No queueing at all."
Emma looked at the note, then at his screen. "You calculated that without being asked."
"Token costs in Module 9.1, Chapter 4. Heartbeat batching in Module 9.1, Chapter 8. This is the same thinking. How much does it cost, how long does it take, what is the constraint." He peeled the sticky note off the monitor. "The constraint is the model provider, not the gateway. Free tier at 15 requests per minute would choke at maxConcurrent=4."
"What about the spawn test?"
"First attempt: the model ignored the tool completely. Answered directly with garbage. I cleared the session, sent the explicit prompt, and sessions_spawn fired. Fourteen seconds for the subagent to run." He paused. "Then I spawned Claude Code with /acp spawn claude. From WhatsApp. It reviewed a test file and summarized the issues. I controlled a coding agent from my phone."
Emma leaned back. "The model is the weakest link."
"The model is always the weakest link. Everything else is infrastructure."
She almost smiled. "You sound like an engineer."
"I sound like someone who spent ten chapters breaking things and reading logs."
James leaned back. "At the warehouse, we had a conveyor system with four packing stations. Orders queued at the entrance. Four orders packed simultaneously. The fifth waited until a station opened. Same math. Same constraint: throughput equals stations times speed."
Emma was quiet for a moment. "The concurrency math holds for steady traffic. Burst patterns are where I am less certain. Twenty customers responding to the same blast notification within the same second, each with a spawned subagent. The queue model says it clears in waves, but I have not tested that at the edge." She closed her laptop. "You have two agents, orchestration, and a concurrency model you understand. Before we add anything else, we should think about who approves what these agents do. Hooks and security. Module 9.1, Chapter 13."