In this chapter, you will make your agents delegate to each other and discover that the model, not the infrastructure, is the weakest link.
By the end, you should be able to use sessions_spawn (blocking: wait for the result) and sessions_yield (async: "I will get back to you"), calculate queue behavior using the two-layer concurrency model (a session lane per customer, a shared global lane), and explain why free-tier models ignore orchestration tools entirely. You will also control Claude Code from WhatsApp using ACP.
James had a task that needed three things: research a market trend, analyze competitors, and draft a summary. He typed all three into one message and sent it. His agent produced a single paragraph that mentioned no sources, skipped the competitor analysis entirely, and ended with "Let me know if you need more detail."
"At the warehouse," James said, "when a large order came in with three product lines, I never asked one supplier to handle all three. I split it. One per supplier. Tracked who delivered first." He looked at his WhatsApp. "This agent tried to do all three in one pass and did none of them well."
"What tools does it have?" Emma asked.
James checked the dashboard. "Something called sessions_spawn. It can spawn a subagent to handle a subtask."
"Did it use it?"
"No. It answered directly."
Emma pulled up a chair. "Two problems. First: the model. Free-tier models see tools but do not use them. They answer badly instead of delegating well. Second: you have not told it how to work. At the warehouse, who decided to split orders across suppliers?"
"I did. The supplier did not decide that on their own."
"Same here. The orchestrator pattern is not automatic. You design the delegation. The agent executes it." She opened a new session. "Clear the context. Try again. Tell it exactly what to delegate and to whom."
You are doing exactly what James is doing: two agents share your WhatsApp number, with routing that sends each message to the right one. But they work in isolation. Now you learn to make agents delegate to each other. By the end of this chapter, your main agent will spawn subagents for complex tasks, and you will understand the concurrency model that determines how many customers can be served simultaneously.
Before you try any orchestration commands, you need to know this: the model must be capable enough to use the tools.
Free-tier and lightweight models exhibit a specific failure pattern with orchestration tools. The model does not simply ignore sessions_spawn. It actively fabricates reasons not to use it: permission errors, authorization limits, scope restrictions.
This is worse than ignoring the tool. The model generates plausible-sounding refusals that make you think the platform is broken when the problem is the model's reluctance to delegate. The same pattern from Module 9.1, Chapter 8 (hallucinated cron flags) applies here: free-tier models treat tools as suggestions they can work around, not capabilities they should use.
Two steps: clear the session so the failed attempt is out of context, then tell the model explicitly to use sessions_spawn for the task. If the model claims permission issues or limitations, those are hallucinations. The tool works. Insist.
With a capable model (Claude Sonnet, GPT-4 class), this workaround is unnecessary. The model recognizes when delegation is appropriate and uses the tool unprompted. The $50-100/month model cost from Module 9.1, Chapter 14's deployment budget is not optional for production orchestration.
Model Quality and Orchestration
If your model fabricates reasons not to use sessions_spawn (permission errors, authorization limits, scope restrictions), those are hallucinations, not real platform constraints. Test with persistent prompting. If the tool fires after you insist but not on natural requests, your model is not reliable enough for autonomous orchestration. Upgrade before deploying to customers.
Your agent has a tool called sessions_spawn. It creates a subagent that runs a task independently and "announces" the result back to your chat. You will try this in Exercise 1.
Subagents run in the background by default. Your main agent remains responsive while the subagent works. When the subagent finishes, its result appears as an announcement in your chat. You do not wait for it; you can send other messages while it runs. This is the same pattern as running a background job at a warehouse: you dispatch the order, continue with other work, and get notified when it ships.
Two commands cover what you need right now:
/subagents list shows each subagent's model, runtime, status, and token usage. You used this in your test above. /subagents kill stops a subagent that is taking too long or producing unwanted results.
By default, subagents cannot spawn their own subagents. To enable orchestrator patterns (where a coordinator spawns multiple workers), increase maxSpawnDepth:
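A sketch of what that setting might look like, assuming a JSON config file. Only maxSpawnDepth and its meaning come from this chapter; the file location and surrounding key names are assumptions:

```json
{
  "subagents": {
    "maxSpawnDepth": 2
  }
}
```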
With depth 2: the main agent (depth 0) spawns an orchestrator (depth 1), which spawns workers (depth 2). Results cascade upward: workers announce to orchestrators, orchestrators announce to you.
Tracking Nested Flows
When subagents spawn other subagents, the work forms a tree. Use openclaw tasks flow list to see active flows and openclaw tasks flow cancel <id> to stop an entire tree at once. You will not need this until you increase maxSpawnDepth above 1.
With orchestration working, the natural question is: what happens when multiple customers message simultaneously? The answer is a two-layer queueing system.
Every customer gets their own session lane. Within a session lane, maxConcurrent is 1. Messages from the same customer are processed sequentially.
Why sequential? Because conversation context matters. If "Book 2pm" and "Change to 3pm" run in parallel, you get a race condition: the booking might happen before the correction is processed. Sequential processing within a session guarantees message order.
The global lane is shared across all session lanes. Its maxConcurrent defaults to 4.
When a session lane's message is ready, it enqueues into the global lane. The global lane allows up to 4 concurrent executions. Four different customers can have their messages processed simultaneously.
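The two-layer model can be sketched as an asyncio program: one sequential coroutine per customer (the session lane) competing for a shared semaphore of size 4 (the global lane). This is a toy model of the behavior described above, not OpenClaw's implementation:

```python
import asyncio

async def handle(global_lane: asyncio.Semaphore, customer: str,
                 msg: str, log: list) -> None:
    # A message executes only while holding a global-lane slot.
    async with global_lane:
        await asyncio.sleep(0.001)  # stand-in for model inference
        log.append((customer, msg))

async def session_lane(global_lane: asyncio.Semaphore, customer: str,
                       messages: list, log: list) -> None:
    # Per-customer lane: maxConcurrent=1, so strictly sequential.
    for msg in messages:
        await handle(global_lane, customer, msg, log)

async def run_demo() -> list:
    global_lane = asyncio.Semaphore(4)  # shared lane: maxConcurrent=4
    log: list = []
    # Five customers message at the same moment; lanes run independently.
    await asyncio.gather(*(
        session_lane(global_lane, c, [f"{c}{i}" for i in (1, 2, 3)], log)
        for c in "ABCDE"
    ))
    return log

log = asyncio.run(run_demo())
# Per-customer order is preserved even though lanes interleave globally.
for c in "ABCDE":
    assert [m for cust, m in log if cust == c] == [f"{c}1", f"{c}2", f"{c}3"]
```

The semaphore caps how many turns run at once; the per-customer for-loop is what guarantees message order within a session.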
Five customers message at the same second: customers 1 through 4 each take a global-lane slot and start immediately; customer 5 queues until the first slot frees.
Customer 5 waits approximately 3-8 seconds: the processing time of the fastest of the first four. Not minutes. Seconds.
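The queue math can be checked in a few lines, assuming uniform turn times (the 3-8 second range simplified to a flat 5 seconds):

```python
import math

def wait_seconds(arrival_index: int, max_concurrent: int = 4,
                 turn_seconds: float = 5.0) -> float:
    # 0-based arrival order; everyone in the same "wave" of four starts
    # together, and each wave lasts one (uniform) turn.
    wave = arrival_index // max_concurrent
    return wave * turn_seconds

assert wait_seconds(4) == 5.0   # 5th simultaneous customer waits one wave
assert wait_seconds(6) == 5.0   # 7th customer also waits about 5 seconds
# A 20-response blast clears in ceil(20/4) = 5 waves, about 25 seconds.
assert math.ceil(20 / 4) * 5.0 == 25.0
```

Under this simplification, wait time grows in steps of one turn per wave, which is why a burst clears in seconds rather than minutes.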
Real-world messaging is bursty, not uniform.
With maxConcurrent=4, even the worst case (a blast notification triggers 20 simultaneous responses) clears in waves of 4. Four served immediately, four more after the first wave finishes (~5 seconds), and so on. The full queue clears in roughly 25 seconds. Acceptable for WhatsApp, where the human expectation is a response within a minute.
Session lanes are completely independent. Customer A's conversation history, context, and memory are never visible during Customer B's agent turn. The global lane controls concurrency (how many run at once), not isolation (what each sees). Isolation is handled by the session system from Module 9.1, Chapter 2.
The global lane's default of 4 is configurable. On a more powerful server, increase it:
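A sketch of the change, again assuming a JSON config file. Only maxConcurrent comes from this chapter; the key path and the value 8 are assumptions for illustration:

```json
{
  "queue": {
    "global": {
      "maxConcurrent": 8
    }
  }
}
```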
One constraint: your model provider must handle the parallel request volume. At maxConcurrent=4, that is 4 simultaneous API calls. Free-tier providers with 15 requests per minute will hit their rate limit within seconds. This is another reason production orchestration requires a paid model provider.
ACP (Agent Client Protocol) is how OpenClaw controls external coding agents. Not a theoretical API: a working bridge to Claude Code, Codex, Cursor, Copilot, Gemini CLI, and other supported harnesses. From WhatsApp, you can spawn a Claude Code session that reads your codebase, runs commands, and reports back.
The acpx plugin ships bundled with OpenClaw but is not enabled by default. Run /acp doctor in your WhatsApp chat to check whether it is installed and healthy.
If the output shows ACP_BACKEND_MISSING, follow the installation steps that /acp doctor prints. The exact install command depends on your system and how you installed OpenClaw. After installing, enable it:
The permissionMode line is critical. ACP sessions are non-interactive: Claude Code cannot prompt you for permission through WhatsApp. Without a permission mode set, every file read or command execution is denied and you get ACP_TURN_FAILED: Permission denied. There are three permission modes; start with approve-reads, and move to approve-all only when you trust the task and understand the risk.
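A minimal sketch of the setting, assuming a JSON config file. The permissionMode key and the approve-reads value come from this chapter; the nesting under an acp section is an assumption:

```json
{
  "acp": {
    "permissionMode": "approve-reads"
  }
}
```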
Run /acp doctor again. The output should show healthy: yes before you continue.
From WhatsApp:
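The spawn command looks like this (sent as a chat message, not in a terminal):

```text
/acp spawn claude --bind here
```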
The --bind here flag binds the Claude Code session to your current conversation so that /acp steer commands reach it. Without --bind here, the session spawns but is unbound and you cannot send it instructions.
ACP sessions are persistent by default. The session stays alive after completing a task, and you can send multiple /acp steer commands to the same session. This is a continuous conversation with Claude Code, not a one-shot task.
Wait for the spawn confirmation before sending any commands. The session takes a few seconds to initialize. If you send /acp steer too early, it can fail with ACP_SESSION_INIT_FAILED or land on your main WhatsApp agent instead of the spawned Claude Code session.
Once the spawn confirms, send it work:
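For example (the file path is a placeholder, not from the chapter):

```text
/acp steer Review tests/test_auth.py and summarize any issues you find.
```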
Claude Code may take 10-30 seconds to respond depending on the task. Check /acp status to confirm the session is processing. When done, close the session:
Thread mode on Discord and Slack
On channels that support threads (Discord, Slack), use --thread auto to place the ACP session in its own thread for continuous back-and-forth work. WhatsApp does not support threads, so --bind here is the only option.
You used /acp spawn to manually start a Claude Code session. Your agent can do this on its own. The sessions_spawn tool supports runtime="acp", which means the agent can programmatically delegate a coding task to Claude Code without you typing any slash command.
Ask your agent:
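For instance (the phrasing and file path are illustrative):

```text
Spawn a Claude Code session and have it review tests/test_auth.py.
Summarize the issues it finds and report back here.
```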
The agent spawns a Claude Code session, sends it the task, and announces the result back to your chat. This is how your personal AI employee hands off technical work to a coding specialist: you describe what you need, and the agent decides whether to handle it directly or delegate to Claude Code.
ACP sessions run on your machine
ACP sessions are NOT sandboxed. Claude Code spawned via /acp spawn claude has the same filesystem access as Claude Code running in your terminal. The permissionMode you configured in the setup section applies to all ACP sessions, whether you spawn them manually or the agent spawns them programmatically.
In Module 9.1, Chapter 8, you learned the Agent OS mental model: gateway as kernel, workspace files as firmware, heartbeats as cron daemon, plugins as device drivers. The concurrency model adds a new layer:
The process scheduler analogy is exact. An operating system multiplexes CPU time across processes. OpenClaw multiplexes model inference across customer sessions. Session lanes are per-process queues. The global lane is the CPU scheduler. maxConcurrent is the number of cores.
Three details worth knowing for debugging:
Send your agent a delegation request on WhatsApp:
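An explicit delegation prompt in the spirit of the chapter's advice (the research topic is a placeholder):

```text
Use sessions_spawn to create a subagent. Have it research current
trends in small-business CRM tools and announce a short summary
back to this chat.
```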
Watch for the announcement when the subagent finishes. While it runs, check its status:
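The status check uses the command introduced earlier:

```text
/subagents list
```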
If the agent answered directly without spawning, your model ignored the tool. Start a new session and try with more forceful prompting, or upgrade your model.
What you are learning: sessions_spawn delegates work to a subagent. The subagent runs independently and announces results back. /subagents list shows running subagents. Model quality determines whether delegation works.
If you have Claude Code installed, try spawning it from WhatsApp:
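Check health first, then spawn bound to the current conversation:

```text
/acp doctor
/acp spawn claude --bind here
```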
Wait for the spawn confirmation, then send it a task:
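A sample first task (the wording is a placeholder):

```text
/acp steer List the files in this project and describe what each one does.
```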
Check session status and close when done:
If /acp doctor shows errors, follow the setup steps in the ACP section above before retrying.
What you are learning: ACP turns external coding agents into OpenClaw-managed sessions. You control Claude Code, Codex, or Gemini CLI from the same WhatsApp chat you use for everything else. This is how your personal AI employee delegates technical work to coding specialists.
Ask your agent on WhatsApp:
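A question that exercises the queue math:

```text
If 7 customers message at the same second and maxConcurrent is 4,
how long does customer 7 wait before the agent starts on their
message? Assume each turn takes about 5 seconds.
```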
Compare the agent's answer with your own reasoning. Customers 1-4 start immediately. Customers 5-7 queue. After 5 seconds, the first 4 finish and 5-7 start. Customer 7 waits approximately 5 seconds, not 15.
What you are learning: The concurrency model processes in parallel waves, not sequentially. Understanding this lets you predict latency and tune maxConcurrent for your workload.
sessions_spawn makes the main agent wait for the subagent's full result before replying (use when the reply needs the complete answer). sessions_yield responds immediately with "I will get back to you" and announces the result later (use when blocking feels unnatural in chat). Yield matches WhatsApp's conversational cadence better than 14-second pauses.
Session lane (per-customer): messages from the same customer are processed sequentially to maintain conversation order. Global lane (shared): up to maxConcurrent customers are processed in parallel. With maxConcurrent=4 and 7 simultaneous customers, customers 1-4 start immediately; 5-7 queue for ~5 seconds.
Free-tier models ignore orchestration tools and answer directly with poor results. Production orchestration requires a capable model. If sessions_spawn does not fire, clear the session and use explicit instructions.
/acp spawn claude controls Claude Code from WhatsApp. ACP (Agent Client Protocol) is the bridge that turns external coding agents (Claude Code, Codex, Gemini CLI, and others) into OpenClaw-managed sessions.
When Emma came back, James had a calculation written on a sticky note. "Four parallel, one queues. Five seconds worst case for number five." He held up the note. "At 55 customers, peak hour is maybe 3 simultaneous. No queueing at all."
Emma looked at the note, then at his screen. "You calculated that without being asked."
"Token costs in Module 9.1, Chapter 4. Heartbeat batching in Module 9.1, Chapter 8. This is the same thinking. How much does it cost, how long does it take, what is the constraint." He peeled the sticky note off the monitor. "The constraint is the model provider, not the gateway. Free tier at 15 requests per minute would choke at maxConcurrent=4."
"What about the spawn test?"
"First attempt: the model ignored the tool completely. Answered directly with garbage. I cleared the session, sent the explicit prompt, and sessions_spawn fired. Fourteen seconds for the subagent to run." He paused. "Then I spawned Claude Code with /acp spawn claude. From WhatsApp. It reviewed a test file and summarized the issues. I controlled a coding agent from my phone."
Emma leaned back. "The model is the weakest link."
"The model is always the weakest link. Everything else is infrastructure."
She almost smiled. "You sound like an engineer."
"I sound like someone who spent ten chapters breaking things and reading logs."
James leaned back. "At the warehouse, we had a conveyor system with four packing stations. Orders queued at the entrance. Four orders packed simultaneously. The fifth waited until a station opened. Same math. Same constraint: throughput equals stations times speed."
Emma was quiet for a moment. "The concurrency math holds for steady traffic. Burst patterns are where I am less certain. Twenty customers responding to the same blast notification within the same second, each with a spawned subagent. The queue model says it clears in waves, but I have not tested that at the edge." She closed her laptop. "You have two agents, orchestration, and a concurrency model you understand. Before we add anything else, we should think about who approves what these agents do. Hooks and security. Module 9.1, Chapter 13."