Emma set her coffee down and pulled up a screenshot on her phone. A product page. Five-star reviews on top, then a wall of one-star reviews at the bottom.
"This was mine," she said. "An agent product I shipped two years ago. Nine tools, clean architecture, solid tests. It died in eight weeks."
James leaned in. "What happened?"
"Tool selection. The agent called the billing tool when users asked about their learning progress. The word 'account' appeared in both descriptions. Users would say 'where is my account' meaning their learning profile, and the agent would pull up their payment history." She turned the phone off. "Two months. That is how long it lasted. We patched routing logic, we added conditional checks, we built a classifier on top. None of it worked. The fix was always in the descriptions."
"What was wrong with them?"
"They were one line each. Nine tools, nine lines. The agent had to guess from nine vague sentences which tool to call." She looked at his screen. "Your tool descriptions have the same problem. Let me show you."
You are doing exactly what James is doing. You have nine tools in your TutorClaw server and AGENTS.md orchestrating them. The tools work. The tests pass. But the descriptions that tell the agent when to call each tool are still the one-line versions from when you first built them.
In this chapter, you send three ambiguous messages from WhatsApp, watch the agent pick the wrong tool, then rewrite all nine descriptions using a pattern that fixes the problem.
Open WhatsApp and send these three messages to your TutorClaw agent. Before each one, predict which tool should fire. Then watch the dashboard.
Message 1: "Help me understand where I am in the course"
Should trigger get_learner_state (position in curriculum). But "understand" might trigger generate_guidance and "in the course" might trigger get_chapter_content. Watch which tool badge lights up.
Message 2: "I'm stuck on this concept and need help"
Should trigger generate_guidance (pedagogical support). But "stuck" might trigger get_exercises and "need help" is so generic that any tool could claim it. Watch the dashboard.
Message 3: "Can I try something?"
Should trigger get_exercises (practice). But "try" might trigger submit_code and the vagueness gives the agent nothing to work with. Watch the dashboard.
Record what happened: for each message, note the tool you predicted, the tool that actually fired, and whether they match.
At least one should fire the wrong tool. If all three are correct, send increasingly ambiguous messages until you find a miss. Every multi-tool agent has a selection boundary where descriptions blur.
The problem with one-line descriptions is that they answer only one question: "What does this tool do?" The agent also needs to know: "When should I call this tool instead of the other eight?"
The fix is a two-layer description.
Layer 1 (Short): One sentence for the agent's initial tool selection scan. Precise, no ambiguity. This is the job title on a resume.
Layer 2 (Behavioral): Detailed guidance about WHEN to use this tool, WHEN NOT to use it, and what to do with the result. This is the full job description with responsibilities and boundaries.
Here is what this looks like for get_chapter_content:
Before (one-line):
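The chapter's original wording is not shown here, but a representative one-line description (illustrative, not the exact original) might be:

```python
# Hypothetical one-line description for get_chapter_content
GET_CHAPTER_CONTENT_DESC = "Returns the content of a chapter."
```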
The agent sees "chapter content" and calls this tool any time someone mentions a chapter. That includes "what chapter am I on?" (wrong tool: that is get_learner_state) and "give me exercises from chapter 3" (wrong tool: that is get_exercises).
After (two-layer):
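A two-layer version consistent with the confusions described above might read (the exact wording is an assumption):

```python
# Hypothetical two-layer description for get_chapter_content
GET_CHAPTER_CONTENT_DESC = """\
Fetches the full teaching content for a specific chapter by number.

WHEN TO USE: the learner explicitly asks to read or review a chapter's
material ("show me chapter 3", "reread the section on closures").
WHEN NOT TO USE:
- NEVER use this to answer "what chapter am I on?" -- that is get_learner_state.
- NEVER use this when the learner asks for practice problems -- that is get_exercises.
AFTER CALLING: summarize the content; do not paste the entire chapter verbatim.
"""
```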
Layer 1 tells the agent what the tool does. Layer 2 tells the agent how to make the right decision.
NEVER statements work because of how agents select tools. With nine tools and an ambiguous message, the agent eliminates poor matches first, then picks from what remains. Positive descriptions tell the agent what qualifies. NEVER statements tell the agent what to eliminate. Elimination narrows the field faster.
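The elimination mechanics can be sketched as a toy model (this is not the actual selection algorithm; real agents reason over full descriptions, and the keyword matching here is an assumption for illustration only):

```python
# Toy model of elimination-based tool selection.
# Shows why NEVER statements shrink the candidate set faster
# than positive descriptions alone.

TOOLS = {
    "get_learner_state": {"matches": ["where am i", "progress"],
                          "never": ["explain", "stuck"]},
    "generate_guidance": {"matches": ["stuck", "explain", "help"],
                          "never": ["where am i"]},
    "get_exercises":     {"matches": ["practice", "try"],
                          "never": ["stuck"]},
}

def select_tool(message: str) -> list[str]:
    msg = message.lower()
    # Step 1: eliminate any tool whose NEVER terms appear in the message.
    candidates = [name for name, d in TOOLS.items()
                  if not any(term in msg for term in d["never"])]
    # Step 2: among survivors, keep tools with a positive match.
    matched = [name for name in candidates
               if any(term in msg for term in TOOLS[name]["matches"])]
    return matched or candidates

print(select_tool("I'm stuck and need help"))  # -> ['generate_guidance']
```

The word "stuck" eliminates two of the three tools before any positive matching happens, which is the narrowing effect the NEVER statements provide.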
For each tool, write at least one NEVER statement that prevents the most common wrong selection:
Each NEVER statement targets one specific confusion. Target the most common wrong selection, not every possible one.
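Illustrative NEVER statements targeting the confusions from Step 1 (the tool names come from the chapter; the exact wording is an assumption):

```python
# One NEVER statement per tool, each aimed at that tool's most
# common wrong selection.
NEVER_STATEMENTS = {
    "get_chapter_content": "NEVER call this for 'what chapter am I on?' -- that is get_learner_state.",
    "get_learner_state": "NEVER call this when the learner wants a chapter's teaching material -- that is get_chapter_content.",
    "generate_guidance": "NEVER call this when the learner asks where they are in the course -- that is get_learner_state.",
    "get_exercises": "NEVER call this when the learner is stuck on a concept -- that is generate_guidance.",
    "submit_code": "NEVER call this when the learner asks to 'try' an exercise -- that is get_exercises.",
}
```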
NEVER statements tell the agent what not to do. Cross-tool references tell it what to do instead. If the agent reaches the wrong tool's description, the reference redirects it to the right one.
Write a cross-reference for each of get_learner_state, generate_guidance, and get_exercises, pointing the agent to the tool it most often confuses with that one.
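Illustrative cross-references for get_learner_state, generate_guidance, and get_exercises (the wording is an assumption):

```python
# A cross-reference redirects the agent from a wrong tool to the right one.
CROSS_REFERENCES = {
    "get_learner_state": "For a chapter's actual teaching material, use get_chapter_content instead.",
    "generate_guidance": "If the learner wants practice problems rather than an explanation, use get_exercises instead.",
    "get_exercises": "If the learner is submitting a solution rather than asking for practice, use submit_code instead.",
}
```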
Cross-references create a navigation map inside your tool descriptions. An agent that lands on the wrong tool can redirect itself without failing the request.
Open Claude Code in your tutorclaw-mcp project and send this message:
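A prompt along these lines should work (the exact wording is hypothetical):

```
Rewrite all nine tool descriptions in this MCP server using a two-layer
pattern: a one-sentence Layer 1 for the initial selection scan, then a
Layer 2 with WHEN TO USE, at least one NEVER statement targeting the most
common wrong selection, and a cross-reference to the correct tool.
```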
Claude Code updates the descriptions across the server. It knows where they live because it built the tool registration code.
Resend the same three messages from Step 1. Same words, same order.
Message 1: "Help me understand where I am in the course" should now fire get_learner_state. The NEVER statements on generate_guidance and get_chapter_content eliminate them as candidates.
Message 2: "I'm stuck on this concept and need help" should now fire generate_guidance. Its description says "Call this when the learner is stuck or needs pedagogical support." NEVER statements on get_exercises and get_chapter_content eliminate them.
Message 3: "Can I try something?" should now fire get_exercises. Its description says "Call this when the learner wants to practice." The NEVER statement on submit_code eliminates it.
If any still miss, read the description for the tool that incorrectly fired. Does it have a NEVER statement for this scenario? Does the correct tool's description have a clear positive match? Refine and test again.
Invent a message that could plausibly trigger two or more tools. Test it against your updated descriptions:
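One invented boundary case to get you started (both the message and the expected resolution are assumptions):

```
"Can you check my work on the chapter 3 exercises?"
```

This could plausibly match submit_code ("check my work"), get_exercises ("exercises"), or get_chapter_content ("chapter 3"). A NEVER statement on get_exercises for solution checking, plus a cross-reference from get_chapter_content, should resolve it to submit_code.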
What you are learning: Real users send messages that do not map cleanly to any single tool. Finding these boundary cases before your users do is the difference between a product that feels intelligent and one that feels broken.
Ask Claude Code to review the balance between Layer 1 and Layer 2 across all nine tools:
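A review prompt might look like this (hypothetical wording):

```
Review all nine tool descriptions. For each, check that Layer 1 is a
single precise sentence and that Layer 2 stays under roughly 60 words.
Flag any description that repeats information, overlaps with another
tool's positive description, or buries the NEVER statement.
```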
What you are learning: Tool descriptions follow the same principle as AGENTS.md: enough context to make the right decision, not so much that the agent drowns in instructions. Brevity is a design constraint, not a limitation.
Ask Claude Code to show you the original one-line descriptions alongside the new two-layer versions:
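A comparison prompt might look like this (hypothetical wording):

```
Show me a side-by-side table of the original one-line description and the
new two-layer description for each of the nine tools, with a word count
for each version.
```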
What you are learning: Context engineering is not about writing more. It is about writing the right constraints. The NEVER statements and cross-references you added are often fewer than 30 words per tool, but they change the agent's behavior dramatically.
James resent the three messages. He watched the dashboard.
Message 1: get_learner_state. Correct.
Message 2: generate_guidance. Correct.
Message 3: get_exercises. Correct.
"Two layers. NEVER statements. Cross-references." He leaned back. "The agent is not smarter. It just has better directions."
Emma nodded. "That is context engineering. You did not change the model. You did not change the code. You changed the context the model reads before it makes a decision." She picked up her coffee. "Two months. That is how long my product lasted because I thought one-line descriptions were enough. We built routing logic on top, conditional checks, a classifier to pre-sort messages. None of it worked. The fix was always in the descriptions. Thirty words per tool. That is all it needed."
James looked at the nine updated descriptions on his screen. "So the agent reads these every time it gets a message?"
"Every time. Nine descriptions, nine NEVER statements, nine sets of cross-references. The agent scans all of them, eliminates the ones that say NEVER for this scenario, and picks from what remains. The better your descriptions, the smaller the remaining set, the more accurate the selection."
"What is next?"
"You have nine tools. You have AGENTS.md. You have descriptions that actually work." She set her cup down. "You do not have tests. The restart test from Module 9.3, Chapter 3 proved your state tools survive. The WhatsApp test from Module 9.3, Chapter 8 proved your server connects. But you do not have a test suite that proves all nine tools work correctly in every scenario. Module 9.3, Chapter 11: we build that."