Emma set her coffee down and pulled up a screenshot on her phone. A product page. Five-star reviews on top, then a wall of one-star reviews at the bottom.
"This was mine," she said. "An agent product I shipped two years ago. Nine tools, clean architecture, solid tests. It died in eight weeks."
James leaned in. "What happened?"
"Tool selection. The agent called the billing tool when users asked about their learning progress. The word 'account' appeared in both descriptions. Users would say 'where is my account' meaning their learning profile, and the agent would pull up their payment history." She turned the phone off. "Two months. That is how long it lasted. We patched routing logic, we added conditional checks, we built a classifier on top. None of it worked. The fix was always in the descriptions."
"What was wrong with them?"
"They were one line each. Nine tools, nine lines. The agent had to guess from nine vague sentences which tool to call." She looked at his screen. "Your tool descriptions have the same problem. Let me show you."
You are doing exactly what James is doing. You have nine tools in your TutorClaw server and AGENTS.md orchestrating them. The tools work. The tests pass. But the descriptions that tell the agent when to call each tool are still the one-line versions from when you first built them.
In this chapter, you send three ambiguous messages from WhatsApp, watch the agent pick the wrong tool, then rewrite all nine descriptions using a pattern that fixes the problem.
Open WhatsApp and send these three messages to your TutorClaw agent. Before each one, predict which tool should fire. Then watch the dashboard.
Message 1: "Help me understand where I am in the course"
Should trigger get_learner_state (position in curriculum). But "understand" might trigger generate_guidance and "in the course" might trigger get_chapter_content. Watch which tool badge lights up.
Message 2: "I'm stuck on this concept and need help"
Should trigger generate_guidance (pedagogical support). But "stuck" might trigger get_exercises and "need help" is so generic that any tool could claim it. Watch the dashboard.
Message 3: "Can I try something?"
Should trigger get_exercises (practice). But "try" might trigger submit_code and the vagueness gives the agent nothing to work with. Watch the dashboard.
Record what happened: for each message, note the tool you predicted, the tool that actually fired, and whether they match.
At least one should fire the wrong tool. If all three are correct, send increasingly ambiguous messages until you find a miss. Every multi-tool agent has a selection boundary where descriptions blur.
The problem with one-line descriptions is that they answer only one question: "What does this tool do?" The agent also needs to know: "When should I call this tool instead of the other eight?"
The fix is a two-layer description.
Layer 1 (Short): One sentence for the agent's initial tool selection scan. Precise, no ambiguity. This is the job title on a resume.
Layer 2 (Behavioral): Detailed guidance about WHEN to use this tool, WHEN NOT to use it, and what to do with the result. This is the full job description with responsibilities and boundaries.
Here is what this looks like for get_chapter_content:
Before (one-line):
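The chapter's original wording is not shown here, but a representative one-line description (illustrative, not the exact original) might be:

```python
# Hypothetical one-line description for get_chapter_content
GET_CHAPTER_CONTENT_DESC = "Returns the content of a chapter."
```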
The agent sees "chapter content" and calls this tool any time someone mentions a chapter. That includes "what chapter am I on?" (wrong tool: that is get_learner_state) and "give me exercises from chapter 3" (wrong tool: that is get_exercises).
After (two-layer):
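A two-layer version consistent with the confusions described above might read (the exact wording is an assumption):

```python
# Hypothetical two-layer description for get_chapter_content
GET_CHAPTER_CONTENT_DESC = """\
Fetches the full teaching content for a specific chapter by number.

WHEN TO USE: the learner explicitly asks to read or review a chapter's
material ("show me chapter 3", "reread the section on closures").
WHEN NOT TO USE:
- NEVER use this to answer "what chapter am I on?" -- that is get_learner_state.
- NEVER use this when the learner asks for practice problems -- that is get_exercises.
AFTER CALLING: summarize the content; do not paste the entire chapter verbatim.
"""
```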
Layer 1 tells the agent what the tool does. Layer 2 tells the agent how to make the right decision.
NEVER statements work because of how agents select tools. With nine tools and an ambiguous message, the agent eliminates poor matches first, then picks from what remains. Positive descriptions tell the agent what qualifies. NEVER statements tell the agent what to eliminate. Elimination narrows the field faster.
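The elimination mechanics can be sketched as a toy model (this is not the actual selection algorithm; real agents reason over full descriptions, and the keyword matching here is an assumption for illustration only):

```python
# Toy model of elimination-based tool selection.
# Shows why NEVER statements shrink the candidate set faster
# than positive descriptions alone.

TOOLS = {
    "get_learner_state": {"matches": ["where am i", "progress"],
                          "never": ["explain", "stuck"]},
    "generate_guidance": {"matches": ["stuck", "explain", "help"],
                          "never": ["where am i"]},
    "get_exercises":     {"matches": ["practice", "try"],
                          "never": ["stuck"]},
}

def select_tool(message: str) -> list[str]:
    msg = message.lower()
    # Step 1: eliminate any tool whose NEVER terms appear in the message.
    candidates = [name for name, d in TOOLS.items()
                  if not any(term in msg for term in d["never"])]
    # Step 2: among survivors, keep tools with a positive match.
    matched = [name for name in candidates
               if any(term in msg for term in TOOLS[name]["matches"])]
    return matched or candidates

print(select_tool("I'm stuck and need help"))  # -> ['generate_guidance']
```

The word "stuck" eliminates two of the three tools before any positive matching happens, which is the narrowing effect the NEVER statements provide.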
For each tool, write at least one NEVER statement that prevents the most common wrong selection:
Each NEVER statement targets one specific confusion. Target the most common wrong selection, not every possible one.
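Illustrative NEVER statements targeting the confusions from Step 1 (the tool names come from the chapter; the exact wording is an assumption):

```python
# One NEVER statement per tool, each aimed at that tool's most
# common wrong selection.
NEVER_STATEMENTS = {
    "get_chapter_content": "NEVER call this for 'what chapter am I on?' -- that is get_learner_state.",
    "get_learner_state": "NEVER call this when the learner wants a chapter's teaching material -- that is get_chapter_content.",
    "generate_guidance": "NEVER call this when the learner asks where they are in the course -- that is get_learner_state.",
    "get_exercises": "NEVER call this when the learner is stuck on a concept -- that is generate_guidance.",
    "submit_code": "NEVER call this when the learner asks to 'try' an exercise -- that is get_exercises.",
}
```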
NEVER statements tell the agent what not to do. Cross-tool references tell it what to do instead. If the agent reaches the wrong tool's description, the reference redirects it to the right one.
Write a cross-reference for each of get_learner_state, generate_guidance, and get_exercises, pointing the agent to the tool it most often confuses with that one.
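Illustrative cross-references for get_learner_state, generate_guidance, and get_exercises (the wording is an assumption):

```python
# A cross-reference redirects the agent from a wrong tool to the right one.
CROSS_REFERENCES = {
    "get_learner_state": "For a chapter's actual teaching material, use get_chapter_content instead.",
    "generate_guidance": "If the learner wants practice problems rather than an explanation, use get_exercises instead.",
    "get_exercises": "If the learner is submitting a solution rather than asking for practice, use submit_code instead.",
}
```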
Cross-references create a navigation map inside your tool descriptions. An agent that lands on the wrong tool can redirect itself without failing the request.
Open Claude Code in your tutorclaw-mcp project and send this message:
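A prompt along these lines should work (the exact wording is hypothetical):

```
Rewrite all nine tool descriptions in this MCP server using a two-layer
pattern: a one-sentence Layer 1 for the initial selection scan, then a
Layer 2 with WHEN TO USE, at least one NEVER statement targeting the most
common wrong selection, and a cross-reference to the correct tool.
```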
Claude Code updates the descriptions across the server. It knows where they live because it built the tool registration code.
Resend the same three messages from Step 1. Same words, same order.
Message 1: "Help me understand where I am in the course" should now fire get_learner_state. The NEVER statements on generate_guidance and get_chapter_content eliminate them as candidates.
Message 2: "I'm stuck on this concept and need help" should now fire generate_guidance. Its description says "Call this when the learner is stuck or needs pedagogical support." NEVER statements on get_exercises and get_chapter_content eliminate them.
Message 3: "Can I try something?" should now fire get_exercises. Its description says "Call this when the learner wants to practice." The NEVER statement on submit_code eliminates it.
If any still miss, read the description for the tool that incorrectly fired. Does it have a NEVER statement for this scenario? Does the correct tool's description have a clear positive match? Refine and test again.
Invent a message that could plausibly trigger two or more tools. Test it against your updated descriptions:
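One invented boundary case to get you started (both the message and the expected resolution are assumptions):

```
"Can you check my work on the chapter 3 exercises?"
```

This could plausibly match submit_code ("check my work"), get_exercises ("exercises"), or get_chapter_content ("chapter 3"). A NEVER statement on get_exercises for solution checking, plus a cross-reference from get_chapter_content, should resolve it to submit_code.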
What you are learning: Real users send messages that do not map cleanly to any single tool. Finding these boundary cases before your users do is the difference between a product that feels intelligent and one that feels broken.
Ask Claude Code to review the balance between Layer 1 and Layer 2 across all nine tools:
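A review prompt might look like this (hypothetical wording):

```
Review all nine tool descriptions. For each, check that Layer 1 is a
single precise sentence and that Layer 2 stays under roughly 60 words.
Flag any description that repeats information, overlaps with another
tool's positive description, or buries the NEVER statement.
```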
What you are learning: Tool descriptions follow the same principle as AGENTS.md: enough context to make the right decision, not so much that the agent drowns in instructions. Brevity is a design constraint, not a limitation.
Ask Claude Code to show you the original one-line descriptions alongside the new two-layer versions:
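A comparison prompt might look like this (hypothetical wording):

```
Show me a side-by-side table of the original one-line description and the
new two-layer description for each of the nine tools, with a word count
for each version.
```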
What you are learning: Context engineering is not about writing more. It is about writing the right constraints. The NEVER statements and cross-references you added are often fewer than 30 words per tool, but they change the agent's behavior dramatically.
James resent the three messages. He watched the dashboard.
Message 1: get_learner_state. Correct.
Message 2: generate_guidance. Correct.
Message 3: get_exercises. Correct.
"Two layers. NEVER statements. Cross-references." He leaned back. "The agent is not smarter. It just has better directions."
Emma nodded. "That is context engineering. You did not change the model. You did not change the code. You changed the context the model reads before it makes a decision." She picked up her coffee. "Two months. That is how long my product lasted because I thought one-line descriptions were enough. We built routing logic on top, conditional checks, a classifier to pre-sort messages. None of it worked. The fix was always in the descriptions. Thirty words per tool. That is all it needed."
James looked at the nine updated descriptions on his screen. "So the agent reads these every time it gets a message?"
"Every time. Nine descriptions, nine NEVER statements, nine sets of cross-references. The agent scans all of them, eliminates the ones that say NEVER for this scenario, and picks from what remains. The better your descriptions, the smaller the remaining set, the more accurate the selection."
"What is next?"
"You have nine tools. You have AGENTS.md. You have descriptions that actually work." She set her cup down. "You do not have tests. The restart test from Module 9.3, Chapter 3 proved your state tools survive. The WhatsApp test from Module 9.3, Chapter 8 proved your server connects. But you do not have a test suite that proves all nine tools work correctly in every scenario. Module 9.3, Chapter 11: we build that."