James pulled up the OpenClaw dashboard and scrolled through the last five messages he had sent from WhatsApp. The agent had responded to all of them. Nine tools, working, tested. But the dashboard logs told a different story.
For "Teach me about variables," the agent had called get_chapter_content first, then get_learner_state. For "How am I doing?" it called generate_guidance instead of get_learner_state. For "I want to practice loops," it went straight to get_exercises without checking what chapter the learner was on.
"It picks the right tools most of the time," James said. "But the order is wrong. It does not follow a teaching flow."
Emma looked at the logs. "It has nine tools and no instructions. You gave it tools but never told it how to run a tutoring session." She pointed at his project directory. "Write the manual."
"What manual?"
"AGENTS.md. The instruction document that the agent reads before every conversation. Right now the agent knows what each tool does because the tool descriptions tell it that. But it does not know the workflow: which tool comes first, what depends on what, when to teach versus when to test. AGENTS.md is the workflow."
You are doing exactly what James is doing. Your agent has nine working tools and no protocol for using them. AGENTS.md is the protocol.
In Module 9.3, Chapter 8, you connected TutorClaw to OpenClaw and tested the full flow from WhatsApp. The tools worked individually. But the agent picked tools based on its best guess, not based on a tutoring methodology. This chapter changes that. You write AGENTS.md, place it in your project, and test until the agent follows your protocol.
AGENTS.md is a plain text file that the agent reads before every conversation. It contains natural language instructions that shape how the agent behaves: what to do first, which tools to call in which situations, and what order to follow.
It is not code. There are no functions, no conditionals, no special syntax. You write instructions the way you would write a manual for a new employee on their first day.
Think of it this way. The tool descriptions are job titles: "register_learner registers new learners." AGENTS.md is the employee handbook: "On every new shift, check the roster first. Then check the day's assignments. Then start working."
Send these two messages from WhatsApp and watch the dashboard:
Message 1: "Help me learn"
Message 2: "I want to practice"
Note which tools the agent calls and in what order. You will send the same messages after writing AGENTS.md to see the difference.
Create a new file in your TutorClaw project root called AGENTS.md. You can write it by hand or ask Claude Code to help you structure it. Either way, you are the author: you decide what goes in the manual.
Your AGENTS.md needs three sections: session start protocol, tutoring flow, and tool selection rules.
Every tutoring session must begin the same way. The agent needs to know who the learner is and where they are in the curriculum before it can teach, test, or advise.
Write instructions like this (adapt to your own words):
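For example, a session start section might read like this. The wording is illustrative and uses the tool names from earlier chapters; adapt it to your own file:

```markdown
## Session start protocol

At the start of every conversation:

1. Identify the learner. If you do not know who is messaging, ask for their name.
2. Call get_learner_state to load the learner's current chapter, PRIMM stage, and progress.
3. If no learner is found, ask for their name, call register_learner, then call get_learner_state again.
4. Do not call any teaching tool until get_learner_state has returned.
```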
The key rule: the agent must never call get_chapter_content, generate_guidance, get_exercises, or any teaching tool before it knows the learner's current state. Without state, the agent cannot pick the right chapter, the right PRIMM stage, or the right difficulty level.
Once the agent has the learner's state, it follows the PRIMM-Lite tutoring methodology. Write the flow as a sequence:
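A sketch of that section. The first three steps use tools named earlier; the assessment and progress steps refer to whatever your own tools are called, so the bracketed names are placeholders:

```markdown
## Tutoring flow

Default teaching loop for each chapter:

1. Call get_chapter_content to fetch the current chapter's material.
2. Call generate_guidance to produce the prompt for the learner's current PRIMM stage.
3. Wait for the learner's response and react to it.
4. Assess the response with [your assessment tool].
5. Record the result with [your progress tool], then continue to the next stage.

If the learner asks something off-path, answer it, then return to the loop.
```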
This sequence is not rigid. The learner might ask a question that breaks the flow. The agent should handle that and return to the sequence. But the default path is: content, guidance, response, assessment, progress. That is the teaching loop.
Ambiguous messages are where AGENTS.md matters most. A learner who says "help me" could mean many things. Write explicit rules for the most common cases:
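For instance (the trigger phrases are examples, not an exhaustive list):

```markdown
## Tool selection rules

- "Help me", "I'm stuck", "I don't understand" -> call generate_guidance for the current chapter and stage. Do not fetch new content.
- "I want to practice", "give me exercises", "quiz me" -> call get_exercises. Do not call get_chapter_content; practice is not new content.
- "Teach me X", "what is X" -> call get_chapter_content for the matching chapter, then generate_guidance.
- "How am I doing?", "show my progress" -> call get_learner_state and summarize it. Do not start teaching.
```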
These rules handle the cases where tool descriptions alone are not enough. The tool description for get_exercises says "return exercises matched to weak areas," but without the rule above, the agent might call get_chapter_content when the learner says "I want to practice" because both tools relate to learning content.
Tools fail. The agent needs to know what to do when they do. Add a fourth section:
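A sketch of that section. The exact failure conditions depend on how your tools report errors, so treat these as assumptions to adjust:

```markdown
## Error handling

- If get_learner_state reports that the learner is not found, ask for their name, call register_learner, then retry.
- If get_chapter_content fails, tell the learner the material is temporarily unavailable and offer practice instead.
- If any tool fails twice in a row, say briefly what you were trying to do and ask the learner to try again later.
- Never pretend a failed tool call succeeded.
```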
Without error handling instructions, the agent either ignores errors silently or generates a vague apology. These rules give it specific recovery actions for each failure mode.
Save the file in your TutorClaw project root. The exact location matters: the agent reads AGENTS.md from the project directory when the MCP server is connected.
Ask Claude Code to confirm the file is in the right place:
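One way to phrase it (a sample prompt; any wording that asks for the file's location works):

```
Check that AGENTS.md is in the TutorClaw project root, next to the MCP
server entry point, and list the files in the root so I can confirm.
```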
Send the same two messages you sent before writing AGENTS.md.
Watch the dashboard. The expected tool call order:
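For "Help me learn", a log that follows the session start protocol and tutoring flow looks like this:

```
Message: "Help me learn"
  1. get_learner_state
  2. get_chapter_content
  3. generate_guidance
```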
If the agent calls get_chapter_content before get_learner_state, your session start protocol needs to be more explicit. Add emphasis:
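For example, a more forceful restatement at the top of the session start section:

```markdown
IMPORTANT: Never call get_chapter_content, generate_guidance, or
get_exercises before get_learner_state has returned. This applies to
every conversation, even when the learner's message looks specific.
```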
Expected tool call order:
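Per the practice rule, the log for the second message should be:

```
Message: "I want to practice"
  1. get_learner_state
  2. get_exercises
```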
If the agent calls get_chapter_content instead of get_exercises, your tool selection rules need refinement. Make the distinction sharper:
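One way to sharpen it (illustrative wording):

```markdown
- get_exercises is for doing: "practice", "quiz me", "test me", "let me try".
- get_chapter_content is for learning something new: "teach me", "explain", "what is".
- If a message could be either, prefer get_exercises when the learner has already seen the current chapter's content; otherwise use get_chapter_content.
```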
Send a message that is not covered by your rules. Something like "I am confused about chapter 3." Watch what the agent does. If the behavior seems wrong, add a new rule to AGENTS.md and test again.
AGENTS.md is never finished on the first draft. You write it, test it, observe where the agent deviates, and refine. This is the same iterative pattern as tool descriptions: the first version is a guess, testing reveals the gaps, and each revision makes the agent more predictable.
Common refinements after initial testing:
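For instance, rules like these (illustrative; each is a line you might add):

```markdown
- Greetings like "hi" or "hello" trigger the session start protocol, not a tool guess.
- "I am confused about chapter 3" means guidance about chapter 3, not moving the learner to chapter 3.
- Call at most one teaching tool before responding to the learner.
- Keep responses short; WhatsApp is not the place for a lecture.
```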
Each refinement is a single sentence or short paragraph added to AGENTS.md. You are not writing code. You are writing clearer instructions.
Your AGENTS.md handles the happy path. What happens when something goes wrong?
What you are learning: Error handling in AGENTS.md is not try/catch blocks. It is written guidance: "If the learner is not found, ask for their name and call register_learner." The agent reads this as instruction, not as code.
Ask Claude Code to analyze the difference your AGENTS.md made:
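A sample prompt (adapt the specifics to your own logs):

```
Here are two dashboard logs for the same two WhatsApp messages, one from
before I added AGENTS.md and one from after. Compare the tool call order.
Which AGENTS.md rules changed the behavior, and which messages still
trigger the wrong tools?
```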
What you are learning: Context engineering is measurable. You can compare agent behavior before and after AGENTS.md and quantify the improvement. The gaps Claude Code identifies become your next round of refinements.
Imagine you are building a customer support agent with five tools: get_customer_info, check_order_status, create_ticket, escalate_to_human, and send_follow_up. Draft a short AGENTS.md for that agent.
What you are learning: AGENTS.md is not specific to TutorClaw. Any multi-tool agent benefits from an instruction manual. The pattern transfers: identify the workflow, define the start protocol, write rules for ambiguous inputs.
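If you want a starting point before drafting your own, a minimal skeleton for the support agent might look like this. The routing rules are assumptions about a typical support workflow, not a specification:

```markdown
## Session start protocol
Always call get_customer_info first. Never check orders, create tickets,
or escalate before you know who the customer is.

## Support flow
get_customer_info, then check_order_status if the question involves an
order, then create_ticket if the issue is not resolved in conversation,
then send_follow_up after any ticket is created.

## Tool selection rules
- "Where is my order?" -> check_order_status. Do not create a ticket yet.
- A repeated or escalating complaint -> escalate_to_human.
- escalate_to_human is for cases that genuinely need a human; it is not a fallback for every error.
```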
James sent "Help me learn" from WhatsApp and watched the dashboard. get_learner_state first. Then get_chapter_content. Then generate_guidance. The PRIMM "predict" prompt appeared on his phone.
He sent "I want to practice." get_learner_state, then get_exercises. Not get_chapter_content. The agent followed the rule.
"It follows the protocol now," he said. "Same tools, completely different behavior."
Emma nodded. "AGENTS.md is not code. It is context. The agent reads it and makes better decisions." She paused, then added, more quietly: "I shipped an agent product once with twelve tools and no AGENTS.md. The agent used the tools in random order depending on how the user phrased their message. Support tickets piled up, all variations of 'the bot gave me exercises when I asked for help.' We wrote AGENTS.md in one afternoon and about eighty percent of those tickets disappeared."
"One afternoon?"
"One afternoon of writing instructions. Not code. Not a model retrain. Just clear sentences explaining the workflow." She pointed at his screen. "Your tools are good. Your descriptions are getting better. But descriptions tell the agent what each tool does. AGENTS.md tells it when and why. That is context engineering."
James looked at his AGENTS.md. Three sections. Maybe forty lines. The most impactful file in the project, and it contained zero code.
"Module 9.3, Chapter 10," Emma said. "Your tool descriptions are next. AGENTS.md handles the big picture: session flow, tool order, routing rules. But the individual tool descriptions still have room to improve. Two-layer descriptions, cross-tool references, and explicit statements about what a tool should never be used for. That is where tool selection gets precise."