James stared at the WhatsApp response from TutorClaw. He had sent "Teach me about variables" and got back a wall of text: definitions, examples, exercises, all at once.
"It fetches the content and shows it," he said, scrolling through the message. "When I trained new hires at the warehouse, I never did this. I walked them through it. Asked them what they thought would happen before they tried the forklift. Watched them try. Then asked why it went sideways."
Emma looked at the screen. "You just described a pedagogical framework. Predict what happens, run it, investigate why the result was different from the prediction." She pulled up a chair. "That three-step loop has a name: PRIMM-Lite. Build it."
Your TutorClaw is doing exactly what James's is doing. Its content tools deliver raw material, but they do not teach. In this chapter, you describe two pedagogy tools to Claude Code that turn TutorClaw from a content dump into a tutor that walks learners through material the way James trained warehouse staff.
Before you describe anything to Claude Code, you need to understand what you are asking it to build. PRIMM-Lite is a simplified teaching methodology with three stages:

1. Predict: the learner commits to a prediction before seeing the result.
2. Run: the learner sees the actual result and compares it to the prediction.
3. Investigate: the learner digs into why the result matched or differed from the prediction.
The three stages cycle for each topic. A learner who predicts correctly advances quickly. A learner who predicts incorrectly spends more time in the Investigate stage, building understanding before moving forward.
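The cycling rule in the paragraph above can be sketched directly. This is an illustrative assumption about the logic, not the implementation Claude Code will produce; the stage names come from the chapter, everything else is a stand-in:

```python
# Illustrative sketch of the PRIMM-Lite stage cycle.
STAGES = ["predict", "run", "investigate"]

def next_stage(current: str, predicted_correctly: bool) -> str:
    """Advance through the cycle; wrong predictions earn time in investigate."""
    if current == "predict":
        return "run"
    if current == "run":
        # A correct prediction advances quickly to the next topic's predict stage;
        # an incorrect one goes to investigate to build understanding first.
        return "predict" if predicted_correctly else "investigate"
    return "predict"  # investigate always loops back to a fresh prediction
```

The asymmetry is the point: the tool should spend a struggling learner's time in Investigate, not rush them forward.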
Two pieces make this work as MCP tools: generate_guidance, which produces a stage-appropriate prompt for the learner, and assess_response, which evaluates the learner's answer and returns a confidence_delta.
Open Claude Code in your tutorclaw-mcp project. You need to explain PRIMM-Lite as a requirement, not as code: what generate_guidance takes (a concept, the current stage, the learner's confidence), what it returns (content for the learner plus a system_prompt_addition for the agent), and how the three stages differ. Send Claude Code a message covering those points.
The system_prompt_addition field is the key design insight in this tool. The content is what the learner sees. The system_prompt_addition is what the agent follows. This is a tool that returns instructions, not just data.
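To make the two-field split concrete, here is a minimal Python sketch. Only the names generate_guidance, content, and system_prompt_addition come from the chapter; the signature and the templates are assumptions for illustration:

```python
# Hypothetical sketch of the two-field return: content for the learner,
# system_prompt_addition for the agent.
def generate_guidance(concept: str, stage: str, confidence: float) -> dict:
    templates = {
        "predict": "Here is an example of {c}. What do you think will happen?",
        "run": "Here is the actual result for {c}. How does it compare to your prediction?",
        "investigate": "Let's dig into why {c} behaved that way.",
    }
    instructions = {
        "predict": "Do NOT reveal the answer. Wait for the learner's prediction.",
        "run": "Show the result, then ask the learner to compare it to their prediction.",
        "investigate": "Ask probing why-questions; give hints, not answers.",
    }
    return {
        "content": templates[stage].format(c=concept),    # what the learner sees
        "system_prompt_addition": instructions[stage],    # what the agent follows
    }
```

Notice that the predict-stage instruction actively forbids revealing the answer: the return value shapes the agent's behavior, not just its output.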
Review the spec Claude Code produces. The key things to check: the inputs include the stage and the learner's confidence, the return includes both content and system_prompt_addition, and the description is specific enough for the agent to know when to call the tool.
If the tool description is vague ("generates teaching stuff"), steer it: tell Claude Code the description should name the three PRIMM-Lite stages and make clear that system_prompt_addition carries instructions for the agent, not text for the learner.
Once the spec looks right, tell Claude Code to build it. Run the tests after it finishes and confirm they pass.
The second pedagogy tool evaluates whether the learner's answer demonstrates understanding. Send Claude Code a message describing assess_response as a requirement: it takes the learner's answer and returns a confidence_delta reflecting how well the answer demonstrates understanding of the current concept.
Review the spec. Watch for: an unbounded score instead of a bounded confidence_delta, and assessment criteria loose enough to score a vague answer the same as a specific one.
Approve the spec and have Claude Code build it. Run the tests when it finishes.
Now test the pedagogy tools together. Ask Claude Code to call them in sequence: generate_guidance for a concept, then assess_response on a sample learner answer.
You are looking for three things: the guidance matches the requested stage, the confidence_delta tracks the quality of the answer, and the agent chooses the right tool from its description rather than from hardcoded logic.
If any of these are wrong, describe the problem to Claude Code and steer the fix. The describe-steer-verify cycle from Module 9.2 applies to every tool you build.
Step back and look at what these two tools do together. Before this lesson, TutorClaw could store learner state and fetch content. It had a filing cabinet and a bookshelf. Now it has a teaching method.
When the agent needs to teach a concept, it calls generate_guidance to get a stage-appropriate prompt. When the learner responds, it calls assess_response to evaluate the answer. The confidence_delta feeds back into the learner state (via update_progress from Module 9.3, Chapter 3), and the next call to generate_guidance adjusts accordingly. A learner who keeps answering well moves through stages quickly. A learner who struggles gets more support at the current stage.
The agent orchestrates this loop by reading tool descriptions, not by following hardcoded logic. You described the methodology, Claude Code built the implementation, and the agent will use the descriptions to call the right tool at the right time.
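The loop described above can be sketched end to end. Every function body here is a stub standing in for the real tools, and in practice the agent drives this sequence by reading tool descriptions rather than by running a Python loop:

```python
# End-to-end sketch of the teaching loop; all implementations are stubs.
def generate_guidance(concept, stage, confidence):
    return {"content": f"[{stage}] prompt about {concept}",
            "system_prompt_addition": f"Follow {stage}-stage rules."}

def assess_response(answer):
    # Stand-in scoring: longer, specific answers nudge confidence up.
    return {"confidence_delta": 0.1 if len(answer.split()) > 5 else -0.1}

def update_progress(state, delta):
    # Clamp confidence to [0, 1], as a state tool reasonably would.
    state["confidence"] = min(1.0, max(0.0, state["confidence"] + delta))
    return state

state = {"confidence": 0.5}
for stage, answer in [("predict", "it prints 5 because x was reassigned"),
                      ("run", "matches"),
                      ("investigate", "reassignment replaces the old binding entirely")]:
    guidance = generate_guidance("variables", stage, state["confidence"])
    delta = assess_response(answer)["confidence_delta"]
    state = update_progress(state, delta)  # feeds the next generate_guidance call
```

The wiring, not the stubs, is what matters: each assessment adjusts the state that the next guidance call reads.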
Walk through one complete PRIMM-Lite cycle by calling generate_guidance at each stage: predict, then run, then investigate.
What you are learning: Each stage should produce a qualitatively different response. Predict asks for a prediction. Run reveals the answer and asks for comparison. Investigate probes deeper. If the three responses look similar, the tool's stage handling needs steering.
Test what happens at extreme confidence values, such as 0.1 and 0.9.
What you are learning: A learner at 0.1 confidence needs simpler, more supportive prompts than one at 0.9. If the tool produces identical prompts regardless of confidence, you may want to steer Claude Code to add confidence-aware difficulty scaling.
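If you do steer toward confidence-aware scaling, the fix might take a shape like this sketch; the thresholds and wording are purely illustrative assumptions:

```python
def support_level(confidence: float) -> str:
    """Map learner confidence to a prompt style; cutoffs are arbitrary examples."""
    if confidence < 0.3:
        return "supportive"   # simpler language, smaller steps, more hints
    if confidence < 0.7:
        return "standard"
    return "challenging"      # terser prompts, edge cases, fewer hints

def predict_prompt(concept: str, confidence: float) -> str:
    level = support_level(confidence)
    if level == "supportive":
        return f"Let's look at {concept} together. Take a guess: what happens here?"
    if level == "challenging":
        return f"Predict the exact output of this {concept} example, including edge cases."
    return f"What do you think this {concept} example will do?"
```

A sketch like this makes the steering request concrete: same stage, same concept, different prompt depending on confidence.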
Test assess_response with responses that are technically correct but miss the point: pair a vague answer that is not wrong with a specific one that explains the mechanism.
What you are learning: The quality gap between a vague answer and a specific one should produce a meaningful difference in confidence_delta. If both answers return similar scores, the assessment logic is too generous and needs tighter criteria.
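As a sanity check on that gap, here is a toy rubric. A real assess_response would likely use the model's judgment rather than keyword matching; the expected points, example answers, and delta range here are all assumptions:

```python
def assess_response(expected_points: list[str], answer: str) -> float:
    """Toy rubric: confidence_delta scales with how many expected points the answer hits."""
    if not expected_points:
        return 0.0
    hits = sum(1 for point in expected_points if point in answer.lower())
    coverage = hits / len(expected_points)
    # Map coverage [0, 1] onto a bounded delta [-0.1, +0.2].
    return round(-0.1 + 0.3 * coverage, 2)

points = ["reassign", "binding", "prints 5"]
vague = "The variable changes and the code works."
specific = "Reassigning x replaces the old binding, so it prints 5."
# The specific answer should earn a meaningfully higher delta than the vague one.
```

Whatever the real scoring mechanism, this is the property to verify: vague-but-correct and specific-and-correct must not land on the same delta.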
James called generate_guidance with a learner in the predict stage. The response came back: a code snippet with the question "What do you think this will print?"
"It teaches instead of dumping." He grinned. "At the warehouse, the new hires who predicted first always learned the safety protocols faster. Something about committing to an answer before you see the result."
Emma nodded. "That commitment is the whole trick. Predict forces the learner to engage before they can passively scroll." She paused. "I shipped a tutor once without any pedagogical framework. Just search and display. Users said it was a search engine with extra steps. We bolted PRIMM onto it in a weekend and retention tripled." She shrugged. "Should have done it from the start."
"So we have state tools, content tools, and now pedagogy tools." James counted on his fingers. "What is left?"
"Two more. A way to run code and a way to get paid." Emma pointed at his screen. "Module 9.3, Chapter 6."