USMAN’S INSIGHTS
AI ARCHITECT
The Pedagogy Prototype: Turning Your Agent into a Tutor with PRIMM-Lite


© 2026 Muhammad Usman Akbar. All rights reserved.


Build the Pedagogy Tools

James stared at the WhatsApp response from TutorClaw. He had sent "Teach me about variables" and got back a wall of text: definitions, examples, exercises, all at once.

"It fetches the content and shows it," he said, scrolling through the message. "When I trained new hires at the warehouse, I never did this. I walked them through it. Asked them what they thought would happen before they tried the forklift. Watched them try. Then asked why it went sideways."

Emma looked at the screen. "You just described a pedagogical framework. Predict what happens, run it, investigate why the result was different from the prediction." She pulled up a chair. "That three-step loop has a name: PRIMM-Lite. Build it."


You are doing exactly what James is doing. Your content tools deliver raw material, but they do not teach. In this chapter, you describe two pedagogy tools to Claude Code that turn TutorClaw from a content dump into a tutor that walks learners through material the way James trained warehouse staff.

PRIMM-Lite: The Three-Stage Teaching Loop

Before you describe anything to Claude Code, you need to understand what you are asking it to build. PRIMM-Lite is a simplified teaching methodology with three stages:

| Stage | What Happens | Example |
| --- | --- | --- |
| Predict | The tutor shows code and asks "What do you think this will do?" before revealing the output | "Look at this loop. What will it print?" |
| Run | The tutor reveals the actual output and asks "Does this match your prediction?" | "Here is what it actually prints. Were you right?" |
| Investigate | The tutor asks "Why did the output differ?" or "What would happen if you changed X?" | "Why did it print 5 instead of 4? What if you changed the range?" |

The three stages cycle for each topic. A learner who predicts correctly advances quickly. A learner who predicts incorrectly spends more time in the Investigate stage, building understanding before moving forward.

Two pieces make this work as MCP tools:

  1. generate_guidance takes the learner's current stage and confidence, plus the chapter content, and returns a stage-appropriate prompt
  2. assess_response takes the learner's answer, the current stage, and the expected concepts, and returns a confidence_delta (positive or negative) with feedback
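As a concrete reference, here is what payloads matching the two requirement lists above might look like. The field names follow the chapter's requirements; the values are made-up examples, not output from the real tools.

```python
# Illustrative payloads for the two pedagogy tools. The field names come
# from the chapter's requirements; the values are invented examples.

guidance_example = {
    "content": (
        "Look at this loop:\n\n"
        "for i in range(3):\n"
        "    print(i)\n\n"
        "What do you think it will print?"
    ),
    "system_prompt_addition": (
        "Wait for the learner's prediction before revealing the output."
    ),
}

assessment_example = {
    "confidence_delta": 0.2,  # bounded to [-0.3, 0.3] per the spec
    "feedback": (
        "You identified the print function; also mention that "
        "range(3) yields 0, 1, 2."
    ),
    "recommendation": "advance_to_run",  # a suggestion for the agent, not a command
}
```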

Step 1: Describe generate_guidance to Claude Code

Open Claude Code in your tutorclaw-mcp project. You need to explain PRIMM-Lite as a requirement, not as code. Send this message:

text
I want to add a generate_guidance tool based on the PRIMM-Lite teaching loop.

Methodology:
- Predict: Show code (no output), ask for a prediction.
- Run: Reveal output, ask for comparison.
- Investigate: Ask "why" or prompt for code modifications.

Requirements:
- Inputs: learner_state (current stage + confidence) and chapter_content.
- Outputs:
  1. content: stage-appropriate teaching text.
  2. system_prompt_addition: specific instructions for the agent (e.g., "Wait for prediction before revealing output").

Spec this before building.

The system_prompt_addition field is the key design insight in this tool. The content is what the learner sees. The system_prompt_addition is what the agent follows. This is a tool that returns instructions, not just data.
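A minimal sketch of that per-stage dispatch, assuming a simple lookup table; the instruction strings are illustrative, not the actual implementation Claude Code will produce.

```python
# Sketch: each PRIMM-Lite stage maps to a different agent instruction.
# The strings are illustrative assumptions, not the real tool's output.

STAGE_INSTRUCTIONS = {
    "predict": (
        "Show the code without its output. Wait for the learner's "
        "prediction before revealing anything."
    ),
    "run": (
        "Reveal the actual output and ask the learner to compare it "
        "with their prediction."
    ),
    "investigate": (
        "Ask probing 'why' questions and suggest a small code "
        "modification for the learner to reason about."
    ),
}

def system_prompt_for(stage: str) -> str:
    """Return the agent instruction for a PRIMM-Lite stage."""
    if stage not in STAGE_INSTRUCTIONS:
        raise ValueError(f"unknown PRIMM-Lite stage: {stage}")
    return STAGE_INSTRUCTIONS[stage]
```

Each stage returning a distinct instruction is exactly what the review table below this asks you to verify.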

Review the spec Claude Code produces. The key things to check:

| Element | What to Verify |
| --- | --- |
| Stage handling | Does the tool produce different output for each of the three stages? |
| system_prompt_addition | Does each stage return a different instruction? Predict should tell the agent to wait for a prediction. Run should tell the agent to compare. Investigate should tell the agent to ask probing questions. |
| Confidence awareness | Does it adjust prompt difficulty based on confidence? A learner at 0.2 confidence needs simpler prompts than one at 0.8. |
| Tool description | Is it specific enough that the agent knows when to call this versus get_chapter_content? |
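Confidence awareness can be sketched as a small mapping from score to support level. The thresholds below (0.4 and 0.7) are arbitrary assumptions for illustration; your spec may choose different cut points.

```python
# Sketch of confidence-aware scaffolding. Thresholds are assumed values,
# not part of the chapter's spec.

def scaffolding_level(confidence: float) -> str:
    """Map learner confidence (0.0-1.0) to a prompt support level."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence < 0.4:
        return "high_support"    # simpler code, hints included
    if confidence < 0.7:
        return "medium_support"
    return "low_support"         # terser prompts, stretch questions
```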

If the tool description is vague ("generates teaching stuff"), steer it:

text
Refine the generate_guidance tool description. Logic Update: The current description is too vague. Ensure it explicitly states: "This tool generates stage-appropriate teaching prompts using the PRIMM-Lite methodology. Call this when the agent needs to facilitate a teaching concept, not when the learner asks for raw reading material."

Once the spec looks right, tell Claude Code to build it:

text
The spec looks good. Build this.

Run the tests after it finishes:

text
uv run pytest

Step 2: Describe assess_response to Claude Code

The second pedagogy tool evaluates whether the learner's answer demonstrates understanding. Send this message:

text
Add an assess_response tool to evaluate learner submissions.

Inputs:
- answer_text: the learner's response.
- primm_stage: the current teaching stage.
- expected_concepts: a list of concept tags for validation.

Outputs:
- confidence_delta: float (-0.3 to 0.3) to adjust learner score.
- feedback: specific pointers on what was missed or understood.
- recommendation: suggested next action for the agent (not a command).

Spec this before building.

Review the spec. Watch for:

  • Does the confidence_delta range make sense? Too wide (like -1.0 to 1.0) makes single answers swing the learner's state too far. Too narrow (like -0.01 to 0.01) makes progress invisible.
  • Does the tool description distinguish this from generate_guidance? One generates prompts, the other evaluates answers. The agent must never confuse them.
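To see why the bounds matter, here is a sketch of applying a delta to the learner's confidence score. The [-0.3, 0.3] clamp comes from the spec above; the helper itself (`apply_delta`) is a hypothetical name, not part of the tool.

```python
# Sketch: apply a bounded confidence_delta to a learner's score.
# The [-0.3, 0.3] range is from the spec; apply_delta is illustrative.

def apply_delta(confidence: float, delta: float) -> float:
    """Clamp the delta to the spec's range, then keep confidence in [0, 1]."""
    delta = max(-0.3, min(0.3, delta))
    return max(0.0, min(1.0, confidence + delta))
```

With this clamp, even a run of strong answers moves the score in measured steps: apply_delta(0.9, 0.3) returns 1.0, not an overshoot, and a single bad answer cannot crater a high score.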

Approve and build:

text
The spec looks good. Build this.

Run tests:

text
uv run pytest

Step 3: Verify Both Tools

Now test the pedagogy tools together. Ask Claude Code to call them in sequence:

text
Conduct a Sequential Integration Test.

Sequence:
1. Call generate_guidance (Chapter 1, Predict stage, Confidence 0.5).
2. Call assess_response with a valid answer ("It prints hello world") and expected concepts ("print function").
3. Call assess_response with a vague answer ("It does stuff").

Evaluation: Verify the system_prompt_addition is correct in step 1 and the confidence_delta reflects the quality difference between step 2 and step 3.

You are looking for three things:

  1. generate_guidance at predict stage returns a prompt that shows code and asks the learner to predict, not a prompt that reveals the answer
  2. assess_response with a reasonable answer returns a positive confidence_delta and encouraging feedback
  3. assess_response with a vague answer returns a negative confidence_delta and feedback that points the learner toward what they missed
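The three checks can also be written as assertions against the captured tool responses. `verify_sequence` and its dict arguments are placeholders for whatever your integration test actually collects, not part of the MCP server.

```python
# The three verification checks, expressed as assertions. The argument
# dicts are placeholders for captured tool responses.

def verify_sequence(predict_guidance: dict, good_assessment: dict,
                    vague_assessment: dict) -> None:
    # 1. Predict-stage guidance must ask for a prediction, not reveal output.
    assert "predict" in predict_guidance["system_prompt_addition"].lower()
    # 2. A reasonable answer should nudge confidence upward.
    assert good_assessment["confidence_delta"] > 0
    # 3. A vague answer should nudge it downward.
    assert vague_assessment["confidence_delta"] < 0
```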

If any of these are wrong, describe the problem to Claude Code and steer the fix. The describe-steer-verify cycle from Module 9.2 applies to every tool you build.

Two Tools, One Teaching Method

Step back and look at what these two tools do together. Before this lesson, TutorClaw could store learner state and fetch content. It had a filing cabinet and a bookshelf. Now it has a teaching method.

When the agent needs to teach a concept, it calls generate_guidance to get a stage-appropriate prompt. When the learner responds, it calls assess_response to evaluate the answer. The confidence_delta feeds back into the learner state (via update_progress from Module 9.3, Chapter 3), and the next call to generate_guidance adjusts accordingly. A learner who keeps answering well moves through stages quickly. A learner who struggles gets more support at the current stage.

The agent orchestrates this loop by reading tool descriptions, not by following hardcoded logic. You described the methodology, Claude Code built the implementation, and the agent will use the descriptions to call the right tool at the right time.

Try With AI

Exercise 1: Test All Three Stages

Walk through one complete PRIMM-Lite cycle by calling generate_guidance at each stage:

text
Conduct a Full Loop Comparison.

Task: Call generate_guidance three times for the same chapter, cycling through the "predict", "run", and "investigate" stages.

Goal: Present the responses side-by-side to ensure each stage produces qualitatively distinct pedagogical steering.

What you are learning: Each stage should produce a qualitatively different response. Predict asks for a prediction. Run reveals the answer and asks for comparison. Investigate probes deeper. If the three responses look similar, the tool's stage handling needs steering.

Exercise 2: Confidence Boundaries

Test what happens at extreme confidence values:

text
Test for Support Scaling.

Scenario: Call generate_guidance twice for the same chapter and stage (Predict).
1. Low Confidence: 0.1
2. High Confidence: 0.9

Analysis: Compare the complexity and support level of the prompts. Verify whether the tool adjusts its tone and scaffolding based on the learner's proficiency.

What you are learning: A learner at 0.1 confidence needs simpler, more supportive prompts than one at 0.9. If the tool produces identical prompts regardless of confidence, you may want to steer Claude Code to add confidence-aware difficulty scaling.

Exercise 3: Edge Case Responses

Test assess_response with responses that are technically correct but miss the point:

text
Verify Threshold Assessment.

Scenario: Call assess_response for "investigate" with expected concepts "loop iteration" and "range function".
- Test Case A (Vague): "The code runs and gives output."
- Test Case B (Precise): "The for loop runs 5 times because range(5) generates 0-4, and the variable takes each value in sequence."

Task: Compare the confidence_delta and feedback quality. Ensure the tool rewards technical precision over generic participation.

What you are learning: The quality gap between a vague answer and a specific one should produce a meaningful difference in confidence_delta. If both answers return similar scores, the assessment logic is too generous and needs tighter criteria.


James called generate_guidance with a learner in the predict stage. The response came back: a code snippet with the question "What do you think this will print?"

"It teaches instead of dumping." He grinned. "At the warehouse, the new hires who predicted first always learned the safety protocols faster. Something about committing to an answer before you see the result."

Emma nodded. "That commitment is the whole trick. Predict forces the learner to engage before they can passively scroll." She paused. "I shipped a tutor once without any pedagogical framework. Just search and display. Users said it was a search engine with extra steps. We bolted PRIMM onto it in a weekend and retention tripled." She shrugged. "Should have done it from the start."

"So we have state tools, content tools, and now pedagogy tools." James counted on his fingers. "What is left?"

"Two more. A way to run code and a way to get paid." Emma pointed at his screen. "Module 9.3, Chapter 6."