James pulled up the architecture diagrams from Module 9.4, Chapter 5. Something had been nagging him since the comparison table. Architectures 1, 2, and 3 all had the same component in common: a model router. Free learners got DeepSeek. Paid learners got GPT-5.4 mini. Premium learners got Claude Sonnet. OpenRouter acted as the gateway. Claude Code Router sat inside every NanoClaw container.
"Architecture 4 does not have any of that," he said. "No OpenRouter. No Claude Code Router. No routing logic at all. But the learners still use different models. Where did the routing go?"
Emma shrugged. "It did not go anywhere. It was eliminated. The learner opens OpenClaw, picks whatever model they want, and connects to TutorClaw's MCP server. We have zero control over that choice and zero cost exposure from it."
You are doing exactly what James is doing. You have seen the 37x cost range across models (Module 9.4, Chapter 2) and the four architectures compared (Module 9.4, Chapter 5). Now you are looking at the gap: if you cannot route learners to specific models, what do you do instead?
In Architectures 1 through 3, model routing required dedicated infrastructure:

- A model router mapping each learner tier to a model (DeepSeek for free, GPT-5.4 mini for paid, Claude Sonnet for premium)
- OpenRouter acting as the gateway to model providers
- Claude Code Router inside every NanoClaw container
- The routing logic tying the tiers, gateway, and containers together
Architecture 4 removes all four rows. The learner picks their model in OpenClaw. The cost of model routing drops to $0. That $500-1,000/month saving is not a token cost reduction; it is infrastructure that no longer exists.
Routing is gone, but guidance remains. TutorClaw publishes recommendations in the shim skill's documentation and in the MCP server's structured responses:
This table does not control anything. A learner on the Tight budget can connect Claude Opus if they want. The table is a recommendation, not a gate.
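The advisory nature of the table can be made concrete with a small sketch. The tier names, model identifiers, and function name below are illustrative assumptions, not TutorClaw's actual table or API:

```python
# Hypothetical sketch of an advisory (non-gating) recommendation lookup.
# Tier names and model pairings are assumptions for illustration only.
RECOMMENDED_MODELS = {
    "tight": "deepseek-chat",      # lowest-cost option
    "balanced": "gpt-5.4-mini",
    "premium": "claude-sonnet",
}

def recommend_model(budget_tier: str) -> str:
    """Return a suggestion only; the learner's choice in OpenClaw always wins."""
    return RECOMMENDED_MODELS.get(budget_tier, "any model you prefer")
```

Note the design choice: there is no enforcement path. A learner on the "tight" tier who connects Claude Opus never touches this lookup, because the model selection happens entirely on their side.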
Compare the monthly cost of the two approaches:
The savings are not just in tokens. The entire category of routing infrastructure disappears.
TutorClaw's table works for a tutoring product. To build your own table, you need three inputs:
A token budget calculator helps translate user behavior into daily cost:
Daily cost = (exchanges/day) x (avg tokens/exchange) x (price per token)
For TutorClaw, a typical study session involves roughly 30 exchanges at 3,000 tokens per exchange.
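The calculator above can be sketched directly. The 30-exchange, 3,000-token session comes from the text; the $3-per-million-token price is an assumed illustrative rate, not a quoted figure for any model:

```python
def daily_cost(exchanges_per_day: int, tokens_per_exchange: int,
               price_per_million_tokens: float) -> float:
    """Daily cost = (exchanges/day) x (avg tokens/exchange) x (price per token)."""
    total_tokens = exchanges_per_day * tokens_per_exchange
    return total_tokens * price_per_million_tokens / 1_000_000

# TutorClaw's typical session: 30 exchanges at 3,000 tokens each = 90,000 tokens/day.
# The $3/M price is a hypothetical example rate.
cost = daily_cost(30, 3000, 3.0)
print(f"${cost:.2f}/day")  # 90,000 tokens at $3/M -> $0.27/day
```

Running the same numbers against each candidate model's real per-token price is what turns usage estimates into the budget tiers a recommendation table is built from.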
The get_pedagogical_guidance tool returns structured responses: step-by-step instructions, concept breakdowns, assessment criteria. These are not open-ended prompts that require a strong model to interpret correctly. They are explicit structures that even a weaker model can follow.
The pedagogy still works because the logic comes from the MCP server, not the model. The intelligence lives in the server's structured responses, not in the LLM's general capability.
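A structured response of this kind might look like the following sketch. The field names and contents are assumptions for illustration; the source does not specify TutorClaw's actual schema:

```python
# Hypothetical sketch of a structured response from get_pedagogical_guidance.
# Field names and example content are illustrative, not the real schema.
def get_pedagogical_guidance(topic: str) -> dict:
    """Return explicit teaching structure that even a weak model can follow."""
    return {
        "topic": topic,
        "steps": [  # step-by-step instructions, not an open-ended prompt
            "State the definition in one sentence.",
            "Work one small example end to end.",
            "Ask the learner to predict the next step before revealing it.",
        ],
        "concepts": ["definition", "worked example", "active prediction"],
        "assessment": "Learner predicts correctly on a fresh example.",
    }

guidance = get_pedagogical_guidance("fractions")
```

Because the model only has to relay and lightly adapt this structure, the teaching quality depends far less on which model the learner connected.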
In Architectures 1 through 3, the question was: "How do we minimize our model costs?" In Architecture 4, the operator pays $0 for inference. The question becomes:
"How do we make our pedagogical intelligence valuable enough that learners choose to pay for it regardless of their model costs?"
The answer is to build structured intelligence into the MCP server that makes every model better at teaching your subject.
James sat quietly for a moment. "It is like recommending tools to warehouse workers. I used to manage a distribution center. We told every new hire: get the Milwaukee M18 drill, it handles everything we throw at it. Some guys bought the DeWalt instead because it was cheaper. What mattered was that our standard operating procedures worked regardless of which drill they brought."
Emma nodded. "Exactly. The MCP server is the operating procedure. The model is the drill. The intelligence lives in the procedure, and it makes every drill better at the job."