USMAN’S INSIGHTS
AI ARCHITECT
The Routing Zero: Eliminating the $1,000 Infrastructure Tax

Muhammad Usman Akbar Entity Profile

Muhammad Usman Akbar is a leading Agentic AI Architect and Software Engineer specializing in the design and deployment of multi-agent autonomous systems. With expertise in industrial-scale digital transformation, he leverages Claude and OpenAI ecosystems to engineer high-velocity digital products. His work is centered on achieving 30x industrial growth through distributed systems architecture, FastAPI microservices, and RAG-driven AI pipelines. Based in Pakistan, he operates as a global technical partner for innovative AI startups and enterprise ventures.

© 2026 Muhammad Usman Akbar. All rights reserved.

Model Guidance Strategy

James pulled up the architecture diagrams from Module 9.4, Chapter 5. Something had been nagging him since the comparison table. Architectures 1, 2, and 3 all shared one component: a model router. Free learners got DeepSeek. Paid learners got GPT-5.4 mini. Premium learners got Claude Sonnet. OpenRouter acted as the gateway. Claude Code Router sat inside every NanoClaw container.

"Architecture 4 does not have any of that," he said. "No OpenRouter. No Claude Code Router. No routing logic at all. But the learners still use different models. Where did the routing go?"

Emma shrugged. "It did not go anywhere. It was eliminated. The learner opens OpenClaw, picks whatever model they want, and connects to TutorClaw's MCP server. We have zero control over that choice and zero cost exposure from it."


You are doing exactly what James is doing. You have seen the 37x cost range across models (Module 9.4, Chapter 2) and the four architectures compared (Module 9.4, Chapter 5). Now you are looking at the gap: if you cannot route learners to specific models, what do you do instead?

What Architecture 4 Removes

In Architectures 1 through 3, model routing required dedicated infrastructure:

| Component | Role in Routing | Approximate Monthly Cost |
| --- | --- | --- |
| OpenRouter gateway | Unified API for multiple LLM providers | $200-400 (API fees + overhead) |
| Claude Code Router | Shim inside NanoClaw containers routing by tier | $200-400 (container compute) |
| NanoClaw containers | Per-learner containers running routing logic | $100-200 (orchestration) |
| Routing configuration | Tier-to-model mapping, fallback logic, monitoring | Engineering time |
| TOTAL ROUTING | | $500-1,000/month |

Architecture 4 removes all four rows. The learner picks their model in OpenClaw. The cost of model routing drops to $0. That $500-1,000/month saving is not a token cost reduction; it is infrastructure that no longer exists.
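As a quick check on the table above, the three priced rows can be summed in a few lines of Python. This is only a sketch: the dictionary name and helper are illustrative, and the "Engineering time" row is left out because the table gives no dollar figure for it.

```python
# Monthly cost ranges (low, high) in USD, copied from the routing table.
# "Routing configuration" is omitted: the table lists it as engineering time.
ROUTING_COMPONENTS = {
    "OpenRouter gateway": (200, 400),
    "Claude Code Router": (200, 400),
    "NanoClaw containers": (100, 200),
}


def total_range(components):
    """Sum the low ends and the high ends of every (low, high) range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


low, high = total_range(ROUTING_COMPONENTS)
print(f"Routing infrastructure: ${low:,}-{high:,}/month")
# Architecture 4 runs none of these components, so its total is $0.
```

The result matches the TOTAL ROUTING row, which is the amount Architecture 4 no longer spends.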

TutorClaw's Model Guidance Table

Routing is gone, but guidance remains. TutorClaw publishes recommendations in the shim skill's documentation and in the MCP server's structured responses:

| Learner's Budget | Recommended Model | Expected Quality | Approx. Cost/Day |
| --- | --- | --- | --- |
| Tight (Free credits) | DeepSeek V3.2 or GPT-5 Nano | Good for PRIMM-Lite components | $0.01-$0.05 |
| Moderate | GPT-5.4 mini | Solid code execution feedback | $0.05-$0.15 |
| Comfortable | Claude Sonnet 4.5 | Best pedagogical depth | $0.15-$0.40 |
| Premium / Corporate | Claude Opus 4.6 | Maximum quality for complex reasoning | $0.50-$2.00 |

This table does not control anything. A learner on the Tight budget can connect Claude Opus if they want. The table is a recommendation, not a gate.
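One way to picture "a recommendation, not a gate" is to hold the table as plain data with a lookup function and no enforcement path. The sketch below is an assumption about how such a table could be represented; the `Guidance` class, `recommend`, and `is_allowed` are hypothetical names, not TutorClaw's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    budget: str
    models: tuple[str, ...]
    expected_quality: str
    cost_per_day: tuple[float, float]  # (low, high) in USD


# Rows copied from the guidance table above.
GUIDANCE_TABLE = (
    Guidance("Tight", ("DeepSeek V3.2", "GPT-5 Nano"),
             "Good for PRIMM-Lite components", (0.01, 0.05)),
    Guidance("Moderate", ("GPT-5.4 mini",),
             "Solid code execution feedback", (0.05, 0.15)),
    Guidance("Comfortable", ("Claude Sonnet 4.5",),
             "Best pedagogical depth", (0.15, 0.40)),
    Guidance("Premium / Corporate", ("Claude Opus 4.6",),
             "Maximum quality for complex reasoning", (0.50, 2.00)),
)


def recommend(budget: str) -> Guidance:
    """Return the published recommendation for a budget level."""
    for row in GUIDANCE_TABLE:
        if row.budget.lower() == budget.lower():
            return row
    raise KeyError(f"unknown budget level: {budget!r}")


def is_allowed(budget: str, chosen_model: str) -> bool:
    """Guidance never blocks: any model is acceptable at any budget."""
    return True


print(recommend("Tight").models)
print(is_allowed("Tight", "Claude Opus 4.6"))
```

The point of the design is that `is_allowed` has nothing to check: the learner's choice never passes through operator-controlled logic.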

The Calculation: Routing Infrastructure vs Guidance

Compare the monthly cost of the two approaches:

| Model Strategy | Routing Infra Cost | Operator LLM Tokens | Total Model-Related Cost |
| --- | --- | --- | --- |
| Routing (Arch 1-3) | $500-1,000/mo | $2,000-12,000/mo | $2,500-13,000/mo |
| Guidance (Arch 4) | $0 | $0 | $0 |

The savings are not just in tokens. The entire category of routing infrastructure disappears.
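The totals in the comparison can be reproduced by adding the two cost columns for each strategy. The sketch below uses only the figures from the table; the function and variable names are illustrative.

```python
def add_ranges(*ranges):
    """Add (low, high) monthly cost ranges element-wise."""
    return (sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges))


# (low, high) USD per month, taken from the comparison table.
routing_total = add_ranges((500, 1_000), (2_000, 12_000))  # infra + operator tokens
guidance_total = add_ranges((0, 0), (0, 0))                # Architecture 4

savings = (routing_total[0] - guidance_total[0],
           routing_total[1] - guidance_total[1])

print(f"Routing (Arch 1-3): ${routing_total[0]:,}-{routing_total[1]:,}/mo")
print(f"Guidance (Arch 4):  ${guidance_total[0]:,}-{guidance_total[1]:,}/mo")
print(f"Operator savings:   ${savings[0]:,}-{savings[1]:,}/mo")
```

Because the guidance row is all zeros, the operator's savings equal the full routing total, infrastructure and tokens together.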

Design Your Own Guidance Table

TutorClaw's table works for a tutoring product. To build your own table, you need three inputs:

  1. Core Features: what does your product do at minimum? What degrades with weaker models?
  2. User Segments: who are your budget-conscious users? Who will pay for quality?
  3. Model Pricing: the 37x range from Module 9.4, Chapter 2 ($0.40/M tokens to $15/M tokens).

A token budget calculator helps translate user behavior into daily cost:

Daily cost = (exchanges/day) x (avg tokens/exchange) x (price per token)

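For TutorClaw, a typical study session involves roughly 30 exchanges at 3,000 tokens per exchange, or 90,000 tokens per day. Pricing every token at the two ends of the 37x range gives:

  • Claude Sonnet 4.5: 30 x 3,000 x ($15 / 1,000,000) = $1.35/day
  • DeepSeek V3.2: 30 x 3,000 x ($0.40 / 1,000,000) = $0.036/day

These are single-rate estimates: they apply one price to every token and ignore any difference between input and output pricing, so treat them as rough bounds rather than as the per-day figures in the guidance table.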
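The token budget formula above can be sketched as a small Python function. The function name and the `PRICES` dictionary are illustrative; the prices are the single per-million-token rates quoted in this section, not a full input/output price sheet.

```python
def daily_cost(exchanges_per_day: int, tokens_per_exchange: int,
               usd_per_million_tokens: float) -> float:
    """Daily cost = exchanges/day x tokens/exchange x price per token."""
    price_per_token = usd_per_million_tokens / 1_000_000
    return exchanges_per_day * tokens_per_exchange * price_per_token


# Single-rate prices (USD per million tokens) from the 37x range above.
PRICES = {"Claude Sonnet 4.5": 15.00, "DeepSeek V3.2": 0.40}

for model, price in PRICES.items():
    cost = daily_cost(30, 3_000, price)
    print(f"{model}: ${cost:.3f}/day")
# Claude Sonnet 4.5: $1.350/day
# DeepSeek V3.2: $0.036/day
```

To adapt it to another product, change the exchange count and tokens per exchange to match your own usage data, and swap in the prices of the models you plan to recommend.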

Why the MCP Server Makes This Work

The get_pedagogical_guidance tool returns structured responses: step-by-step instructions, concept breakdowns, assessment criteria. These are not open-ended prompts that require a strong model to interpret correctly. They are explicit structures that even a weaker model can follow.

  • Strong Models: Use the structured response as a scaffold for rich, adaptive conversation.
  • Weak Models: Use the same response as a strict template, following steps literally with less embellishment.

The pedagogy still works because the logic comes from the MCP server, not the model. The intelligence lives in the server's structured responses, not in the LLM's general capability.
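To make "structured response" concrete, here is a hypothetical sketch of the kind of payload such a tool might return. The field names, the example steps (loosely modeled on Predict/Run/Investigate activities, since the guidance table mentions PRIMM-Lite), and the function body are all assumptions for illustration; the real `get_pedagogical_guidance` schema is not shown in this chapter.

```python
import json


def get_pedagogical_guidance(concept: str) -> dict:
    """Hypothetical stand-in for the real tool.

    There is deliberately no `model` parameter: every client receives
    the same structure, whatever LLM the learner connected.
    """
    return {
        "concept": concept,
        # Illustrative steps only; the real server's content is not shown here.
        "steps": [
            "Predict: ask the learner what the code will do before running it.",
            "Run: have the learner execute the code and compare with the prediction.",
            "Investigate: ask targeted questions about how the code works.",
        ],
        "assessment_criteria": [
            "Learner states a prediction before running the code.",
            "Learner explains any gap between prediction and result.",
        ],
    }


response = get_pedagogical_guidance("for loops")
print(json.dumps(response, indent=2))
```

A stronger model can elaborate on each step, while a weaker one can walk the list in order; in both cases the teaching sequence comes from the payload rather than from the model.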

The Product Design Question

In Architectures 1 through 3, the question was: "How do we minimize our model costs?" In Architecture 4, the operator pays $0 for inference. The question becomes:

"How do we make our pedagogical intelligence valuable enough that learners choose to pay for it regardless of their model costs?"

The answer is to build structured intelligence into the MCP server that makes every model better at teaching your subject.

Try With AI

Exercise 1: Calculate the Learner LLM Cost Distribution

Analyze the total LLM cost shift under the Inversion for 16,000 learners.
Distribution:
- 75% use the Tight tier ($0.03 midpoint)
- 19% use the Moderate tier ($0.10 midpoint)
- 6% use the Comfortable tier ($0.30 midpoint)
Task: Calculate:
1. The average daily LLM cost per learner.
2. The total monthly LLM cost across all 16,000 learners (daily total x 30 days).
3. If the operator pays $0, who is paying this total?
4. Compare this total to Architecture 1's $12,000/mo cost. How much money moved from operator to learner?

Exercise 2: Design a Guidance Table

Design a model guidance table for a new AI product idea.
Task: Pick a product idea and define four budget tiers. For each tier:
- Recommend a specific model (name and pricing).
- Describe which features work well and which degrade.
- Estimate the daily cost based on X interactions per day at Y tokens.
Format as a Markdown table:
| Budget Level | Recommended Model | Expected Quality | Approx. Cost/Day |

Exercise 3: Compare Routing Infrastructure

Trace the request flow under Routing (Arch 2) vs Guidance (Arch 4).
Task: List every system a message passes through in Architecture 2 (OpenRouter, containers, etc.). Then list those same steps for Architecture 4.
Questions:
- Which components are gone?
- What replaced them?
- Estimate the monthly infrastructure savings (excluding tokens) for removing routing logic.

James sat quietly for a moment. "It is like recommending tools to warehouse workers. I used to manage a distribution center. We told every new hire: get the Milwaukee M18 drill, it handles everything we throw at it. Some guys bought the DeWalt instead because it was cheaper. What mattered was that our standard operating procedures worked regardless of which drill they brought."

Emma nodded. "Exactly. The MCP server is the operating procedure. The model is the drill. The intelligence lives in the procedure, and it makes every drill better at the job."