USMAN’S INSIGHTS
AI ARCHITECT
Executing Under Pressure: Implementing Code Sandboxes and Upgrade Flows


© 2026 Muhammad Usman Akbar. All rights reserved.

Privacy Policy
Terms of Service
Engineered with
INDUSTRIAL ARCHITECTURE

Build the Code and Upgrade Tools

James counted on his fingers. "Seven tools done. Two to go. Code execution and the upgrade link."

Emma looked up from her terminal. "The code tool is the dangerous one. A learner sends Python code and your server runs it. Think about that for a second."

"So we need a sandbox."

"Eventually. Right now we need a mock that is good enough to test the tutoring flow. A subprocess with a timeout. Five seconds. If the code finishes, return the output. If it hangs, kill it. Basic safety: block imports of os and subprocess so nobody deletes your files from inside a tutoring session."

"And the upgrade URL?"

"Placeholder. A string that looks like a Stripe checkout link. Real Stripe comes after tests." She turned back to her screen. "Two tools, both mocked. One hour."


You are doing exactly what James is doing. Two more tools to build, both using mock implementations that are good enough to test the full product flow. Production-grade sandboxing and real Stripe integration come later. Right now, the goal is completing the tool surface.

Tool 8: submit_code

This tool lets a learner submit Python code and get the output back. In production, you would run that code in a Docker container or a browser-based sandbox like Pyodide. For now, a subprocess with a timeout is sufficient to test the tutoring loop.
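As a reference point, a mock like this can be sketched in a few lines. This is not the chapter's actual implementation (Claude Code builds that from your spec); the function name, return shape, and substring-based safety check are illustrative assumptions:

```python
import subprocess
import sys

# Phase-1 safety: naive substring checks, deliberately crude but good enough
# for a mock. These patterns are scanned in the learner's submitted code.
BLOCKED_PATTERNS = ("import os", "import subprocess", "import shutil", "open(")

def submit_code(code: str) -> dict:
    """Run learner-submitted Python in a subprocess with a 5-second timeout."""
    for pattern in BLOCKED_PATTERNS:
        if pattern in code:
            return {"error": f"Rejected for safety: '{pattern}' is not allowed."}
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=5,  # kill runaway code instead of hanging the server
        )
    except subprocess.TimeoutExpired:
        return {"error": "Execution timed out after 5 seconds."}
    return {"stdout": result.stdout, "stderr": result.stderr}
```

Substring matching will also reject harmless code that merely mentions `open(` inside a string literal. For a mock that trade-off is acceptable; a production sandbox would isolate at the process or container level instead of inspecting source text.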

Open Claude Code in your tutorclaw-mcp project and describe the tool:

text
Add a submit_code tool to the TutorClaw MCP server.

Functionality:
- Task: Execute Python code provided by the learner.
- Execution Environment: Subprocess with a 5-second timeout.
- Returns: Captured stdout and stderr.

Security (Mock Sandbox Phase 1):
- Reject code that imports: os, subprocess, or shutil.
- Reject code that uses: open() for file system access.
- Error Handling: Return clear messages for timeouts or safety violations.

Goal: Provide a sufficient mock for testing the tutoring flow. Spec this before building.

Notice what you told Claude Code and what you left out. You described the behavior (run code, capture output), the constraints (timeout, blocked imports), and the scope (mock, not production). You did not describe how to implement subprocess calls or how to parse import statements. Those are implementation decisions for Claude Code.

Review and Steer

When Claude Code returns the spec, check three things:

Check            | What to Look For
Tool description | Specific enough that the agent knows to call this tool when a learner submits code, not when they ask a question about code
Safety checks    | The blocked imports and functions are listed explicitly, not hidden behind a vague "security check"
Timeout behavior | The spec says what happens when code times out (an error message, not a crash)

If the description is vague, steer it:

text
Update the submit_code tool description.

Context: The agent must distinguish when to evaluate code vs when to answer questions about code.

Logic: "Execute learner-submitted Python code in a sandboxed subprocess. Call this when a learner explicitly submits code for execution/evaluation, not during general conceptual discussions."

Once the spec looks right, approve the build:

Specification
The spec looks good. Build this.

Verify submit_code

After Claude Code finishes building, test the tool with three cases:

Case 1: Valid code that finishes quickly

Call submit_code with print("hello"). You should get back stdout containing hello.

Case 2: Code that runs forever

Call submit_code with while True: pass. The tool should return a timeout error after 5 seconds, not hang indefinitely.

Case 3: Blocked import

Call submit_code with import os; os.listdir("."). The tool should reject the code before running it, returning an error that names the blocked import.

All three cases passing means the mock sandbox works for testing purposes. It is not production-safe (a determined user could still cause problems), but it is sufficient to test the tutoring flow where a learner submits exercises.
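The three checks can be bundled into one quick smoke test. This sketch assumes a submit_code callable that returns a dict with stdout/stderr on success and an "error" key on timeout or rejection; the exact shape depends on what Claude Code built, so adjust the assertions to match:

```python
def smoke_test_submit_code(submit_code) -> None:
    """Run the three verification cases against any submit_code implementation."""
    # Case 1: valid code that finishes quickly.
    assert "hello" in submit_code('print("hello")')["stdout"]
    # Case 2: an infinite loop must hit the 5-second timeout, not hang.
    assert "error" in submit_code("while True: pass")
    # Case 3: a blocked import must be rejected before execution.
    assert "error" in submit_code('import os; os.listdir(".")')
    print("all three cases passed")
```

Note that Case 2 takes the full five seconds by design: you are verifying the timeout fires, not that the code is fast.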

Tool 9: get_upgrade_url

This tool creates a checkout link for learners who want to upgrade from free to paid. Real Stripe integration comes in Module 9.3, Chapter 14. For now, a placeholder URL is enough to prove the upgrade flow works.
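One plausible shape for this mock is sketched below. The file path and JSON layout (learner IDs as keys, each with a "tier" field) are assumptions based on the state file described in this chapter, not the chapter's actual implementation:

```python
import json
from pathlib import Path

LEARNERS_FILE = Path("data/learners.json")  # assumed state-file location

def get_upgrade_url(learner_id: str) -> dict:
    """Tier-gated placeholder checkout link.

    MOCK: Replace with a real Stripe Checkout session in Module 9.3, Chapter 14.
    """
    learners = json.loads(LEARNERS_FILE.read_text())
    tier = learners.get(learner_id, {}).get("tier", "free")
    if tier == "paid":
        # The agent reads this message and decides what to tell the learner.
        return {"error": "Learner is already on the premium tier."}
    return {"url": "https://checkout.stripe.com/mock-session-id"}
```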

Describe the tool to Claude Code:

text
Add a get_upgrade_url tool to the MCP server.

Requirements:
- Check the current learner tier from data/learners.json.
- Tier "free": Return a mock URL (https://checkout.stripe.com/mock-session-id).
- Tier "paid": Return an error explaining that the learner is already on the premium tier.

Note: This is a placeholder for the real Stripe integration coming in Module 9.3, Chapter 14.

Verify get_upgrade_url

Two test cases:

Case 1: Free-tier learner requests upgrade

Call get_upgrade_url with a learner ID that has a free tier. You should get back the mock URL.

Case 2: Paid-tier learner requests upgrade

Call get_upgrade_url with a learner ID that has a paid tier (you may need to manually edit the JSON state file to set a learner's tier to "paid" for this test). You should get an error message indicating the learner is already upgraded.
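If you would rather flip a learner's tier by script than by hand, a small helper works. It assumes the per-learner JSON layout used in this chapter (learner IDs as keys, each with a "tier" field); set_tier is a hypothetical name for test setup only:

```python
import json

def set_tier(path: str, learner_id: str, tier: str) -> None:
    """Overwrite one learner's tier in the JSON state file (test setup only)."""
    with open(path) as f:
        learners = json.load(f)
    learners[learner_id]["tier"] = tier
    with open(path, "w") as f:
        json.dump(learners, f, indent=2)
```

For example, `set_tier("data/learners.json", "learner-001", "paid")`, where learner-001 is whatever ID you registered during testing.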

Both cases passing means the upgrade flow will work when you wire all nine tools together in Module 9.3, Chapter 7.

Nine Tools Complete

Across four chapters, you built nine tools:

Chapter               | Tools Built                                          | Count
Module 9.3, Chapter 3 | register_learner, get_learner_state, update_progress | 3
Module 9.3, Chapter 4 | get_chapter_content, get_exercises                   | 2
Module 9.3, Chapter 5 | generate_guidance, assess_response                   | 2
Module 9.3, Chapter 6 | submit_code, get_upgrade_url                         | 2
Total                 |                                                      | 9

Every tool was built the same way: describe what you need, steer the spec, let Claude Code implement, verify the result. The tools themselves are different (state management, content delivery, teaching methodology, code execution, payments), but the workflow never changed.

Two of those tools are mocks. That is intentional. The submit_code tool runs real code in a subprocess, which is enough to test whether the tutoring loop works. The get_upgrade_url tool returns a placeholder, which is enough to test whether the tier-gating flow works. Neither mock blocks progress on the product. Both will be replaced when the product needs production infrastructure.

When Mocks Become Debt

A mock becomes technical debt when you forget it exists. Write a comment in both tools: "MOCK: Replace with real implementation in Module 9.3, Chapter 14 (Stripe) and production sandbox." The comment is a reminder, not an apology.

Try With AI

Exercise 1: Add a Third Safety Constraint

The current submit_code mock blocks os, subprocess, shutil, and open(). There are other dangerous operations a learner could attempt. Describe an additional constraint to Claude Code:

text
Enhance the safety constraints for submit_code.

Security Update:
- Task: Reject any code that uses eval() or exec().
- Reason: These functions can be used to bypass import-level restrictions.

Update the tool logic and verify that the security tests reflect these new boundaries.

Run the tests after the change. Does the new constraint work without breaking existing tests?

What you are learning: Safety constraints are additive. Each one narrows what untrusted code can do. The skill is knowing which constraints matter for your threat model and which ones are overkill for a mock.
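The extended check might look like the sketch below. The helper name violates_safety is hypothetical (the chapter leaves the actual structure to Claude Code), but it shows why the constraint matters: eval and exec can rebuild blocked imports at runtime, e.g. `eval("__import__('os')")`:

```python
# Phase-1 patterns plus the new eval()/exec() rule. Without it, learner code
# could dodge the import checks by constructing imports dynamically.
BLOCKED_PATTERNS = (
    "import os", "import subprocess", "import shutil",
    "open(", "eval(", "exec(",
)

def violates_safety(code: str):
    """Return the first offending pattern, or None if the code passes."""
    for pattern in BLOCKED_PATTERNS:
        if pattern in code:
            return pattern
    return None
```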

Exercise 2: Test the Boundary Between Mock and Real

Ask Claude Code to evaluate where your mock falls short:

text
Evaluate sandbox security: mock vs production.

Task: Compare this subprocess-based mock to a production-grade Docker sandbox. List specific attack vectors that a determined user could exploit in the current mock but that would be neutralized in a real containerized environment.

What you are learning: Mocks have known limitations. The value of a mock is not that it is safe. The value is that it lets you test the product flow while you defer the safety investment. Understanding the gap between mock and production helps you decide when to replace it.

Exercise 3: Design a Better Mock Error Message

When get_upgrade_url is called for a paid learner, the error message matters because the agent reads it and decides what to tell the learner. Ask Claude Code to improve the error:

text
Refine the get_upgrade_url error message.

Context: The current error ("learner is already upgraded") states what went wrong but gives the agent no direction. The message is context for the agent's next step, not a direct user notification.

Task: Rewrite the return message so that when a paid learner requests an upgrade, the agent understands it should congratulate the learner and direct them back to the premium chapters instead of offering a checkout link.

What you are learning: Tool error messages are instructions to the agent. A good error message tells the agent what to do next, not just what went wrong. This is the same context engineering principle from tool descriptions, applied to error paths.
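A sketch of what an agent-facing error payload could look like; the function and field names here are illustrative, not the chapter's actual implementation:

```python
def premium_upsell_error(learner_name: str) -> dict:
    """Error return that tells the agent what to do next, not just what failed."""
    return {
        "error": "already_premium",
        "agent_guidance": (
            f"{learner_name} already has premium access. Congratulate them on "
            "being a premium learner and direct them back to the premium "
            "chapters. Do not offer a checkout link."
        ),
    }
```

The machine-readable "error" code lets calling code branch on the condition, while "agent_guidance" carries the steering text the agent folds into its reply.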


James scrolled through his terminal. "Nine tools. register_learner, get_learner_state, update_progress, get_chapter_content, get_exercises, generate_guidance, assess_response, submit_code, get_upgrade_url." He counted them off on his fingers. "Four chapters. All working."

"All working in isolation," Emma corrected. "Each tool does its job when you call it directly. But nobody is calling them in sequence yet. A real tutoring session starts with register, moves to content, generates guidance, assesses a response, maybe executes submitted code. That is a chain of tool calls, not nine separate calls."

"So the next step is wiring them together?"

Emma nodded, then paused. "I will say this, though. I have shipped mocks that became permanent. Three years later someone finds a subprocess call with a five-second timeout in production and wonders who thought that was acceptable." She pointed at his screen. "Set a reminder. Write a comment. Something that says: this is a mock, replace it by Module 9.3, Chapter 14. If you do not mark it, you will forget it."

"You sound like you are speaking from experience."

"I am speaking from a post-mortem." She picked up her coffee. "Module 9.3, Chapter 7: wire all nine tools into one server and run a complete tutoring flow. Isolation is over. Integration starts."