James opened the Module 9.3, Chapter 2 spec on the left side of his screen. Nine tools, designed on a blank sheet three weeks ago. He opened his terminal on the right side. Nine tools, running, tested, paid, published.
"Everything on the spec is built," he said.
Emma did not move. "Prove it. Walk through every line of that spec and show me where it lives in the code."
James started at the top. register_learner: built in Module 9.3, Chapter 3, JSON persistence, test suite passing. get_learner_state: same chapter, same file, same tests. He kept going. Tool by tool, chapter by chapter, matching the paper description to the running implementation.
"I can account for every one," he said after five minutes.
"Good. Now tell me what is missing."
You are doing exactly what James is doing. Open your Module 9.3, Chapter 2 spec (or the table below) and walk through every design decision you made. Your job: verify that each one became real code.
This table maps every commitment from Chapter 2 to the chapter where you built it and the evidence that it works.
Every row maps to a chapter and a test. Nothing from the original spec was skipped.
Open your own Module 9.3, Chapter 2 notes or scroll back to the tool contracts. Check each tool against your implementation. If you find something that drifted from the original design, note what changed and why.
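If you want a concrete reference point while you audit, here is a minimal sketch of what the first two tools might look like with JSON persistence. This is not the course's implementation: the file path, field names, and error behavior are assumptions, so check your work against your own Chapter 2 contract rather than against this code.

```python
import json
from pathlib import Path

# Hypothetical location for the JSON store; yours may differ.
LEARNERS_FILE = Path("data/learners.json")

def register_learner(learner_id: str, name: str) -> dict:
    """Create and persist a learner record.

    The field names are illustrative. What the contract fixes is the
    inputs (an id and a name) and the output (the stored record).
    """
    learners = json.loads(LEARNERS_FILE.read_text()) if LEARNERS_FILE.exists() else {}
    if learner_id in learners:
        raise ValueError(f"learner {learner_id} is already registered")
    record = {"id": learner_id, "name": name, "sessions": []}
    learners[learner_id] = record
    LEARNERS_FILE.parent.mkdir(parents=True, exist_ok=True)
    LEARNERS_FILE.write_text(json.dumps(learners, indent=2))
    return record

def get_learner_state(learner_id: str) -> dict:
    """Return the stored record for one learner."""
    return json.loads(LEARNERS_FILE.read_text())[learner_id]
```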
The product works. But it works locally, for one user, on your machine. Here is what changes when real users show up, and why none of these changes affects the product itself.

| Local Version | What Production Adds |
| --- | --- |
| JSON file persistence | A real database (PostgreSQL) |
| localhost | A hosted server |
| Content files served from disk | A CDN |
| subprocess code execution | A real sandbox |
Every item in the "What Production Adds" column is an infrastructure upgrade. Not a single item changes a tool interface. The inputs and outputs of register_learner are the same whether the data goes to a JSON file or a PostgreSQL table.
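To make that separation concrete, here is a sketch, assuming Python and a two-method storage interface. The class names, the `learners(id text primary key, record jsonb)` table, and the method signatures are mine, not the spec's; the point is only that `register_learner` never mentions which backend it is running on. Note that this version takes the store as a parameter, a refactor of the earlier sketch, so one tool body serves both backends.

```python
import json
from pathlib import Path
from typing import Protocol

class LearnerStore(Protocol):
    """The storage seam. Tools call these two methods and nothing else."""
    def save(self, record: dict) -> None: ...
    def load(self, learner_id: str) -> dict: ...

class JsonLearnerStore:
    """The local approach: one JSON file on disk."""
    def __init__(self, path: Path = Path("data/learners.json")):
        self.path = path

    def _read(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def save(self, record: dict) -> None:
        learners = self._read()
        learners[record["id"]] = record
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(learners, indent=2))

    def load(self, learner_id: str) -> dict:
        return self._read()[learner_id]

class PostgresLearnerStore:
    """The production upgrade. Same two methods, different machinery.
    Assumes a `learners(id text primary key, record jsonb)` table exists."""
    def __init__(self, conn):
        self.conn = conn  # e.g. a psycopg2 connection

    def save(self, record: dict) -> None:
        with self.conn.cursor() as cur:
            cur.execute(
                "insert into learners (id, record) values (%s, %s) "
                "on conflict (id) do update set record = excluded.record",
                (record["id"], json.dumps(record)),
            )
        self.conn.commit()

    def load(self, learner_id: str) -> dict:
        with self.conn.cursor() as cur:
            cur.execute("select record from learners where id = %s", (learner_id,))
            return cur.fetchone()[0]  # psycopg2 returns jsonb as a dict

def register_learner(store: LearnerStore, learner_id: str, name: str) -> dict:
    """The tool. Its inputs and outputs never mention the backend."""
    record = {"id": learner_id, "name": name, "sessions": []}
    store.save(record)
    return record
```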
This is the key insight: the tests verify the contract, not the implementation. When you swap the storage layer, the tests still pass because the tool still fulfills its contract. That separation was designed in Module 9.3, Chapter 2 when you wrote the tool contracts.
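Continuing the sketch above, a contract test written this way never inspects the storage layer. Parametrizing the fixture is what lets the identical assertions run against both backends; the test database DSN is hypothetical, and the Postgres case assumes the table already exists.

```python
import pytest

@pytest.fixture(params=["json", "postgres"])
def store(request, tmp_path):
    if request.param == "json":
        yield JsonLearnerStore(tmp_path / "learners.json")
    else:
        psycopg2 = pytest.importorskip("psycopg2")
        conn = psycopg2.connect("dbname=tutorclaw_test")  # hypothetical test DB
        yield PostgresLearnerStore(conn)
        conn.close()

def test_register_learner_round_trips(store):
    """The contract: whatever register_learner stores, load returns."""
    record = register_learner(store, "lrn-001", "James")
    assert store.load("lrn-001") == record

def test_register_learner_returns_contract_shape(store):
    """The contract also fixes the output's shape, not its storage format."""
    record = register_learner(store, "lrn-002", "Emma")
    assert set(record) == {"id", "name", "sessions"}
```

Swapping JSON for PostgreSQL changes the fixture, not the assertions. That is what it means for tests to verify the contract rather than the implementation.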
You can evaluate any agent product with these seven levels. Each level builds on the one before it.
Most tutorials stop at Level 2. TutorClaw reaches Level 7 because a product that crashes when the server goes down is not a product anyone will pay for. The ladder also tells you what to fix first when something breaks. If Level 1 fails, nothing above it matters.
James finished the audit. Every spec item mapped to a chapter and a test. "Nothing is missing," he said. Then he paused. "But it runs on my laptop."
Emma nodded. "A database instead of JSON. A real server instead of localhost. A CDN for the content files. And a real sandbox instead of subprocess. Four infrastructure changes. But the tools stay the same."
"Because the tests verify the contract," James added.
Emma smiled. "You described nine tools on a blank sheet. Claude Code built them. You wrote the tests, the descriptions, the identity, and the shim. You published a product."
James looked at the terminal. All tests green. Dashboard showing nine tools connected. A Stripe webhook log with test payments processed. He had spent three weeks building this, and every piece of it worked.
"How long did it take you to build your first product this way?" he asked.
Emma hesitated. "Longer than you. Because I hand-coded everything. Not because you are faster, but because you spent your time on product decisions, not implementation details."
"What is next?" he asked.
"The quiz," Emma said. "Fifty questions. And after that, does TutorClaw make money? The real question: what does each tutoring session cost you, and is the margin sustainable?"
James pulled up his Stripe test dashboard. The product worked. The economics were next.
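As a preview of the shape of that calculation, here is a toy sketch. Every constant below is a placeholder: the token counts and per-million-token rates are invented for illustration, and only Stripe's standard 2.9% + $0.30 card fee is a real number. Swap in your own price and your actual session traces before drawing conclusions.

```python
# Hypothetical per-session economics; all inputs are placeholders.
PRICE_PER_SESSION = 5.00                      # what the learner pays (USD)
TOKENS_IN, TOKENS_OUT = 40_000, 8_000         # assumed typical session
COST_PER_M_IN, COST_PER_M_OUT = 3.00, 15.00   # example per-million-token rates

model_cost = TOKENS_IN / 1e6 * COST_PER_M_IN + TOKENS_OUT / 1e6 * COST_PER_M_OUT
stripe_fee = PRICE_PER_SESSION * 0.029 + 0.30  # Stripe's standard card fee
margin = PRICE_PER_SESSION - model_cost - stripe_fee

print(f"model cost: ${model_cost:.2f}, stripe fee: ${stripe_fee:.2f}, "
      f"margin: ${margin:.2f} ({margin / PRICE_PER_SESSION:.0%})")
```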