USMAN’S INSIGHTS
AI ARCHITECT
Muhammad Usman Akbar Entity Profile

Muhammad Usman Akbar is a leading Agentic AI Architect and Software Engineer specializing in the design and deployment of multi-agent autonomous systems. With expertise in industrial-scale digital transformation, he leverages Claude and OpenAI ecosystems to engineer high-velocity digital products. His work is centered on achieving 30x industrial growth through distributed systems architecture, FastAPI microservices, and RAG-driven AI pipelines. Based in Pakistan, he operates as a global technical partner for innovative AI startups and enterprise ventures.

© 2026 Muhammad Usman Akbar. All rights reserved.

Harden and Polish: Engineering Resilience and Observability

Emma pulled up James's TutorClaw on her phone and typed a single space into the WhatsApp chat, then hit send: an effectively empty message.

The response came back fast: a wall of Python. TypeError, KeyError, a file path from James's laptop, a line number deep inside server.py.

"Your product just leaked its implementation to the user," Emma said, turning the screen toward him.

James squinted at the traceback. "That is just an edge case. Nobody sends an empty message."

"Every user is an edge case generator. A two-year-old borrows a phone and mashes the keyboard. Someone pastes an emoji where a name should go. A learner types chapter 9999 because they are curious." She set the phone down. "You built a product that works when inputs are perfect. Now make it work when inputs are not."


You are doing exactly what James is doing. TutorClaw works when everything goes right. Now you make it handle the cases where everything goes wrong.

In this chapter, you send malformed inputs to every tool, observe the failures, describe hardening requirements to Claude Code, add structured logging, and verify the result. By the end, every bad input produces a clear message instead of a crash, and every tool call leaves a structured record in a log file.

Step 1: Send Malformed Inputs

Before fixing anything, see what breaks. Open Claude Code in your tutorclaw-mcp project and ask it to test each tool with bad inputs:

text
Test each of the 9 TutorClaw tools with malformed inputs and show me what happens. Try these specific cases:
- register_learner with an empty string for the name
- get_learner_state with a learner_id that does not exist
- get_chapter_content with chapter number -1
- get_chapter_content with chapter number 9999
- submit_code with code that tries to import os
- get_upgrade_url for a learner_id that is not in the system
- assess_response with an empty answer string
- update_progress with a negative confidence value
Run each one and show me the exact error response.

Categorize what you see:

| Failure Type | What It Looks Like | Why It Matters |
| --- | --- | --- |
| Crash | Python traceback returned to the user | Leaks file paths, line numbers, and variables |
| Confusing Error | "Error: None" or "KeyError: learner-xyz" | User has no idea what to do differently |
| Silent Success | Accepts chapter -1 and returns empty content | No signal that input was wrong |
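To make these three failure modes concrete, here is a minimal sketch of what unhardened handlers can look like. The handler bodies and data shapes below are invented for illustration; they are not the real TutorClaw code.

```python
learners = {}  # learner_id -> learner state

def register_learner(name: str) -> dict:
    # Crash: assumes name is non-empty; name[0] raises IndexError on ""
    # and the raw traceback reaches the user
    return {"learner_id": f"learner-{len(learners)}", "initial": name[0]}

def get_learner_state(learner_id: str) -> dict:
    # Confusing error: an unknown id surfaces as a raw KeyError
    return learners[learner_id]

def get_chapter_content(chapter: int) -> list:
    # Silent success: out-of-range slices quietly return an empty list,
    # so chapter -1 or 9999 produces no error at all
    chapters = ["Intro", "Variables", "Loops"]
    return chapters[chapter:chapter + 1]
```

Each function works on perfect input and fails in a different, user-hostile way on bad input.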

Step 2: Describe Hardening to Claude Code

Now describe the fix. Tell Claude Code what "valid" means for each tool:

text
I want to add input validation and clear error handling to all 9 TutorClaw tools.
Rules for ALL tools:
- No empty strings for any required text parameter.
- Return a clear error message that tells the user what was wrong and what to do instead.
- Never expose file paths, line numbers, or internal variable names in error responses.
Specific validation rules:
- register_learner: name must be 1-200 characters, no control characters.
- get_learner_state: if learner_id does not exist, return "Learner not found. Register first with your name."
- get_chapter_content: chapter number must be a positive integer within the range of available chapters.
- get_exercises: same chapter range validation.
- submit_code: reject any code containing import statements for os, sys, subprocess, or shutil (basic safety).
- get_upgrade_url: if learner_id does not exist, return "Learner not found" (not a crash).
- update_progress: confidence must be between 0.0 and 1.0.
- generate_guidance: stage must be one of the valid PRIMM stages.
Wrap all tool handlers in error handling so that unexpected errors return "Something went wrong. Please try again." instead of a traceback.
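The validation rules above translate into small, reusable check functions. The sketch below is one way they might look; the function names, the error-dict shape, and the MAX_CHAPTER value are assumptions for illustration, not the code Claude Code will generate.

```python
import unicodedata

MAX_CHAPTER = 20  # assumed chapter count; the real value comes from your content
RESTRICTED = ("os", "sys", "subprocess", "shutil")

def error(message):
    # Uniform error shape: a plain message, never a traceback
    return {"status": "error", "message": message}

def validate_name(name):
    # Returns an error dict on bad input, or None if the name is valid
    if not name.strip():
        return error("Name is required. Provide a name between 1 and 200 characters.")
    if len(name) > 200:
        return error("Name is too long. Use at most 200 characters.")
    if any(unicodedata.category(c) == "Cc" for c in name):
        return error("Name contains control characters. Use plain text.")
    return None

def validate_chapter(chapter):
    if not isinstance(chapter, int) or not 1 <= chapter <= MAX_CHAPTER:
        return error(f"Invalid chapter number. Choose a chapter between 1 and {MAX_CHAPTER}.")
    return None

def validate_code(code):
    # Naive substring check; real sandboxing needs more than this
    if any(f"import {mod}" in code for mod in RESTRICTED):
        return error("Code contains restricted imports (os, sys, subprocess, shutil).")
    return None
```

Each tool handler calls its validator first and returns the error dict immediately if one comes back, so invalid input never reaches the business logic.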

Step 3: Add Structured Logging

Validation tells users what went wrong. Logging tells you what happened:

text
Add JSON-structured logging to the TutorClaw server. Every tool call should log a JSON object with these fields:
- timestamp: ISO 8601 format
- tool_name: which tool was called
- learner_id: who called it (or "anonymous" if not available)
- parameters: input parameters (redact sensitive values)
- result_status: "success" or "error"
- error_message: the error message if status is "error"
- duration_ms: how long the tool call took in milliseconds
Write logs to data/tutorclaw.log, one JSON object per line. Use Python's built-in logging module.

One JSON object per line (JSONL) means each log entry is a complete, parseable record. You can filter by tool name, find all errors for a specific learner, or calculate average response times.
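One way such a setup can look, using only the standard library (the field names follow the prompt; the helper names tool_call_entry and log_tool_call are illustrative):

```python
import json
import logging
import os
from datetime import datetime, timezone

# Write one JSON object per line to data/tutorclaw.log
os.makedirs("data", exist_ok=True)
logger = logging.getLogger("tutorclaw")
handler = logging.FileHandler("data/tutorclaw.log")
handler.setFormatter(logging.Formatter("%(message)s"))  # message is already JSON
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def tool_call_entry(tool_name, learner_id, parameters, result_status,
                    error_message=None, duration_ms=0):
    """Build one JSONL record for a tool call."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_name": tool_name,
        "learner_id": learner_id or "anonymous",
        "parameters": parameters,  # redact sensitive values before passing in
        "result_status": result_status,
        "error_message": error_message,
        "duration_ms": duration_ms,
    })

def log_tool_call(**fields):
    logger.info(tool_call_entry(**fields))
```

Keeping the record-building separate from the writing makes the format easy to unit test without touching the filesystem.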

Step 4: Verify Hardening

Resend the same malformed inputs from Step 1:

text
Run the same malformed input tests from earlier:
- register_learner with empty name
- get_learner_state with a nonexistent learner_id
- get_chapter_content with chapter -1 and 9999
- submit_code with code that imports os
- get_upgrade_url for a nonexistent learner
- assess_response with empty answer
- update_progress with negative confidence
Show me the error response for each one. Then show me the last 10 entries in data/tutorclaw.log.

Compare the results to your initial baseline:

| Tool | Before | After |
| --- | --- | --- |
| register_learner("") | Python TypeError traceback | "Name is required." |
| get_chapter_content(-1) | Empty response, no error | "Invalid chapter number." |
| submit_code("import os") | Code executed successfully | "Code contains restricted imports." |

Check the log file. Each malformed input should have produced a structured entry:

| Field | Example Value |
| --- | --- |
| timestamp | 2026-04-04T14:23:01.442Z |
| tool_name | register_learner |
| result_status | error |
| error_message | Name is required. Provide a name between 1 and 200 chars. |
| duration_ms | 2 |

Step 5: Update the Test Suite

The pytest suite from Chapters 11-12 tested the happy path. Hardening added new behavior that needs test coverage:

text
Add hardening tests to the pytest suite. For each tool, add tests for:
- Empty string inputs where strings are required.
- Out-of-range numeric values (negative, zero, impossibly large).
- Nonexistent learner_ids.
- Restricted code submissions (import os, import subprocess).
Verify two things:
1. The tool returns a clear error message (not a traceback).
2. The tool does not crash (returns a proper response object).
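The generated tests might look roughly like this. The call_tool dispatcher below is a self-contained stand-in so the sketch runs on its own; in the real suite it would invoke your server's actual tool handlers, and the response shape is an assumption.

```python
def call_tool(tool, params):
    # Stand-in dispatcher for illustration only; replace with a call into
    # your server's real tool handlers.
    if tool == "register_learner":
        name = params["name"]
        if not name.strip() or any(ord(c) < 32 for c in name):
            return {"status": "error",
                    "message": "Name is required. Provide 1-200 printable characters."}
        return {"status": "success"}
    if tool == "get_chapter_content":
        if not 1 <= params["chapter"] <= 20:
            return {"status": "error", "message": "Invalid chapter number."}
        return {"status": "success"}
    if tool == "submit_code":
        restricted = ("os", "sys", "subprocess", "shutil")
        if any(f"import {mod}" in params["code"] for mod in restricted):
            return {"status": "error", "message": "Code contains restricted imports."}
        return {"status": "success"}
    return {"status": "error", "message": "Unknown tool."}

def test_register_learner_rejects_bad_names():
    for name in ["", "   ", "\x00bad"]:
        result = call_tool("register_learner", {"name": name})
        assert result["status"] == "error"          # proper response, no crash
        assert "Traceback" not in result["message"]  # no leaked internals

def test_get_chapter_content_rejects_out_of_range():
    for chapter in [-1, 0, 9999]:
        result = call_tool("get_chapter_content", {"chapter": chapter})
        assert result["status"] == "error"

def test_submit_code_rejects_restricted_imports():
    result = call_tool("submit_code", {"code": "import os\nos.listdir('/')"})
    assert result["status"] == "error"
    assert "restricted" in result["message"].lower()
```

Note that each test asserts both halves of the requirement: the call returns a structured error response instead of raising, and the message is readable.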

Run the suite to confirm the new hardening tests pass alongside the existing happy-path tests:

bash
uv run pytest

Try With AI

Exercise 1: Audit the Error Messages

text
List every error message in the TutorClaw server. For each one, evaluate:
- Does this message tell the user what went wrong?
- Does it tell them what to do instead?
Flag any message that fails either test.

What you are learning: A good error message is a tiny piece of documentation. The quality of your error messages determines whether users retry with correct input or simply give up.

Exercise 2: Stress the Logging

text
Execute a series of tool calls:
- Call register_learner 5 times with valid names.
- Call get_chapter_content 3 times (2 valid, 1 invalid).
- Call submit_code with restricted code once.
Afterward, analyze data/tutorclaw.log:
- How many total entries are there?
- How many have result_status "error"?
- What is the average duration_ms?

What you are learning: Structured logs are queryable data. When TutorClaw has real users, you can answer "which tool fails most often?" without adding any new code. The log file is your operations dashboard.
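Because the log is JSONL, the analysis in this exercise reduces to a few lines of Python. A minimal sketch (the path and field names follow the earlier logging prompt; summarize_log is a hypothetical helper):

```python
import json

def summarize_log(path="data/tutorclaw.log"):
    # Each line is one complete JSON record, so parsing is a list comprehension
    with open(path) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    errors = sum(1 for e in entries if e["result_status"] == "error")
    avg_ms = sum(e["duration_ms"] for e in entries) / len(entries) if entries else 0.0
    return {"total": len(entries), "errors": errors, "avg_duration_ms": avg_ms}
```

The same pattern extends to any question the exercise asks: filter the list of dicts, count, or average.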

Exercise 3: Design a Log Alert Rule

text
Analyze the structured log format for TutorClaw. Task: suggest three log patterns worth monitoring for production alerts. For each pattern, explain what it would catch and why it matters.
Example pattern: more than 10 errors from the same learner_id in 5 minutes (possible confused user or automated abuse).

What you are learning: Structured logs are the foundation for monitoring and alerting. Good logging design happens before you actually need the logs.
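The example pattern can be checked with a simple sliding window over parsed log entries. A sketch, assuming the timestamp format and field names from the logging prompt (the helper noisy_learners is hypothetical):

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 10  # alert when a learner exceeds this many errors per window

def noisy_learners(entries):
    """Return learner_ids with more than THRESHOLD errors in any 5-minute window."""
    by_learner = defaultdict(list)
    for e in entries:
        if e["result_status"] == "error":
            # fromisoformat in older Pythons does not accept a trailing "Z"
            ts = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
            by_learner[e["learner_id"]].append(ts)
    flagged = set()
    for learner, times in by_learner.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window from the left until it spans at most 5 minutes
            while times[end] - times[start] > WINDOW:
                start += 1
            if end - start + 1 > THRESHOLD:
                flagged.add(learner)
                break
    return flagged
```

In production this logic would live in a monitoring tool rather than a script, but the structured log makes either trivial to build.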


James resent every malformed input from the morning. Empty names, impossible chapter numbers, restricted imports. Each one came back with a clear sentence telling the user what went wrong and how to fix it.

He opened the log file. Neat rows of JSON, one per line. Timestamp, tool name, learner ID, status, duration. Every call recorded.

"The product feels professional now," he said.

"Professional is a word for 'it does not leak its internals when surprised,'" Emma said. She closed her laptop halfway, then paused. "That is why I care about error messages more than features."

James looked at the log file again. "So we have validation, logging, and tests for both. What is left?"

"Publishing. Your product works. Your product handles surprises. Your product records what happens. Now other people need to be able to install it." She pointed at the ClawHub tab in his browser. "Chapter 20."