You ask an AI system to refactor a database query. It says it's done. You run the application. It crashes. You check the query—it looks completely different from what you expected. When did it change? What steps did it take? What files did it modify? You have no idea. You're flying blind.
This is the observability problem: if you can't see what the AI is doing, you can't debug problems, build trust, or improve the collaboration.
Observability means seeing into the black box. It's understanding what actions the AI took, in what order, with what results. This principle is about making AI workflows transparent, traceable, and debuggable.
Synergy with Principle 3: Observability and Verification are partners. Verification (Principle 3) is the act of checking; Observability (this principle) provides the evidence that makes checking possible. Without observability, verification is guesswork. Observability gives you the map; Verification tells you if you've arrived at the right destination.
The difference: Observability lets you understand the full context of what happened, not just the final result.
You need to see each action the AI took:
Without this, you can't debug. With this, you can trace exactly what happened.
You need to understand the AI's reasoning:
Without rationale, you see changes but not the intent. With rationale, you can evaluate whether the approach makes sense.
A warning about AI rationalization: AI can sound confident even when it's wrong, and it will give plausible-sounding explanations for broken code. Never trust the rationale alone; always verify it against actual results (tests, output, behavior). If the rationale says "this will work" but the tests fail, trust the tests.
You need to see the result of each action:
Without results, you can't verify success. With results, you can confirm the AI achieved what it intended.
Most AI tools provide activity logs. Here's how to read them effectively.
Typical activity log structure:
Success Pattern:
Each step logically follows the previous one. Verification happens after changes.
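For example, a success-pattern transcript might look like this (the format here is invented for illustration; real tools vary):

```
READ      src/SignupForm.jsx
EDIT      src/SignupForm.jsx    (add email validation)
TEST      npm test              → 12 passed
COMPLETE  "Added validation; all tests pass"
```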
Warning Pattern:
Multiple edits without verification. No testing. High risk of problems.
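In the same illustrative format, a warning-pattern transcript looks like this: edits pile up and the log jumps straight to completion.

```
EDIT      src/SignupForm.jsx
EDIT      src/validate.js
EDIT      src/api.js
COMPLETE  "Done!"               (no TEST step anywhere)
```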
Failure Pattern:
AI tried but couldn't solve the problem. Needs human intervention.
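A failure-pattern transcript, again in the same illustrative format, shows repeated attempts that never converge:

```
EDIT      src/SignupForm.jsx
TEST      npm test              → 3 failed
EDIT      src/SignupForm.jsx    (retry)
TEST      npm test              → 3 failed
FAIL      "Unable to resolve remaining test failures"
```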
Feeling overwhelmed by 50-line logs? Here's how to skim effectively:
Ignore the timestamps. Look only for the verbs: READ, EDIT, TEST, FAIL, COMPLETE.
The red flag: If you see EDIT without TEST after it, that's a problem. The AI changed code but didn't verify it works.
This 10-second scan catches most issues without reading every line.
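The scan can even be automated. This sketch builds a sample transcript (file name and format invented for the demo) and strips it down to just the verbs; an EDIT with no TEST after it jumps out immediately:

```shell
# Simulate a saved transcript (hypothetical format), then do the 10-second
# scan: discard timestamps and paths, keep only the action verbs.
cat > /tmp/scan-demo.log <<'EOF'
10:01 READ src/SignupForm.jsx
10:02 EDIT src/SignupForm.jsx
10:03 EDIT src/validate.js
10:04 COMPLETE task finished
EOF
grep -oE 'READ|EDIT|TEST|FAIL|COMPLETE' /tmp/scan-demo.log
```

Here the output is READ, EDIT, EDIT, COMPLETE: two edits, no TEST, which is exactly the red flag described above.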
When something goes wrong, trace through the log:
Here's a realistic scenario. You asked Claude Code to "add input validation to the signup form." It reported success. But users are still submitting invalid data. Let's trace through what happened.
Step 1: Check what actually changed.
Two files changed. That looks right.
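The checking itself is plain git. This self-contained sketch uses a throwaway repo and an invented file name to show the two commands that answer "what actually changed?":

```shell
# Demo in a throwaway repo: before trusting the AI's summary, look at the
# change set yourself. (File name invented for the demo.)
tmp=$(mktemp -d) && cd "$tmp"
git init -q
echo 'validate v1' > validate.js
git add . && git -c user.email=demo@example.com -c user.name=demo commit -qm init
echo 'validate v2' > validate.js   # stand-in for the AI's edit
git status --short                 # which files changed
git diff --stat                    # how much changed in each
```

In a real session you would run only the last two commands, in your own repo, right after the AI reports success.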
Step 2: Check if tests were run.
Look at the session conversation or activity log. Was there an npm test or similar verification step? If the change was made and Claude immediately said "Done!" without running tests, that's the warning pattern: EDIT followed by COMPLETE with no TEST in between.
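If the transcript is saved to a file, this check takes one command. The file name and format here are hypothetical; substitute whatever your tool actually writes:

```shell
# Was there any verification step, or did edits go straight to "Done!"?
cat > /tmp/verify-check.log <<'EOF'
EDIT src/SignupForm.jsx
EDIT src/validate.js
COMPLETE Done!
EOF
if grep -qE 'TEST|npm test' /tmp/verify-check.log; then
  echo "verification step found"
else
  echo "WARNING: edits with no test run"
fi
```

For this sample the output is the warning, which is your cue to run the tests yourself before trusting the result.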
Step 3: Read the actual diff.
You discover: Claude added validation to the onSubmit handler but the form uses onChange validation. The validation function exists but is never called in the right place.
Step 4: The root cause. Claude didn't understand the form's validation pattern. It wrote correct validation logic in the wrong location. If it had run the form and tested submission, this would have been caught immediately.
The lesson: The 2-Minute Audit (git diff + test run) would have caught this before you shipped it. Observability isn't extra work—it's the work that prevents rework.
When working with AI, design workflows that make actions visible.
The plan makes intentions visible. You can redirect before execution.
Checkpoints let you verify progress incrementally.
The summary provides complete context for review.
Different AI tools provide different observability features.
Activity Logs: .claude/activity-logs/prompts.jsonl
History Panel: Shows all AI interactions in current session
Cmd+K Quick Actions: Contextual suggestions with preview
Copilot Workspace: Full AI project work with visible steps
Fix: Require confirmation/visibility for all operations, not just successes.
Fix: Require rationale with every change. "I changed X because Y."
Fix: Require progress updates for long-running tasks.
There are two ways to observe AI work:
You see actions as they occur. This is your chance to intervene before damage is done.
Key insight: If you see the AI reading the wrong directory or about to delete the wrong file, don't wait for it to finish. Hit Ctrl+C immediately.
Real-time observation is your first line of defense. Use it.
You review logs after the task completes. This is how you debug problems and learn patterns.
Post-mortem tells you what happened. Real-time lets you prevent it from happening.
Use both: Watch in real-time during the task. Review logs afterward to catch anything you missed.
1. Git History
2. Activity Log Review
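One useful review query is an error filter over the log file. The JSONL field names below are hypothetical; adjust them to whatever schema your tool writes (for example, in .claude/activity-logs/prompts.jsonl):

```shell
# Build a sample log with an invented schema, then filter for errors.
cat > /tmp/prompts.jsonl <<'EOF'
{"step":1,"action":"edit","status":"ok"}
{"step":2,"action":"test","status":"error","detail":"2 tests failed"}
{"step":3,"action":"edit","status":"ok"}
EOF
# The error filter: skip straight to what failed.
grep '"status":"error"' /tmp/prompts.jsonl
```

One line in, one failing step out; no scrolling through the successful steps around it.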
Log Query Cheat Sheet: The error filter above is your superpower. When something goes wrong, run that one command first—it cuts through hundreds of log lines to show you exactly what failed.
3. Test Results
Add logging to your AI workflows:
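A minimal sketch of what that can look like: a wrapper (run_logged is a name invented here, not part of any tool) that streams a command's output to your terminal while also saving the full transcript for post-mortem review.

```shell
# Wrap any long-running AI command so its complete output is captured to a
# timestamped log file while still streaming live to the terminal.
run_logged() {
  log="ai-session-$(date +%Y%m%d-%H%M%S).log"
  "$@" 2>&1 | tee "$log"
  echo "transcript saved to $log" >&2
}
```

Usage is run_logged followed by your AI tool's command line; afterward the log file is there whether or not you were watching in real time.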
Trust isn't given—it's earned through transparency. When you can see what AI is doing:
Without observability, you're always second-guessing. With it, you can build genuine trust based on evidence.
After every AI task, spend exactly 2 minutes on this checklist:
The catch: If the git diff doesn't match the AI's summary, you've found a "silent failure"—the AI said it did X but actually did Y. These are the dangerous bugs.
Time investment: 2 minutes per task. Payoff: Catches problems before they compound into hours of debugging.
Make this automatic. Every task ends with this audit. No exceptions.
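To make it automatic, the audit can live in your shell as a small function (a sketch; it assumes a git repo, and the test command is whatever your project uses, e.g. npm test):

```shell
# The post-task audit as a reusable function. Pass your project's test
# command as arguments: two_minute_audit npm test
two_minute_audit() {
  git diff --stat   # 1. does the change set match the AI's summary?
  "$@"              # 2. run the tests yourself
  git diff          # 3. read the full diff for anything unexpected
}
```

If step 1 disagrees with the AI's summary, you've found a silent failure before it ships.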
"If you can't see what the agent is doing, you can't fix it when it goes wrong."
Both interfaces provide observability through different mechanisms. Claude Code's advantage is raw terminal transparency—you see every command and every output. Cowork's advantage is the three-panel layout (chat, progress, artifacts) designed for simultaneous visibility.
The principle is the same: Regardless of interface, you need visibility into what the agent is doing. Without it, agents are black boxes. With it, they're debuggable systems you can trust and improve.
For a detailed comparison of how all seven principles map across both interfaces, see Lesson 9: Putting It All Together.
What you're learning: How to read and interpret AI activity logs. You're developing the skill of understanding agent behavior through observation—essential for debugging and building trust.
What you're learning: How to design workflows that are transparent and observable. You're learning to structure AI collaboration so that actions are visible, traceable, and understandable.
What you're learning: How to use observability to debug problems effectively. You're learning to trace issues through logs, understand agent behavior, and identify what additional visibility would help.
Observability is your defense against unexpected behavior. Always review activity logs when something seems wrong. The more you understand what the AI is doing, the better you can direct it and catch problems early.