You give an AI system access to your codebase. It's working well—making helpful changes, running tests, suggesting improvements. Then you notice something odd in git history. The AI deleted a directory you didn't ask it to touch. It ran commands you don't remember approving. It's refactoring code you specifically said not to change.
This isn't science fiction; these are real incidents. AI systems are powerful, and power without constraints is dangerous.
This principle is about balancing capability with safety. You want AI to be effective—but not so effective it causes damage. You want autonomy—but not so much autonomy that you lose control. The solution is thoughtful constraints and safety measures.
The Safety Mantra: "As long as I haven't git pushed, I am the master of my machine." Everything the AI does locally can be undone. Uncommitted changes can be reverted. Commits can be reset. The moment of no return is the push—and you control that moment.
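A minimal, self-contained demonstration of the mantra. It builds its own throwaway repo, so it is safe to run anywhere:

```shell
# Demo: everything local is reversible until you push.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q --allow-empty -m "baseline"
echo "original" > app.txt
git add app.txt
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q -m "good state"

# 1) Uncommitted AI edits: revert the working tree.
echo "ai damage" > app.txt
git checkout -- app.txt      # app.txt is "original" again

# 2) Committed AI mistakes: reset the branch.
echo "ai damage" > app.txt
git -c user.email=demo@example.com -c user.name=Demo \
    commit -qam "ai mistake"
git reset -q --hard HEAD~1   # history is back to "good state"

cat app.txt                  # prints: original
```

Both recovery paths end at the same place: the last state you committed. That is why committing often is itself a safety measure.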
Before designing safety measures, understand what you're protecting against.
AI deletes or overwrites important data:
Impact: Hours to weeks of lost work
Likelihood: Medium—AI follows instructions literally
AI introduces security issues:
Impact: System compromise, data breach
Likelihood: Medium—AI doesn't automatically think like an attacker
AI generates expensive operations:
Impact: Unexpected cloud bills
Likelihood: Low—AI tries to be efficient, but doesn't know costs
AI makes changes that affect users:
Impact: Lost trust, user churn
Likelihood: Low—but high impact
AI interferes with team processes:
Impact: Team friction, lost productivity
Likelihood: Medium—AI doesn't know team context
No single safety measure is sufficient. You need layers—each protecting against different failure modes.
What: Hard limits on what AI can do
Examples:
Protects against: Accidental damage, runaway processes
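As a concrete (if deliberately simplified) illustration: a hard limit blocks, it does not ask. The `safe_run` wrapper below is hypothetical, not part of any tool, but it captures the idea of a guardrail that refuses known-destructive commands outright:

```shell
# Hypothetical guardrail: refuse known-destructive commands outright.
safe_run() {
  case "$*" in
    *"rm -rf"*|*"git push --force"*|*"git reset --hard"*)
      echo "BLOCKED: $*" >&2
      return 1 ;;
  esac
  "$@"
}

safe_run echo "build ok"                   # allowed commands pass through
safe_run rm -rf ./build || echo "refused"  # destructive ones never run
```

Real tools implement this with permission rules rather than string matching, but the principle is the same: some actions are simply outside the AI's reach.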
What: Require approval for certain actions
Examples:
Protects against: Unintended destructive operations
What: Separate AI work from production
Examples:
Protects against: Production incidents
What: Workflow that incorporates safety
Examples:
Protects against: Bad code reaching production
What: Human review before impact
Examples:
Protects against: All categories—final safety net
Different AI tools offer different permission models. Understanding them helps you choose appropriate settings.
How it works: AI executes read operations and safe writes automatically; prompts for destructive actions
Best for: Experienced users, trusted AI, familiar codebase
Example configuration:
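In Claude Code, this roughly corresponds to a `.claude/settings.json` like the one below. Treat it as a sketch: field names and permission syntax change between versions, so check your tool's documentation.

```json
{
  "permissions": {
    "defaultMode": "acceptEdits",
    "allow": ["Read", "Grep", "Glob"],
    "ask": ["Bash(rm:*)", "Bash(git push:*)"]
  }
}
```

Reads and edits proceed automatically; anything matching the `ask` rules still requires your confirmation.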
How it works: AI prompts before any write operation
Best for: New AI collaboration, unfamiliar codebase, learning phase
Example configuration:
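A sketch of the same idea in Claude Code's `.claude/settings.json` (field names may differ by version): reads are allowed automatically, and every write or command prompts.

```json
{
  "permissions": {
    "defaultMode": "default",
    "allow": ["Read", "Grep", "Glob"]
  }
}
```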
How it works: AI can only read; cannot modify anything
Best for: Exploration, code review, understanding unfamiliar codebases
Example configuration:
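In Claude Code, read-only operation maps to plan mode, which lets the agent read and analyze but not modify anything. A sketch (verify the exact mode name against your version's docs):

```json
{
  "permissions": {
    "defaultMode": "plan"
  }
}
```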
Know which commands require extra scrutiny. These should always trigger confirmation:
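The exact list varies by team, but typical entries look like this (illustrative, not exhaustive):

```
rm -rf <path>              # recursive deletion
git push --force           # rewrites shared history
git reset --hard           # discards local work
git clean -fd              # deletes untracked files
chmod -R / chown -R        # bulk permission changes
DROP TABLE / TRUNCATE      # destructive SQL
curl ... | sh              # executing remote scripts unseen
```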
The most effective safety measure: don't let AI touch production directly.
Before you worry about Docker containers or staging environments, know this: a git branch is 90% of the safety most people need.
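In practice this is one command before the session starts. The demo below is self-contained (it builds a throwaway repo, so the `ai-experiment` branch never touches your real project); in your own project, only the checkout and branch-delete lines matter:

```shell
set -e
# Throwaway repo for the demo.
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q --allow-empty -m "known-good state"

git checkout -q -b ai-experiment    # sandbox branch for the AI session
echo "sweeping refactor" > mess.txt
git add mess.txt
git -c user.email=demo@example.com -c user.name=Demo \
    commit -qm "AI experiment"

git checkout -q -                   # back to the original branch...
git branch -q -D ai-experiment      # ...and the experiment is gone
```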
That's it. Now the AI can do whatever it wants—and you can throw it all away with `git checkout main && git branch -D ai-experiment`. No Docker knowledge required. No DevOps complexity. Just git.
Start here. Graduate to more sophisticated sandboxes only when you need them.
1. Docker Container Sandbox
2. Staging Environment
3. Feature Branch Workflow
4. Separate Credentials
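A sketch of option 1, the container sandbox. The image name is a placeholder; adjust the mount and image to your tooling:

```shell
# --network none : the agent cannot reach the internet
# -v "$PWD":/workspace : it sees the project directory, nothing else
docker run --rm -it \
  --network none \
  -v "$PWD":/workspace \
  -w /workspace \
  your-agent-image:latest
```

The two flags do the real work: no network means nothing leaks and nothing gets billed; the single bind mount means the rest of your machine is invisible.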
Don't go from zero autonomy to full autonomy overnight. Build trust gradually.
Track these to decide when to increase autonomy:
Before starting an AI session, verify:
Environment:
Tool Configuration:
Mental Model:
Despite all precautions, things will go wrong. Have a plan.
Print this. Tape it to your monitor. When panic hits, you won't remember—but you can read.
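The exact card is yours to write, but based on the recovery steps in this lesson, a reasonable version looks like:

```
EMERGENCY CARD
1. Ctrl+C             stop the AI / the running command
2. Do NOT git push    local damage stays local
3. git status         see what actually changed
4. git checkout -- .  restore tracked files from the last commit
5. git reflog         find "lost" commits if history was rewritten
```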
After an incident, ask:
Incident: AI ran `rm -rf node_modules/` but executed it in the wrong directory, deleting source files.
Immediate: Hit Ctrl+C. Assess the damage with `git status`.
Recovery: `git checkout -- .` to restore the deleted tracked files from git.
Prevention for next time:
Paradoxically, constraints enable autonomy. When you have good safety measures:
Without safety measures, you're constantly on edge—afraid to let AI do anything meaningful. With safety measures, you can collaborate confidently.
The goal isn't to prevent AI from doing anything. The goal is to prevent AI from doing certain things—while enabling everything else.
Beyond tool configuration, you can build safety directly into your prompts.
For operations that might incur costs (API calls, cloud resources), add this to your prompt:
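The exact wording matters less than the explicit gate. One possible phrasing (adapt freely):

```
Before running anything that could cost money (API calls, cloud
resources, large downloads), stop and tell me what it does and
roughly what it could cost. Wait for my explicit "go ahead".
```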
This catches the invisible risks—the $500 API bill from an infinite loop, the runaway cloud instance.
For maximum safety, start sessions with explicit constraints:
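A starting template. The directory names and rules here are illustrative; substitute your own:

```
Session constraints for this project:
- Work only inside src/ and tests/.
- Never run rm -rf, git push, or any command that touches
  anything outside this repository.
- Ask before installing dependencies or editing config files.
- If a task seems to require breaking one of these rules,
  stop and ask me first.
```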
Customize this template for your specific project and risk tolerance.
In Claude Code, you configure constraints through permission flags, CLAUDE.md restrictions, and hooks. In Cowork, constraints are built into the GUI—confirmation dialogs and folder-level access controls.
In Cowork: The confirmation dialogs ARE the constraint system. When Cowork asks "Should I delete this file?", it's implementing the same safety principle that Claude Code's permission model provides. You don't configure them—you respond to them.
The paradox applies equally: In both interfaces, constraints enable capability. When you trust the safety model, you give the agent more autonomy. Without constraints, you'd never let either agent do meaningful work on important files.
For a detailed comparison of how all seven principles map across both interfaces, see Lesson 9: Putting It All Together.
What you're learning: How to identify risks and design appropriate safety measures. You're developing the skill of anticipating problems before they occur and structuring AI work to be safe by design.
What you're learning: How to choose appropriate permission models based on context and experience. You're learning to calibrate autonomy based on trust and risk—balancing safety with effectiveness.
What you're learning: How to create isolated environments where AI can work safely. You're learning to structure your workflow so that AI experimentation never puts production at risk—enabling confident collaboration.
When in doubt, start with more restrictions and ease into autonomy. It's always easier to loosen constraints later than to recover from a preventable incident. The best safety measure is a cautious approach—especially when you're just starting with AI collaboration.