You've spent nine lessons learning the physics of context engineering. Position sensitivity. Attention budgets. Signal versus noise. Progress files. Memory injection. Context isolation. These aren't abstract concepts anymore. You understand WHY Claude forgets things, WHEN to compact versus clear, and HOW to structure information for maximum attention allocation.
Now it's time to put it all together.
This lesson is the capstone. You'll learn decision frameworks that tell you WHICH technique to apply WHEN. You'll understand how to allocate your context budget across components. And you'll build something real: a production-quality agent that demonstrates every technique from this chapter working in concert.
More importantly, you'll connect everything back to the thesis that drives this entire book.
In Chapter 1, you learned the Digital FTEs: Engineering paradigm, in which domain experts manufacture Digital FTEs powered by AI. The thesis:
General Agents BUILD Custom Agents.
You've learned the tools (Chapter 3) and the physics (this chapter). Now answer the question that actually matters:
What separates a $50/month agent from a $5,000/month agent?
Not the model. Every law firm, marketing agency, research lab, and consulting practice has access to Claude, GPT, and Gemini. Your competitor can spin up the same frontier model in minutes.
Not the basic prompts. Prompts plateau quickly. You can only polish a single instruction so much before diminishing returns set in.
The context engineering discipline.
A well-engineered agent stays consistent across long sessions, persists its state between sessions, scales to complex multi-step tasks, and applies domain knowledge automatically.
This is what clients pay for. Whether you're building a contract review assistant, a campaign planning agent, a literature synthesis tool, or a code reviewer—the value isn't raw intelligence. Everyone has access to that. Clients pay for reliability, consistency, and domain expertise that accumulates rather than resets.
This chapter gave you the quality control toolkit. This lesson shows you how to apply it.
When you encounter a context problem, you need a decision framework. Not "try everything and see what works" but "diagnose the specific problem and apply the specific solution."
Here's the decision tree:

Decision 1: Is context above 70%? If yes, compaction or clearing is needed; if no, move on.
Decision 2: Is the task complete? If yes, clear and start fresh; if no, compact with explicit preserve and discard instructions.
Decision 3: Is this multi-session? If yes, create a progress file before ending the session; if no, skip it.
Decision 4: Is the workflow drifting? If yes, re-state the goal explicitly (single session) or set up memory injection (recurring drift).
Each branch points to a specific lesson and technique. The decision tree isn't "which sounds good"—it's "what does the diagnosis say?"
Let's trace through realistic scenarios across different domains.
Scenario A: Legal — Contract Review Assistant
You're reviewing a complex commercial lease agreement. Your AI assistant has been running for 45 minutes, and it's starting to miss liability clauses it caught earlier.
Decision 1: Is context > 70%?
Run /context. Output: Context: 156,000 / 200,000 tokens (78%)
Yes. You're in the orange zone. Compaction needed.
Decision 2: Is the task complete?
No—you're mid-review. You don't want to lose the context of which sections have been analyzed and what red flags were identified.
Action: /compact Focus on the lease review findings and clauses already analyzed. Discard the tangent about formatting preferences from messages 12-18.
Now you're back to ~40% utilization with the important context preserved.
Decision 3: Is this multi-session?
No—you'll finish this contract today.
No action needed for progress files.
Decision 4: Is workflow drifting?
You started reviewing for "indemnification and liability risks" but Claude has shifted to "general grammar corrections." That's drift.
Action: You could implement memory injection, but for a single session, a simpler fix works: re-state your goal explicitly. "Return focus to liability and indemnification clauses. Grammar is out of scope for this review."
Scenario B: Marketing — Campaign Planning Assistant
You're developing a product launch campaign. The AI has been helping for an hour, brainstorming messaging across multiple channels. Now its suggestions are becoming generic, losing the brand voice you established earlier.
Decision 1: Is context > 70%?
Run /context. Output: Context: 145,000 / 200,000 tokens (72%)
Yes, approaching the danger zone. Proactive compaction recommended.
Decision 2: Is the task complete?
No—you've done social media but still need email sequences and landing page copy.
Action: /compact Preserve: brand voice guidelines, approved messaging themes, target persona details. Discard: rejected tagline brainstorms, competitor research tangent.
Decision 3: Is this multi-session?
Yes—this campaign will take several days to develop fully.
Action: Create a progress file capturing the campaign status: channels completed (social media), channels remaining (email sequences, landing page copy), the approved messaging themes, and where the brand voice guidelines live.
Decision 4: Is workflow drifting?
The suggestions lost your brand's distinctive voice. That's knowledge drift.
Action: Memory injection would help for future sessions. For now, re-inject the brand voice document explicitly: "Review our brand voice guidelines before the next suggestion. All copy must match this tone."
Scenario C: Research — Literature Synthesis Assistant
You're synthesizing 30 papers for a systematic review. The AI has been helping categorize findings, but it's starting to misattribute claims to the wrong papers.
Decision 1: Is context > 70%?
Yes—you've loaded substantial paper summaries.
Decision 2: Is the task complete?
No—you're at paper 18 of 30.
Action: /compact Preserve: synthesis table with paper citations, methodology classification schema. Discard: detailed quotes from papers 1-10 (keep citations only).
Decision 3: Is this multi-session?
Definitely—this is a week-long project.
Action: Your progress file should track which papers have been analyzed, key findings extracted, and where conflicts exist between sources.
Decision 4: Is workflow drifting?
Attribution errors suggest memory pollution from overlapping contexts.
Action: Consider multi-round processing—analyze 5 papers per session, produce a mini-synthesis, then combine syntheses in a final session with fresh context.
Scenario D: Consulting — Proposal Development Assistant
You're developing a consulting proposal. The AI has been helping with the executive summary, scope definition, and pricing rationale. Now it's suggesting deliverables that don't match what you discussed with the client.
Decision 1: Is context > 70%?
Run /context. Moderate usage—not the primary issue.
Decision 2: Is the task complete?
No—you still need the implementation timeline and team bios section.
No compaction needed yet.
Decision 3: Is this multi-session?
Yes—proposals typically span multiple working sessions with client feedback loops.
Action: Create a progress file capturing: client requirements (from discovery call), agreed scope boundaries, pricing approach, and key differentiators.
Decision 4: Is workflow drifting?
Yes—the deliverables don't match client needs. Classic drift.
Action: Re-inject the client requirements document. "Review the discovery call notes before suggesting deliverables. Every deliverable must trace to a stated client need."
Scenario E: Development — Code Review Assistant
You're reviewing a large pull request. The AI has been running for 45 minutes, and quality is degrading—it's missing security issues it caught earlier.
Decision 1: Is context > 70%?
Run /context. Output: Context: 156,000 / 200,000 tokens (78%)
Yes. Compaction needed.
Decision 2: Is the task complete?
No—you're mid-review.
Action: /compact Focus on the PR review findings and files already analyzed. Discard the debugging tangent from messages 12-18.
Decision 3: Is this multi-session?
No—you'll finish this review today.
No action needed for progress files.
Decision 4: Is workflow drifting?
You started reviewing for "security issues" but Claude has shifted to "code style." That's drift.
Action: Re-state your goal explicitly. "Return focus to security issues. Code style is out of scope for this review."
The decision tree didn't tell you to use every technique. It told you which ones match your situation—regardless of your domain.
Context isn't free. Every token you add competes for attention with every other token. Understanding budget allocation helps you make tradeoffs.
The reserve buffer is critical. If you're running at 90% utilization, any file read might push you into degradation territory. Keep headroom.
These proportions shift throughout a session. Early on (the first 10 messages), the system prompt, CLAUDE.md, and tool definitions account for most of the context. By mid session (messages 20-40), accumulated message history rivals them. Late in a session (messages 50+), history dominates everything else.
Notice how message history grows to dominate. This is why conversations degrade—the useful context (CLAUDE.md, tools) gets proportionally smaller as conversation noise accumulates.
When you're over budget, these strategies help you reclaim tokens without losing quality:
Before including a large document, summarize it.
When to use: Documents > 2,000 tokens that you need for reference but don't need verbatim.
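As one sketch of this strategy (the four-characters-per-token heuristic and the first-sentence extractive fallback are placeholder assumptions, not a real summarizer):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def prepare_for_context(document: str, limit: int = 2000) -> str:
    """Include the document verbatim if it is small; otherwise fall back
    to a crude extractive summary (first sentence of each paragraph)."""
    if estimate_tokens(document) <= limit:
        return document
    summary_lines = []
    for paragraph in document.split("\n\n"):
        first_sentence = paragraph.strip().split(". ")[0]
        if first_sentence:
            summary_lines.append(first_sentence)
    return "[SUMMARY] " + " ".join(summary_lines)
```

In practice you would replace the fallback with an actual summarization call; the decision structure (measure first, include verbatim only when cheap) is the point.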
Store large knowledge bases in a vector database and retrieve only the relevant chunks.
When to use: Reference materials, documentation, knowledge bases that are too large to include whole.
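A toy sketch of the retrieval pattern, using a bag-of-words stand-in for real embeddings (the sample knowledge base and the `embed`/`retrieve` helpers are hypothetical):

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding": token counts. A real system uses an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    # Rank chunks by similarity to the query; include only the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

knowledge_base = [
    "Indemnification clauses shift liability between parties.",
    "Our brand voice is warm, direct, and jargon-free.",
    "Landlord maintenance obligations cover structural repairs.",
]
print(retrieve("liability and indemnification", knowledge_base, k=1))
```

Only the retrieved chunks enter the context window; the rest of the knowledge base costs zero tokens.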
Not everything needs to live in context. Use external storage:
Pass IDs or references, not full content.
When to use: Data that changes, accumulates, or exceeds context limits.
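One way to sketch the offloading pattern, using an in-memory SQLite table as the external store (the `fetch_doc` tool and the `lease-2024-017` ID are illustrative):

```python
import sqlite3

# External store: full documents live here, not in the context window.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO docs VALUES (?, ?)",
           ("lease-2024-017", "Full 40-page lease text ..."))
db.commit()

def fetch_doc(doc_id: str) -> str:
    """Tool the agent calls only when it actually needs the content."""
    row = db.execute("SELECT body FROM docs WHERE id = ?",
                     (doc_id,)).fetchone()
    return row[0] if row else ""

# The context carries the reference, not the 40 pages.
context_note = "Reviewing lease-2024-017; fetch clauses on demand."
```

The context holds a short pointer; the agent pays the token cost of the full document only in the turns that need it.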
Before including content, verify it's needed.
When to use: When you're including files "just in case" rather than because they're definitely needed.
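A minimal sketch of a pre-inclusion relevance gate; the keyword-overlap threshold is a crude stand-in for a real relevance judgment:

```python
def is_relevant(task: str, file_preview: str, threshold: int = 2) -> bool:
    """Include a file only if its preview shares enough terms with the
    stated task. Overlap counting is a placeholder heuristic."""
    task_terms = set(task.lower().split())
    preview_terms = set(file_preview.lower().split())
    return len(task_terms & preview_terms) >= threshold

task = "review liability and indemnification clauses"
print(is_relevant(task, "indemnification and liability survive termination"))
print(is_relevant(task, "office seating chart for q3"))
```

Even a crude gate like this forces the "why is this in context?" question before tokens are spent.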
System messages persist across turns. User messages accumulate in history. Structure accordingly.
When to use: When designing your context architecture from scratch.
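A sketch of that split, shaped loosely after the Anthropic Messages API (the `build_request` helper is hypothetical):

```python
def build_request(system_instructions: str, history: list,
                  new_input: str) -> dict:
    """Stable, always-relevant instructions ride in the system message;
    per-turn content accumulates as user/assistant history."""
    return {
        "system": system_instructions,  # persists every turn
        "messages": history + [{"role": "user", "content": new_input}],
    }

system = "You are a contract review assistant. Flag liability clauses."
history = [
    {"role": "user", "content": "Review section 4."},
    {"role": "assistant", "content": "Section 4 caps liability at ..."},
]
request = build_request(system, history, "Now review section 5.")
```

Putting durable instructions in the system slot means they never compete with history for inclusion; only the conversation grows.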
Set guardrails and check regularly.
When to use: Always. Monitoring should be automatic, not an afterthought.
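A minimal guardrail sketch; the zone thresholds below are assumptions layered on the chapter's 70% warning line, not fixed numbers:

```python
def context_zone(used_tokens: int, limit: int = 200_000) -> str:
    """Map utilization to an action zone so the check is automatic.
    Thresholds are illustrative defaults."""
    pct = used_tokens / limit
    if pct < 0.50:
        return "green: keep working"
    if pct < 0.70:
        return "yellow: plan a compaction point"
    if pct < 0.85:
        return "orange: compact now, preserving task state"
    return "red: clear or start a fresh session"

print(context_zone(156_000))  # 78% -> orange
```

Wiring a check like this into your workflow turns "I should probably compact" into a rule you can't forget.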
For tasks requiring more context than fits, process in rounds.
Each round uses fresh context. The final round operates on summaries, not raw data.
When to use: Analysis tasks requiring more input than context allows.
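The pattern can be sketched as follows; `analyze_batch` is a placeholder for a fresh-context session:

```python
def analyze_batch(items: list) -> str:
    # Placeholder for one fresh-context session analyzing a few items.
    return f"Synthesis of {len(items)} papers: " + "; ".join(items)

def multi_round(papers: list, batch_size: int = 5) -> str:
    summaries = []
    for i in range(0, len(papers), batch_size):
        batch = papers[i:i + batch_size]
        summaries.append(analyze_batch(batch))  # each round: fresh context
    # Final round operates on the summaries, not the raw papers.
    return analyze_batch(summaries) if len(summaries) > 1 else summaries[0]

papers = [f"paper-{n}" for n in range(1, 13)]
result = multi_round(papers)
```

Twelve papers become three mini-syntheses plus one combining round; no single context ever holds all twelve.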
Techniques map to situations. Here's the quick reference:

Context above 70% mid-task: compact with explicit preserve and discard instructions.
Task complete with high context: clear and start fresh.
Work spans multiple sessions: progress file with decisions and next steps.
Workflow drifting within one session: re-state the goal explicitly.
Drift recurring across sessions: memory injection.
Input exceeds what one context allows: multi-round processing with per-round summaries.
Notice that problems often have multiple contributing causes. Here are examples across domains:
Legal: "Claude keeps missing liability clauses" might be context saturation (utilization past 70%), goal drift (the review slid from indemnification to grammar), or both.
Marketing: "Claude keeps going off-brand" might be knowledge drift (the brand voice guidelines have fallen out of attention) or context creeping toward the danger zone.
Research: "Claude keeps misattributing findings" might be memory pollution from overlapping paper contexts, or simple saturation after loading many summaries.
Consulting: "Claude keeps suggesting irrelevant deliverables" might be workflow drift away from the client requirements, even when context utilization is moderate.
Start with the most likely cause. If it doesn't resolve, check the next.
How do you know if your Digital FTE is production-ready? Apply these four criteria.
The four quality criteria (Consistency, Persistence, Scalability, Knowledge) are your Digital FTE's performance review metrics. A score below 3/5 on any criterion means the agent is not ready for client deployment.
The test: Does it give the same quality answer at turn 1 vs turn 50?
How to measure:
Scoring:
What affects it: Attention budget management, compaction timing, position sensitivity.
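One crude way to instrument this, assuming you capture the same probe question's answer early and late in a session (textual similarity is a proxy, not a rubric):

```python
import difflib

def consistency_score(answer_turn_1: str, answer_turn_50: str) -> float:
    """Proxy metric: textual similarity between answers to the same probe
    asked early and late. A real rubric would score correctness instead."""
    return difflib.SequenceMatcher(None, answer_turn_1, answer_turn_50).ratio()

early = "Section 9 indemnification is one-sided; flag for negotiation."
late = "Section 9 indemnification is one-sided; flag for negotiation."
print(round(consistency_score(early, late), 2))  # 1.0 when answers match
```

A score that slides as the session grows is your degradation signal, visible before the failures become obvious.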
The test: Can it resume work after a 24-hour break?
How to measure:
Scoring:
What affects it: Progress file quality, decision documentation, session exit protocol.
The test: Can it handle 10-step tasks without drift?
How to measure:
Scoring:
What affects it: Memory injection, context isolation, explicit goal statements.
The test: Does it apply domain expertise automatically?
How to measure:
Domain examples:
Scoring:
What affects it: CLAUDE.md signal quality, memory extraction, tacit knowledge documentation.
Objective: Apply the full context engineering toolkit to build an agent worth showing to clients.
Duration: 120 minutes active work (may span multiple sessions)
Deliverable: A production-quality specialized agent with quality verification evidence.
Select an agent type that matches your expertise.
The domain should be specific enough that generic agents fail. Your agent should have an unfair advantage because it knows YOUR context—your firm's standards, your industry's terminology, your methodology.
Step 1: Audit Your CLAUDE.md
Apply the signal-to-noise audit from Lesson 4.
Step 2: Optimize for Position
Apply the three-zone strategy from Lesson 3.
Step 3: Establish Baseline
Run your test task. Record the output quality. This is your "before" measurement.
Step 1: Create Progress File Template
Based on your domain, create claude-progress.txt.
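A hedged sketch of generating such a template; the section names are illustrative, not a fixed schema:

```python
from datetime import date

def write_progress_file(path: str, domain: str, completed: list,
                        remaining: list, decisions: list) -> None:
    """Emit a claude-progress.txt the agent reads at session start."""
    lines = [
        f"# Progress: {domain} ({date.today().isoformat()})",
        "## Completed",
        *[f"- {item}" for item in completed],
        "## Remaining",
        *[f"- {item}" for item in remaining],
        "## Decisions",
        *[f"- {item}" for item in decisions],
        "## Next step",
        remaining[0] if remaining else "Review and close out.",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines))

write_progress_file(
    "claude-progress.txt",
    domain="Campaign planning",
    completed=["Social media messaging"],
    remaining=["Email sequences", "Landing page copy"],
    decisions=["Tone: warm and direct, per brand voice doc"],
)
```

Whatever sections you choose, the explicit "Next step" line is what lets a fresh session resume without re-deriving state.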
Step 2: Extract Tacit Knowledge
Apply the extraction protocol from Lesson 5.
Domain-specific examples:
Step 3: Document Decisions
As you make choices about your agent's behavior, record them in a decisions log.
Step 1: Define Your Memory Schema
What memories should your agent have? Define categories appropriate to your domain, such as client preferences, domain rules, and style conventions.
Step 2: Build Initial Memory Store
Create a memories file or vector DB. Examples by domain:
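As one illustration, a minimal file-based memory store with category-scoped injection (the categories and contents are hypothetical):

```python
import json

# A flat memories file; the categories are illustrative, not a fixed schema.
memories = [
    {"category": "client_preferences",
     "text": "Acme Corp prefers fixed-fee pricing."},
    {"category": "domain_rules",
     "text": "Always check indemnification clauses for mutuality."},
    {"category": "style",
     "text": "Brand voice: warm, direct, jargon-free."},
]

with open("memories.json", "w") as f:
    json.dump(memories, f, indent=2)

def inject(categories: list) -> str:
    """Build the memory block to prepend to a session's first message."""
    with open("memories.json") as f:
        stored = json.load(f)
    relevant = [m["text"] for m in stored if m["category"] in categories]
    return "Relevant memories:\n" + "\n".join(f"- {t}" for t in relevant)

print(inject(["domain_rules"]))
```

Scoping injection by category keeps irrelevant memories from spending attention budget on tasks that don't need them.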
Step 3: Configure Memory Injection
Add to your CLAUDE.md:
For advanced implementation, set up PreToolUse hooks per Lesson 8.
Step 1: Consistency Test
Step 2: Persistence Test
Step 3: Scalability Test
Step 4: Knowledge Test
Your final deliverable is a folder containing your optimized CLAUDE.md, your progress file template, your memory store, and a quality-assessment.md.
The quality-assessment.md should record, for each of the four criteria, the score and the evidence behind it.
This is your prototype for the Digital FTE manufacturing process you'll refine throughout the book. By the end of this lab, you'll have a production-quality Digital FTE that demonstrates the difference between "using AI" and "selling AI solutions."
Everything in this chapter supports Principle 5: "Persisting State in Files."
You now understand:
Without context engineering discipline, "persisting state in files" is cargo cult. With it, file-based state becomes a superpower.
In Chapter 6, you'll learn Principle 5 explicitly, along with the other six principles of general agent problem solving. This chapter gave you the physics. Chapter 6 gives you the practices.
This capstone lesson synthesized the entire chapter:
The difference between a $50/month chatbot and a $5,000/month Digital FTE is now concrete. It's not magic. It's discipline. Context engineering is the manufacturing quality control that makes your AI solutions worth buying.
What you're learning: Quality assessment requires evidence, not gut feeling. This prompt forces you to justify each score with domain-specific observations, building the habit of evidence-based evaluation. You're learning to see your agent the way a paying client in your industry would.
What you're learning: Technical quality must translate to business value. Your context engineering discipline is invisible to clients—a law firm sees "catches 95% of liability issues," not "memory injection prevents drift." This prompt trains you to articulate value in terms that matter to buyers in YOUR industry, which is essential for the Digital FTEs: Engineering thesis of building sellable Digital FTEs.
What you're learning: Continuous improvement requires domain-aware prioritization. Not every improvement matters equally in every field. This prompt trains you to think economically about quality investments—maximizing impact per hour invested for your specific industry. You're learning that quality manufacturing is iterative, not one-shot.
You've completed the context engineering discipline. Here's what you now know:
This isn't abstract theory. You've built a production-quality agent for YOUR domain. You've measured its quality against concrete criteria. You've created artifacts you can show to clients—whether they're law firms, marketing agencies, research institutions, or engineering teams.
The Digital FTEs: Engineering thesis is now operational: you know how to manufacture Digital FTEs with quality control. Whether your expertise is in contracts, campaigns, citations, or code—the discipline you've learned here applies to all of them.
Welcome to professional context engineering.