Imagine an employee you brief once, at the start of the day. At 9am, you tell them: "Here are our quality standards, client preferences, and the project scope." Perfect. By 3pm, they're making a critical decision—but the morning briefing has faded. They remember the general direction but not the specific constraint that matters right now.
Now imagine a different employee. Every time they're about to make a decision, a helpful colleague appears: "Hey, remember the Johnson incident? Here's what you need to know for this exact situation." That's memory injection: relevant context delivered at the moment of decision, not just at the start of the day.
The same failure shows up in AI workflows. You start a complex task with a clear goal. Your AI assistant knows your preferences, your project conventions, your stakeholder requirements. Turn 1 goes perfectly. By turn 20, the AI has forgotten half of what made turn 1 successful.
What happened?
The memories you injected at the beginning are still in context. They haven't disappeared. But they're now buried under 19 turns of conversation, research outputs, and evolving requirements. The AI's attention budget (Lesson 2) means those turn 1 memories are receiving only a fraction of the processing power they need. More importantly, those memories were selected for turn 1's intent. Turn 20 has different intent. The memories that would help turn 20 are sitting unused in your memory store.
This is workflow drift. And fixing it requires a fundamental shift in when and how you inject context.
Semantic memory injection typically happens at prompt submission. You type a request. A hook queries your memory store for relevant context. The results are injected into the AI's context. Then the AI processes your request.
This works well for single-turn interactions. But multi-turn workflows create a problem:
Consider a legal professional reviewing a complex contract:
Turn 1: You ask the AI to review vendor agreement terms. The memory hook finds relevant memories about your firm's standard terms, this client's risk tolerance, and contract negotiation history. Perfect match.
Turn 5: The AI has identified a problematic indemnification clause. It's now focused on liability allocation. Your general contract memories are still there, but they're less relevant than memories about indemnification precedents and insurance requirements.
Turn 12: The AI is drafting alternative language for the dispute resolution section. It needs memories about this client's arbitration preferences and past dispute history. The indemnification memories from turn 5 are noise.
Turn 20: The AI is preparing a summary memo for the partner. The drafting memories from turn 12 are now irrelevant. Client communication preferences would help, but they were never injected.
Or consider a marketing strategist developing a campaign:
Turn 1: You ask the AI to develop Q4 campaign strategy. The memory hook finds brand voice guidelines, audience demographics, and budget parameters. Perfect match.
Turn 5: The AI has pivoted to channel strategy. It's evaluating media mix options. Your brand voice memories are still there, but they're less relevant than memories about channel performance history and media costs.
Turn 12: The AI is writing creative briefs for each channel. It needs memories about past creative that worked with this audience, design constraints, and production timelines. The channel strategy memories from turn 5 are noise.
Turn 20: The AI is forecasting campaign ROI for leadership approval. The creative brief memories from turn 12 are now irrelevant. CFO preferences for financial presentations would help, but they were never injected.
Each turn, the AI's actual needs drift further from the context you provided at the start. The memories you injected were correct for turn 1. They're wrong for turn 20.
Claude Code's hook system offers two points where you can inject context:
UserPromptSubmit happens once per user message. It's synchronous with your input. The memories it injects reflect what you asked for at that moment.
PreToolUse happens potentially many times per user message. Each time the AI is about to use a tool—reading a document, searching files, editing content—this hook fires. That means you get multiple opportunities to inject relevant context throughout the workflow.
The key insight: The AI's thinking evolves during the reasoning process. By turn 20, the AI's thinking block contains intent and reasoning about what it's about to do next. That thinking is the perfect query for semantic memory.
Here's the flow: you submit a prompt; the AI reasons about what to do next, producing a thinking block; it decides to use a tool; before the tool runs, the PreToolUse hook fires; the hook uses the latest thinking block as a semantic query against your memory store and injects the top matches; the tool then runs with context matched to the current step, not the original prompt.
Important implementation detail: the PreToolUse hook does NOT receive thinking blocks directly. It receives a transcript_path input pointing to a JSONL file on disk. Your hook must read that transcript, extract the most recent thinking block, and use that text as its query.
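As a sketch, extracting the most recent thinking block might look like the following. The field names (`message`, `content`, `thinking`) are assumptions about the transcript schema; inspect one of your own transcript files before relying on them.

```python
import json
from pathlib import Path


def latest_thinking(transcript_path: str) -> str:
    """Return the most recent thinking block in a JSONL transcript.

    Assumes each line is a JSON object whose message.content is a list
    of blocks, some with type "thinking". Verify against a real file.
    """
    latest = ""
    for line in Path(transcript_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines defensively
        message = entry.get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content", [])
        if not isinstance(content, list):
            continue  # some entries store content as a plain string
        for block in content:
            if isinstance(block, dict) and block.get("type") == "thinking":
                latest = block.get("thinking", latest)
    return latest
```

The function returns the last thinking block in file order, which is the one describing what the AI is about to do next.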
Why this works: The AI's thinking block contains what it is about to do and why. When you embed that thinking and search for similar memories, you find memories that are relevant to the current action, not the original prompt.
Legal example: Turn 20's thinking might be: "I need to summarize the key risks in this agreement for the partner review memo. The main concerns are the indemnification carve-outs and the ambiguous termination provisions. I should highlight these in language appropriate for partner consumption."
Embedding that thinking and searching your memory store finds: the partner's preferred format for risk summaries, your firm's precedents on indemnification carve-outs, and how past disputes over ambiguous termination language played out.
These memories are exactly what turn 20 needs. They would never have been selected at turn 1 when the thinking was about reviewing contract terms.
Marketing example: Turn 20's thinking might be: "I need to forecast ROI for leadership approval. The campaign costs total $450K across channels. I should present this using the format the CFO prefers."
Embedding that thinking finds: the CFO's preferred format for financial presentations, past ROI forecasts that won (or failed to win) leadership approval, and cost benchmarks for the channels in the mix.
Again—exactly what turn 20 needs, not what turn 1 needed.
The AI's thinking isn't just internal monologue. It's structured reasoning that reveals what the AI is about to do, why it chose that action, which specific entities and documents are in play, and what information it still needs.
This makes thinking blocks the ideal query for semantic search. They contain the dense, specific context about what the AI needs to know.
Compare:
User prompt (turn 1): "Review the vendor agreement"
Thinking block (turn 20): "I'm preparing the partner memo summarizing key contract risks. The main issues are the indemnification carve-outs in Section 7.2 and the ambiguous termination language in Section 12.1. I should use the format the partner prefers for risk summaries."
The specificity of thinking blocks produces more relevant memory retrievals.
A memory injection system is only as good as its memories. What should you store?
High-value memories across domains: stakeholder-specific preferences ("the GC requires dollar ranges"), hard constraints (budget caps, compliance requirements, non-negotiable terms), past decisions and their rationale, and lessons from incidents you don't want repeated.
Lower-value memories (across all domains): general best practices the model already knows, information that lives in documents the AI will read anyway, and guidance too vague to change any specific decision.
Memory structure example (Legal):
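One possible shape, with illustrative fields and a hypothetical client name:

```markdown
## Acme Corp — risk summary format
Trigger: drafting risk summaries or partner memos for Acme Corp
Content: The GC requires dollar ranges for exposure estimates.
  Never use vague terms like "significant exposure."
Source: Partner feedback, Q2 contract review
```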
Memory structure example (Marketing):
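Using the same illustrative fields:

```markdown
## CFO — financial presentation format
Trigger: preparing budget or ROI forecasts for leadership review
Content: Lead with total spend and projected return as a range,
  then a one-page sensitivity summary. No per-channel detail up front.
Source: Q2 campaign approval meeting
```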
Memory structure example (Research):
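Again with illustrative fields:

```markdown
## PI — literature review conventions
Trigger: summarizing sources or drafting related-work sections
Content: Cite primary sources only; flag preprints explicitly;
  group findings by method, not by publication year.
Source: Feedback on the spring grant proposal
```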
Start with 10-20 memories. You don't need a massive corpus. A focused collection of genuinely useful memories outperforms a large collection of noise.
PreToolUse doesn't replace UserPromptSubmit. They serve different purposes:
A robust system uses both: UserPromptSubmit injects your stable baseline (identity, standards, always-relevant preferences), while PreToolUse injects the situational layer that tracks the AI's evolving intent.
This layered approach provides both stability (consistent baseline) and adaptability (evolving relevance).
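In .claude/settings.json, the layered approach might be registered like this (the hook file names are illustrative; check the current Claude Code hooks reference for the exact schema):

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "python3 baseline_hook.py" }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "python3 memory_hook.py" }
        ]
      }
    ]
  }
}
```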
Objective: Understand how workflow drift affects your work and design a memory corpus that maintains relevance throughout multi-step tasks.
Duration: 90 minutes
Choose your path: the conceptual track (the exercises below, no code required) or the technical track, which also builds the Python hook at the end. Both start from the same workflow analysis.
Step 1: Map Your Own Workflow Drift
Think about a recent complex task you worked on with AI assistance. This could be a contract review, a campaign plan, a research report, or a multi-file code change: anything that took ten or more turns.
Write down the major turns in that workflow: what you asked for at the start, what the AI was focused on at each stage, and where that focus shifted.
Step 2: Identify the Context Gaps
For each turn, answer: What context did the AI actually need at this point? Was that context available at turn 1? If not, where would it have had to come from?
Step 3: Quantify the Drift
Create a simple drift score for your workflow: for each turn, estimate what fraction of the context that turn needed was covered by what you provided at turn 1. Listing those fractions turn by turn makes the downward slope visible.
This exercise makes workflow drift concrete and visible.
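If you want to compute the score rather than eyeball it, here is a minimal sketch: represent each turn's needed context as a set of topic labels and measure how much of it the turn 1 injection covered. The labels and numbers below are illustrative.

```python
def drift_scores(injected: set[str], needed_by_turn: list[set[str]]) -> list[float]:
    """For each turn, the fraction of that turn's needed context covered
    by the turn 1 injection: 1.0 = fully covered, 0.0 = none of it."""
    return [
        len(needed & injected) / len(needed) if needed else 1.0
        for needed in needed_by_turn
    ]


# Illustrative legal workflow from this lesson:
injected = {"standard terms", "risk tolerance", "negotiation history"}
turns = [
    {"standard terms", "risk tolerance"},              # turn 1
    {"indemnification precedents", "risk tolerance"},  # turn 5
    {"arbitration preferences", "dispute history"},    # turn 12
]
print(drift_scores(injected, turns))  # -> [1.0, 0.5, 0.0]
```

Coverage falls from 1.0 to 0.0 as the workflow drifts away from what was injected at turn 1.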
Step 1: Brainstorm High-Value Memories
Based on your workflow analysis, list 15-20 memories that would have helped at different points. Don't worry about format yet—just capture the knowledge.
Step 2: Categorize by Relevance Pattern
Group your memories:
Always Relevant (inject at UserPromptSubmit): firm-wide standards, brand voice, your role, and preferences that hold for every task.
Situationally Relevant (inject at PreToolUse): client-specific precedents, stakeholder formatting preferences, and lessons tied to particular situations.
Step 3: Structure 5-7 Key Memories
Pick your highest-value situationally relevant memories and structure them: give each a descriptive title, a trigger (when it should surface), the content itself, and its source.
Deliverable (Conceptual Track): A memory corpus document with 5-7 well-structured memories for your domain.
For those comfortable with Python, here's a minimal memory injection hook. No vector databases or embeddings—just read a markdown file and inject it.
Create memories.md in your project:
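The entries here are illustrative; replace them with the memories from your own corpus exercise:

```markdown
## Acme Corp — risk summaries
The GC requires dollar ranges for exposure estimates.
Never write "significant exposure" without a number.

## Partner memos
One page maximum. Lead with the two highest-severity risks.

## Dispute resolution
This client prefers arbitration clauses; flag any
mandatory-litigation language for explicit sign-off.
```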
Create memory_hook.py:
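A sketch of the hook. File locations, the CLAUDE_PROJECT_DIR variable, and the assumption that the hook's stdout reaches the model's context all follow this lesson's setup; verify against the current Claude Code hooks documentation.

```python
#!/usr/bin/env python3
"""Minimal PreToolUse memory hook: print memories.md to stdout."""
import json
import os
import sys
from pathlib import Path

# CLAUDE_PROJECT_DIR is set by Claude Code when running hooks;
# fall back to the current directory for standalone testing.
MEMORY_FILE = Path(os.environ.get("CLAUDE_PROJECT_DIR", ".")) / "memories.md"


def load_memories(path: Path) -> str:
    """Return the memory file's contents, or an empty string if absent."""
    return path.read_text() if path.exists() else ""


def main() -> None:
    try:
        json.load(sys.stdin)  # hook input JSON (unused in this version)
    except json.JSONDecodeError:
        pass  # tolerate empty stdin, e.g. when testing by hand
    memories = load_memories(MEMORY_FILE)
    if memories:
        print(f"Relevant memories:\n{memories}")


if __name__ == "__main__":
    main()
```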
Add to your .claude/settings.json:
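Assuming memory_hook.py sits in the project root, the registration might look like this (check the current hooks reference for the exact schema):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "python3 memory_hook.py" }
        ]
      }
    ]
  }
}
```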
Run any Claude Code session. Before each tool use, your memories get injected into context.
That's it. ~25 lines of Python. No dependencies beyond the standard library.
Want smarter injection? Here are paths to explore: split memories into multiple files and inject them selectively by tool or task type; use the transcript's latest thinking block as a query instead of injecting everything; rank memories by similarity (embeddings, or a simpler lexical score) and inject only the top matches.
The simple version works. Start there. Add complexity only when you need it.
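As one example of "smarter," here is a stdlib-only stand-in for embedding similarity: rank memories by token overlap with the latest thinking block and inject only the top few. It is crude next to real embeddings, but it illustrates the selection step.

```python
import re
from collections import Counter


def rank_memories(query: str, memories: list[str], top_k: int = 3) -> list[str]:
    """Rank memories by naive token overlap with the query text.

    A rough stand-in for embedding similarity: real semantic search
    would match "indemnification" to "liability"; this will not.
    """
    def tokens(text: str) -> Counter:
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    query_tokens = tokens(query)
    scored = sorted(
        memories,
        key=lambda memory: sum((tokens(memory) & query_tokens).values()),
        reverse=True,
    )
    return scored[:top_k]
```

Swapping this scoring function for a real embedding model is the only change needed to upgrade the hook later.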
Conceptual Understanding (All Participants): you can explain workflow drift, why thinking blocks make better semantic queries than the original prompt, and when each injection point is appropriate.
Technical Implementation (Optional): your hook fires before tool use and your memories appear in the AI's context.
Problem: Memories too generic
Solution: Make memories specific. Instead of "communicate clearly with clients," write "Acme Corp's GC requires dollar ranges in risk summaries—never use vague terms like 'significant exposure.'"
Problem: Too much context injected
Solution: Keep your memory file focused. If it's over 500 words, you're probably injecting noise. Split into multiple files and inject selectively.
Problem: Hook not firing
Solution: Check that your hook is registered in .claude/settings.json and the script path is correct. Test the script standalone first: echo '{}' | python3 memory_hook.py
What you're learning: Workflow drift is invisible until you examine it explicitly. This prompt makes the drift concrete by tracking context needs across steps in YOUR domain. You're learning to anticipate where prompt-time injection will fail.
What you're learning: Memory corpus design is intentional, not exhaustive. This prompt trains you to identify high-value memories for YOUR domain. You're learning the difference between memories that genuinely help versus documentation that happens to exist.
What you're learning: Not all context needs the same injection strategy. Some context is stable across the entire session. Some evolves with the workflow. This prompt builds intuition for matching context to injection timing in your professional domain.