How Leading AI Teams Are Engineering Context at Scale
From Theory to Practice
Introduction
While Drew Breunig's presentation established the philosophical and linguistic foundations of Context Engineering, Lance Martin's follow-up talk at LangChain HQ dove into the methods.
Where Drew asked "what is Context Engineering and why does it matter?", Lance answered "how are the world's leading AI teams actually doing it?"
Lance's presentation revealed the battle-tested techniques that companies like Anthropic, Manus, and Cognition are using to build production AI systems that can handle hundreds of tool calls without degrading into chaos. It's the difference between understanding why context matters and knowing exactly how to manage it when your agent is 50 tool calls deep and still needs to maintain coherent reasoning.
The Context Explosion: Why Agents Break Everything
Lance opened with a reality check that validated what Drew had theorized: agents fundamentally change the context game. While Drew introduced us to the concept of context collapse, Lance showed us the brutal math:
- Traditional chatbots: Linear growth in context from user messages
- Modern agents: Compounding growth as every tool observation is appended to the context on each interaction
- The Manus benchmark: 50 tool calls for a typical request
- The Anthropic finding: Production agents routinely hit hundreds of conversation turns
This isn't just about longer conversations—it's about a fundamentally different information architecture. As Lance put it, "agents not only receive prompts from the user, they also receive observation from tool calls." Every tool response adds to the context burden, creating what he perfectly described as context that "blows up considerably."
When Context Fails: Production Failure Modes
Building on Drew's theoretical framework, Lance presented specific failure modes that teams have documented in production:
Context Poisoning
The Gemini Pokemon example wasn't just amusing—it represented a systemic failure where hallucinated information contaminates future reasoning. Once an agent invents a fact, it treats that invention as truth in all subsequent operations.
Context Distraction
When Gemini hit 100k tokens, it didn't just slow down—it fundamentally changed behavior, favoring repetition over innovation. The agent got "stuck" in behavioral loops, unable to break free from established patterns.
Context Confusion
More tools don't always mean better performance. When agents have access to similar tools, they struggle with selection, leading to decreased performance even when theoretically more capable.
Context Clash
Perhaps most insidiously, when sequential tool calls return contradictory information, agents don't gracefully handle the conflict—they degrade unpredictably.
The Practitioner's Toolkit: Five Battle-Tested Techniques
Lance presented a framework of five practical techniques he has observed across leading teams:
1. Offloading: The Universal Solution
Why it's the favorite: Nearly every team uses offloading because it's simple, effective, and doesn't risk information loss.
Lance highlighted several patterns:
- Manus's todo.md approach: Continuously updated files that track agent state
- Research brief pattern: Generate planning documents early, store them externally, reload them when needed
- Long-term memory files: User preferences and historical data live outside the active context
The key insight: "If you've done brief generation and [stored it in] the message history and you have 100,000 tokens of research, the agent may or may not remember that original plan if it's buried at the top of your context window."
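To make the pattern concrete, here is a minimal Python sketch of offloading: large, durable artifacts (plans, briefs, notes) live on disk, and the context window carries only a short pointer. The directory, file names, and helper functions are illustrative assumptions, not Manus's or LangChain's actual implementation.

```python
from pathlib import Path

SCRATCH_DIR = Path("agent_scratch")  # illustrative location for offloaded state
SCRATCH_DIR.mkdir(exist_ok=True)

def offload(name: str, content: str) -> str:
    """Write large or durable content (plans, research briefs, notes) to disk
    and return a short pointer the agent keeps in context instead."""
    path = SCRATCH_DIR / f"{name}.md"
    path.write_text(content)
    return f"[offloaded: {name} -> {path}, {len(content)} chars]"

def reload_note(name: str) -> str:
    """Pull offloaded content back into context only when the agent needs it."""
    return (SCRATCH_DIR / f"{name}.md").read_text()

# The research brief lives on disk; the message history only carries the pointer.
pointer = offload("research_brief", "1. Compare techniques\n2. Gather sources\n3. Draft report")
system_msg = f"Your plan is stored at {pointer}. Re-read it before each major step."
```

The same shape works whether the external store is a todo.md file, a LangGraph state key, or a long-term memory file; the point is that the plan survives even if the message history is later compacted.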
2. Reducing: Handle with Extreme Care
The community is split on reduction techniques:
Proponents point to:
- Anthropic's auto-compaction at 95% capacity
- Tool call summarization at boundaries
- Agent-to-agent handoff compression
Skeptics warn:
- Manus explicitly discourages reduction due to information loss
- Cognition uses specialized fine-tuned models just for summarization
- The risk of losing critical details often outweighs benefits
Lance's take: "Be very careful about information loss when you're doing this."
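As a rough illustration of summarization at tool-call boundaries, the sketch below compacts an observation only when it exceeds a size threshold and keeps a marker noting that reduction happened. The `summarize_fn` hook is a placeholder for whatever model you trust with compaction (Cognition's point is that this may require a fine-tuned model); the threshold and wording are assumptions, not any team's published settings.

```python
def compact_tool_output(observation: str, summarize_fn, max_chars: int = 4_000) -> str:
    """Replace a long tool observation with a summary at the tool-call boundary.
    Reduction is lossy, so keep (or offload) the raw output in case it is needed later."""
    if len(observation) <= max_chars:
        return observation  # small enough: keep verbatim, no information loss
    summary = summarize_fn(
        "Summarize this tool output, preserving identifiers, numbers, and anything "
        f"a later step might depend on:\n\n{observation}"
    )
    return f"[summarized from {len(observation)} chars]\n{summary}"

# Stub summarizer for illustration; in practice this wraps a model call.
raw = "row 42 | status=FAILED | retry_at=2025-07-23T10:00Z\n" * 500
print(compact_tool_output(raw, summarize_fn=lambda prompt: "500 rows, all FAILED, retry at 10:00Z"))
```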
3. Retrieving: Beyond Basic RAG
The presentation revealed sophisticated retrieval systems in production:
- Windsurf: Mixed retrieval methods with re-ranking
- Cursor's Priompt: An entire library dedicated to assembling retrieved context into prompts
- Tool retrieval: Dynamically selecting relevant tools based on the current task
This isn't your grandmother's RAG—it's sophisticated, multi-layered retrieval orchestration.
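Tool retrieval is the easiest of these to sketch: expose only the few tools most relevant to the current step instead of the full catalog, which also counters the context-confusion failure mode above. The keyword-overlap scorer below is a stand-in for the embedding similarity or re-ranking a production system would use; all names here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ToolSpec:
    name: str
    description: str

def score(query: str, tool: ToolSpec) -> float:
    """Stand-in relevance score (keyword overlap). In production this would be
    embedding similarity or a re-ranker, as in the mixed-retrieval setups above."""
    q = set(query.lower().split())
    d = set(tool.description.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve_tools(query: str, tools: list[ToolSpec], k: int = 3) -> list[ToolSpec]:
    """Expose only the top-k most relevant tools to the model for this turn,
    reducing confusion from near-duplicate tools."""
    return sorted(tools, key=lambda t: score(query, t), reverse=True)[:k]

tools = [
    ToolSpec("search_docs", "search internal documentation for API usage"),
    ToolSpec("search_web", "search the public web for recent news"),
    ToolSpec("run_sql", "run a sql query against the analytics warehouse"),
]
print([t.name for t in retrieve_tools("find API usage docs", tools, k=1)])
```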
4. Isolating: The Multi-Agent Minefield
Here's where the rubber meets the road on multi-agent systems:
The promise: Parallel processing, specialized contexts, avoided contamination
The peril: Conflicting decisions, coordination nightmares, inconsistent outputs
Lance's critical insight from their open-deep-research system: "We only do context gathering in the sub-agents. We don't actually write sections of the report." This prevents the common failure mode where different agents write conflicting content.
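A minimal sketch of that division of labor, assuming nothing about Anthropic's or LangChain's internals: sub-agents fan out to gather notes in parallel, and a single writer does all the synthesis.

```python
from concurrent.futures import ThreadPoolExecutor

def gather(topic: str) -> str:
    """Sub-agent: read-only context gathering (searches, tool calls). It never
    writes report sections, so sub-agents cannot produce conflicting prose."""
    return f"notes on {topic}: ..."  # placeholder for a real research sub-agent

def synthesize(brief: str, notes: list[str]) -> str:
    """A single writer makes every synthesis decision, using all the notes."""
    return f"Report for brief '{brief}' based on {len(notes)} note bundles."

topics = ["offloading", "reduction", "retrieval"]
with ThreadPoolExecutor() as pool:
    notes = list(pool.map(gather, topics))   # parallel, isolated contexts
report = synthesize("compare context engineering techniques", notes)  # centralized writing
```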
5. Caching: The Cost Optimizer
A technique Manus champions that others haven't widely adopted yet (see the sketch after this list):
- Cache immutable content (system prompts, tool descriptions)
- 10x cost reduction for cached tokens on Claude
- Doesn't solve length problems but dramatically improves economics
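Here is a hedged sketch using the Anthropic Python SDK's prompt-caching interface (verify field names and current pricing against Anthropic's docs): the immutable prefix, system prompt plus tool descriptions, is marked cacheable, and the mutable history is only ever appended after it, since editing anything inside the cached prefix invalidates the cache.

```python
import anthropic  # assumes the Anthropic Python SDK with prompt caching enabled

SYSTEM_PROMPT = "You are a research agent... (plus the full, never-changing tool descriptions)"

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # model id is illustrative; use whichever Claude model you target
    max_tokens=1024,
    # Mark the immutable prefix cacheable so later agent turns reuse it at the
    # discounted cached-token rate instead of paying full input price each time.
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},
    }],
    # Mutable, append-only history goes after the cached prefix; rewriting earlier
    # turns would break the prefix match and force a cache miss.
    messages=[{"role": "user", "content": "Next step of the task..."}],
)
print(response.usage)  # cache creation/read token counts show up here when caching is active
```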
The Production Reality Check
Lance presented a comparative analysis that revealed fascinating disagreements in the field:
Technique | Consensus Level | Key Insight
---|---|---
Offloading | Universal adoption | The safest, most reliable technique
Reducing | Highly controversial | Manus says never, Cognition says maybe with fine-tuning
Retrieving | Standard practice | Everyone does it, but implementation varies wildly
Isolating | Philosophical divide | Cognition warns against it, Anthropic embraces it
Caching | Emerging practice | Only Manus discussed it publicly
Case Study: Open-Deep-Research in Action
Lance's team's research system provided a masterclass in combining techniques:
- Offloading: Research briefs stored in LangGraph state
- Reduction: Careful summarization of tool outputs
- Isolation: Sub-agents for parallel research, but centralized writing
The critical design decision: "Sub-agents lower risk if [they] avoid decisions." They can gather information in parallel, but only one agent makes synthesis decisions.
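The same shape can be expressed as a small LangGraph sketch, assuming the `langgraph` package; the node bodies are placeholders, not the open-deep-research implementation. The brief is offloaded into graph state, gathering happens in a dedicated node (fanned out to parallel sub-agents in the real system), and a single writer node makes every synthesis decision.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ResearchState(TypedDict):
    topic: str
    brief: str        # offloaded plan: lives in graph state, not the message history
    notes: list[str]  # gathered (and possibly summarized) context from sub-agents
    report: str

def plan(state: ResearchState) -> dict:
    return {"brief": f"Research brief for: {state['topic']}"}

def gather(state: ResearchState) -> dict:
    # In the real system this fans out to parallel sub-agents that only gather.
    return {"notes": [f"summarized findings about {state['topic']}"]}

def write(state: ResearchState) -> dict:
    # Centralized writing: one node makes all synthesis decisions.
    return {"report": state["brief"] + "\n\n" + "\n".join(state["notes"])}

builder = StateGraph(ResearchState)
builder.add_node("plan", plan)
builder.add_node("gather", gather)
builder.add_node("write", write)
builder.add_edge(START, "plan")
builder.add_edge("plan", "gather")
builder.add_edge("gather", "write")
builder.add_edge("write", END)
graph = builder.compile()

result = graph.invoke({"topic": "context engineering techniques", "brief": "", "notes": [], "report": ""})
print(result["report"])
```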
The Uncomfortable Truth About Multi-Agent Systems
One of the most valuable parts of Lance's presentation was his honest assessment of multi-agent architectures:
The dream: Specialized agents working in harmony
The reality: "Multi-agents can't coordinate very well and so they can make conflicting decisions"
His pragmatic solution: Use multi-agent systems for information gathering only. Let them explore in parallel, but centralize all decision-making and synthesis.
Beyond the Hype: What Actually Works
Lance concluded with a simple summary:
- Most popular: Offloading (because it just works)
- Most controversial: Reduction (information loss is real)
- Most sophisticated: Retrieval systems (but implementation quality varies)
- Most dangerous: Uncoordinated multi-agent systems
- Most underutilized: Caching (huge cost savings waiting to be captured)
The Road Ahead
Lance's parting thought reinforced Drew's vision while adding practical urgency: "Context engineering is just one small piece of an emerging thick layer of non-trivial software that coordinates LLM calls into full LLM apps."
The term "ChatGPT wrapper" isn't just wrong—it's dangerously misleading. What teams are building requires sophisticated orchestration, careful information architecture, and yes, context engineering.
Key Takeaways for Practitioners
- Start with offloading: It's the safest, most universally applicable technique
- Be extremely careful with reduction: Information loss is real and often catastrophic
- Invest in retrieval infrastructure: This is where competitive advantage lives
- Design multi-agent systems for gathering, not deciding: Centralize synthesis
- Don't ignore caching: The cost savings are too significant to pass up
Conclusion: From Philosophy to Production
Where Drew gave us the language and framework to think about Context Engineering, Lance showed us what it looks like in the trenches. The techniques presented here aren't theoretical—they're extracted from systems handling millions of tokens and hundreds of tool calls in production today.
The message is clear: Context Engineering isn't just a useful concept—it's an essential discipline for anyone building serious AI applications. The teams that master these techniques will build the agents that actually work. The ones that don't will keep wondering why their demos break in production.
As we move from the "ChatGPT wrapper" era to the age of sophisticated AI applications, Context Engineering will separate the demos from the deployments. Lance's presentation didn't just validate Drew's vision—it gave us the blueprints to build it.
Resources for Deeper Exploration
- Open-Deep-Research: Complete implementation of these patterns: github.com/langchain-ai/open-deep-research
- Context Engineering Examples: github.com/langchain-ai/how_to_fix_your_context
- The Manus Architecture: Detailed write-up on their 50-tool-call system
- Anthropic's Multi-Agent Patterns: Reference implementations for agent coordination
The future isn't just about better prompts—it's about better context. And now we know how to build it.
This article covers Lance Martin's practical presentation on Context Engineering, delivered at LangChain HQ in San Francisco on July 23rd. As one of the early engineers at LangChain and a core contributor to their Python open source library, Lance brings firsthand experience from building the tools and systems that thousands of developers use to implement context engineering in production.