The Context Window Changes the Game
Claude 4.5 shipped with a 1M token context window that actually works. Not "works in benchmarks" — works in production with dense legal contracts and financial documents. We tested it against our LEXIS pipeline within hours of release.
Here's what we found: For documents under 200 pages, you can skip retrieval entirely. Just stuff the whole thing into context and ask your question. Accuracy on our legal benchmark hit 94.8% — only 1.5 points below our full RAG pipeline with re-ranking.
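The "stuff the whole thing into context" step can be sketched as below. This is a minimal illustration, not the LEXIS pipeline: the 1M-token budget comes from the text, but the 4-characters-per-token estimate is a crude heuristic (use a real tokenizer in practice), and the prompt layout is an assumption.

```python
CONTEXT_BUDGET_TOKENS = 1_000_000  # from the text: 1M token window
CHARS_PER_TOKEN = 4                # rough heuristic, not a real tokenizer

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(document: str, question: str, reserve: int = 4_096) -> bool:
    """True if document + question fit, leaving `reserve` tokens for the answer."""
    needed = estimate_tokens(document) + estimate_tokens(question) + reserve
    return needed <= CONTEXT_BUDGET_TOKENS

def build_stuffed_prompt(document: str, question: str) -> str:
    # Whole document first, question last -- no chunking, no retrieval.
    return f"<document>\n{document}\n</document>\n\nQuestion: {question}"
```

If `fits_in_context` returns `False`, you fall back to retrieval; otherwise you send the stuffed prompt as a single call.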
What This Means for RAG Architectures
This doesn't kill RAG. But it kills naive RAG for a huge swath of use cases. If your corpus fits in context, you don't need vector databases, chunking strategies, or embedding models. You need a good prompt and a credit card.
Where RAG still wins: multi-document queries across thousands of files, real-time knowledge that changes daily, and cost optimization at scale. Stuffing 1M tokens per query is expensive — roughly $15 per call at current pricing. Our RAG pipeline handles the same query for $0.03.
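The cost trade-off above has a simple break-even structure. A back-of-the-envelope sketch, using the per-query figures from the text ($15 stuffed call, $0.03 RAG query); the fixed monthly RAG infrastructure cost is a hypothetical placeholder, so plug in your own number:

```python
import math

STUFFING_COST_PER_QUERY = 15.00  # ~$15 per 1M-token call (from the text)
RAG_COST_PER_QUERY = 0.03        # marginal cost per RAG query (from the text)

def monthly_cost_stuffing(queries: int) -> float:
    """Context stuffing has no fixed infrastructure, only per-call cost."""
    return queries * STUFFING_COST_PER_QUERY

def monthly_cost_rag(queries: int, infra_fixed: float = 2_000.0) -> float:
    # infra_fixed is an assumed placeholder for vector DB, embedding,
    # and re-ranker hosting -- not a figure from the text.
    return infra_fixed + queries * RAG_COST_PER_QUERY

def breakeven_queries(infra_fixed: float = 2_000.0) -> int:
    """Queries/month above which RAG is cheaper than context stuffing."""
    return math.ceil(infra_fixed / (STUFFING_COST_PER_QUERY - RAG_COST_PER_QUERY))
```

With a $2,000/month infrastructure assumption, RAG pays for itself after a few hundred queries a month; below that volume, stuffing wins on total cost despite the steep per-call price.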
The New Playbook
We're restructuring our recommendation framework. For clients with fewer than 500 documents, we're now suggesting a context-stuffing-first approach, with RAG as a cost optimization layer. For enterprise clients with tens of thousands of documents, RAG remains the only viable path.
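The framework reduces to a simple routing decision. A minimal sketch: the 500-document and tens-of-thousands thresholds come from the text, but the tier labels and the middle "hybrid" tier are illustrative assumptions.

```python
def recommend_architecture(num_documents: int) -> str:
    """Route a client corpus to an architecture by document count."""
    if num_documents < 500:
        # Small corpus: context-stuffing first, RAG as cost optimization.
        return "context-stuffing-first"
    if num_documents >= 10_000:
        # Enterprise scale: retrieval is the only viable path.
        return "rag-only"
    # Middle ground the text doesn't spell out: assumed hybrid tier.
    return "hybrid"
```

In practice you'd also factor in query volume (per the cost math above) and how often the corpus changes, but document count is the first cut.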
The teams that will win are the ones that treat context windows and retrieval as complementary tools, not competing ideologies. Claude 4.5 didn't kill RAG — it gave us a bigger toolkit.