The Signal — AI News & Takes

The False Binary

Twitter loves a hot take: "RAG is dead, fine-tuning is all you need" or "Fine-tuning is a waste, just use RAG." Both positions are wrong, and both reveal that the person hasn't shipped a production system recently.

Here's the actual decision framework we use with clients:

Use RAG when: Your knowledge base changes frequently (weekly or faster), you need attributable/cited responses, your data is too large to fit in training examples, or you need to control access to specific documents per user.

Use fine-tuning when: You need the model to learn a specific output format or style, you have domain-specific reasoning patterns that prompting can't capture, latency is critical and you want to avoid retrieval overhead, or your base model consistently misunderstands domain terminology.
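To make the checklist concrete, here is a minimal sketch that encodes the two lists above as boolean signals. It is an illustration only, not tooling described in this article: the `Requirements` class, its field names, the `recommend` function, and the "neither" fallback (no signal fires, so plain prompting may be enough) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist; field names are illustrative, not from any library."""
    # Signals that point toward RAG
    knowledge_changes_weekly_or_faster: bool = False
    needs_cited_responses: bool = False
    data_too_large_for_training: bool = False
    needs_per_user_document_access: bool = False
    # Signals that point toward fine-tuning
    needs_specific_format_or_style: bool = False
    reasoning_not_captured_by_prompting: bool = False
    latency_critical: bool = False
    misunderstands_domain_terms: bool = False


def recommend(req: Requirements) -> str:
    """Return 'rag', 'fine-tuning', 'both', or 'neither'."""
    wants_rag = any([
        req.knowledge_changes_weekly_or_faster,
        req.needs_cited_responses,
        req.data_too_large_for_training,
        req.needs_per_user_document_access,
    ])
    wants_fine_tuning = any([
        req.needs_specific_format_or_style,
        req.reasoning_not_captured_by_prompting,
        req.latency_critical,
        req.misunderstands_domain_terms,
    ])
    if wants_rag and wants_fine_tuning:
        return "both"
    if wants_rag:
        return "rag"
    if wants_fine_tuning:
        return "fine-tuning"
    # Assumption: with no signal either way, neither technique is clearly needed.
    return "neither"
```

A system that needs cited answers and also a strict output style would come back as "both", which is the common case in practice.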

The Real Answer: Use Both

Our highest-performing production systems combine both approaches. For LEXIS AI, we fine-tuned a model on 5,000 examples of legal risk analysis to teach it how lawyers reason about contract clauses. Then we use RAG to feed it the actual contract content at inference time.

The fine-tuned model doesn't know your specific contract — that's what RAG provides. But it knows how to analyze legal language at a level that vanilla Claude doesn't match. Fine-tuning teaches the how. RAG provides the what.
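The split between "how" and "what" can be sketched in a few lines. This is a toy illustration under stated assumptions, not the LEXIS AI implementation: the word-overlap retriever stands in for a real vector search, and `model` is a hypothetical callable standing in for whatever client calls the fine-tuned model.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> set:
    """Lowercase word set; a stand-in for real embeddings."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank contract chunks by word overlap with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context: List[str]) -> str:
    """RAG supplies the 'what': the retrieved contract text, numbered for citation."""
    numbered = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context))
    return f"Contract excerpts:\n{numbered}\n\nQuestion: {question}"


def analyze(question: str, chunks: List[str], model: Callable[[str], str]) -> str:
    """The fine-tuned model supplies the 'how'; `model` is a hypothetical callable."""
    return model(build_prompt(question, retrieve(question, chunks)))
```

Note that the prompt carries no few-shot examples of legal reasoning. In this design that behavior is expected to come from the fine-tuned weights, so the prompt only has to carry the retrieved text and the question.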

The Cost Reality

Fine-tuning isn't free. Our legal model cost ~$800 to train and needs retraining quarterly as we accumulate correction data. But it reduced our prompt token count by 40% (because we no longer need extensive few-shot examples) and improved accuracy by 6 points.

At 50K queries/month, that 40% prompt reduction saves ~$1,100/month in API costs, so the ~$800 training run pays for itself in under a month.
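The arithmetic behind that payback claim is easy to check. The sketch below uses only the figures quoted in this section; the one added assumption, flagged in the code, is that each quarterly retrain costs about the same as the first training run.

```python
TRAINING_COST_USD = 800.0      # ~$800 per training run (from the text)
MONTHLY_SAVINGS_USD = 1_100.0  # ~$1,100/month saved in API costs (from the text)
QUERIES_PER_MONTH = 50_000     # 50K queries/month (from the text)
RETRAINS_PER_YEAR = 4          # quarterly; ASSUMES each retrain also costs ~$800


def break_even_months(training_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to cover one training run."""
    return training_cost / monthly_savings


def savings_per_query(monthly_savings: float, queries_per_month: int) -> float:
    """Average API cost saved on each query."""
    return monthly_savings / queries_per_month


def net_annual_savings(
    monthly_savings: float, training_cost: float, retrains_per_year: int
) -> float:
    """Yearly savings after paying for every retraining run."""
    return 12 * monthly_savings - retrains_per_year * training_cost


if __name__ == "__main__":
    print(f"Break-even: {break_even_months(TRAINING_COST_USD, MONTHLY_SAVINGS_USD):.2f} months")
    print(f"Saved per query: ${savings_per_query(MONTHLY_SAVINGS_USD, QUERIES_PER_MONTH):.3f}")
    print(f"Net per year: ${net_annual_savings(MONTHLY_SAVINGS_USD, TRAINING_COST_USD, RETRAINS_PER_YEAR):,.0f}")
```

With these inputs the break-even lands at roughly 0.73 months, and the quarterly retraining bill stays well below the savings it buys. At much lower query volumes the same formula can flip the conclusion, which is why the volume figure matters as much as the training cost.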

Stop asking "RAG or fine-tuning?" Start asking "What does my production system actually need?"

Tags: RAG, Fine-Tuning, Architecture, Production, LLM

The RAG vs Fine-Tuning Debate Is Over. Here's Why.
