The Abstraction Tax
LangChain is great for prototyping. You can wire together a RAG pipeline in 30 lines of code and demo it to a client by Friday. The problem starts when that prototype becomes production code — and someone has to debug why your chain is silently swallowing errors at 3 AM.
We hit three breaking points:
- Version instability. LangChain's API surface changed so frequently that upgrading minor versions broke production pipelines. We spent more time fixing LangChain compatibility issues than building features.
- Opaque abstractions. When a retrieval chain fails, you get a stack trace 40 frames deep through LangChain internals. Good luck figuring out whether the issue is in your embeddings, your retriever, or LangChain's internal state management.
- Performance overhead. LangChain adds measurable latency through its chain-of-responsibility pattern. For our legal AI product, this added 200-400ms per query — unacceptable when lawyers are waiting.
What We Use Now
We built thin, typed wrappers around the APIs we actually use. Our stack:
- Direct SDK calls to Anthropic/OpenAI with typed request/response wrappers
- Custom retrieval layer using Pinecone's SDK directly with our own re-ranking logic
- Pydantic models for every input/output boundary
- Structured logging at every step — not buried inside a framework
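The shape of a stack like this can be sketched in a few lines. This is a hypothetical sketch, not the actual 800-line codebase: the names (`CompletionRequest`, `LLMClient`, `transport`) are invented, and it uses stdlib dataclasses in place of Pydantic so it runs standalone. The ideas are the ones in the list above: typed request/response boundaries, the raw SDK call injected as a plain callable, and a structured log line on every call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class CompletionRequest:
    """Typed input boundary: validated once, at construction."""
    prompt: str
    max_tokens: int = 1024

    def __post_init__(self) -> None:
        if not self.prompt:
            raise ValueError("prompt must be non-empty")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")

@dataclass(frozen=True)
class CompletionResponse:
    """Typed output boundary: the only shape callers ever see."""
    text: str
    input_tokens: int
    output_tokens: int

class LLMClient:
    """Thin wrapper around one provider SDK call."""

    def __init__(self, transport: Callable[[Dict[str, Any]], Dict[str, Any]]):
        # transport wraps the raw SDK call (e.g. an Anthropic/OpenAI
        # client method); injecting it keeps the wrapper testable
        # without a network connection.
        self._transport = transport

    def complete(self, req: CompletionRequest) -> CompletionResponse:
        raw = self._transport({"prompt": req.prompt, "max_tokens": req.max_tokens})
        resp = CompletionResponse(**raw)
        # Structured log at the boundary, not buried in a framework.
        print(f'{{"event": "llm_call", "in": {resp.input_tokens}, "out": {resp.output_tokens}}}')
        return resp
```

Because the transport is just a callable, a unit test stubs it with a lambda and a failure in production is at most two frames away from your own code.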
Total code for our "framework"? About 800 lines of Python. We can read every line. We can debug every failure. And our p95 latency dropped by 35%.
The Lesson
Frameworks are seductive because they promise you won't have to think about the boring parts. But in AI engineering, the boring parts — error handling, retry logic, token counting, rate limiting — are where production systems live or die. Own your stack.