Version 1: The Overconfident Prototype
Our first LEXIS prototype was embarrassingly naive. Take a contract, chunk it, embed it, retrieve relevant sections, ask Claude to identify risks. Demo looked amazing. Client loved it. Then we tested it against real associate work product.
Accuracy: 61%. The system missed indemnification caps buried in cross-referenced schedules, misidentified standard boilerplate as high-risk, and hallucinated obligations that didn't exist in the contract. A 61% accurate legal review tool isn't just useless — it's dangerous.
Version 2: The Over-Engineered Monster
We overcorrected. Built a 14-stage pipeline with separate models for clause classification, entity extraction, obligation mapping, risk scoring, cross-reference resolution, and output validation. Each stage was individually impressive.
The problem: End-to-end latency was 47 minutes per contract. A senior associate takes about 4 hours. We'd achieved a 5x speedup at the cost of $180 per contract in API calls. The client's associates cost $75/hour. We were more expensive than the humans.
Version 3: The Right Architecture
We stripped it back. Three stages: intelligent chunking (clause-level, not paragraph-level), targeted retrieval with a custom re-ranker trained on legal risk patterns, and a single Claude Opus call with a carefully engineered prompt that included the top-k retrieved clauses plus contract metadata.
Accuracy: 94.1%. Latency: 90 seconds per contract. Cost: $2.30 per contract. We were getting close.
Version 4: Production
Added confidence scoring, human-in-the-loop for low-confidence flags, evaluation pipelines running on every deploy, and feedback loops from associate corrections. Accuracy climbed to 96.3%.
The 160x speed claim: 90 seconds vs 4 hours = 160x. But the real win isn't speed — it's consistency. Our system doesn't get tired at 11 PM, doesn't miss things because it's reviewing its fifth contract of the day, and flags the same risk the same way every time.
The lesson: start simple, measure ruthlessly, and don't confuse pipeline complexity with pipeline quality.