Why Reranking Matters
Question
We already have hybrid search results. Why add a reranking step?
Explanation
The problem with search results
Hybrid search is fast but imprecise. It returns 20 candidates, and some aren't truly relevant. The search scored each chunk on its own: it embedded the chunk, embedded the query, and compared the two vectors. It never actually "read" the query and the chunk together.
What reranking does
A reranker is a cross-encoder: it reads the query AND each document together as a single input, then scores relevance from 0 to 1.
- Search (bi-encoder - encodes query and chunk separately): embed(query) vs embed(chunk) - fast, rough
- Reranking (cross-encoder - reads both together): model(query + chunk) - slow, accurate
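The split between the two architectures can be sketched with toy scorers. Both functions below are illustrative stand-ins, not real models: a real bi-encoder uses learned embeddings and a real cross-encoder is a transformer that reads the concatenated pair.

```python
# Toy illustration of the bi-encoder vs cross-encoder split.
# Real systems use learned models; these scorers are simple stand-ins.
import math
from collections import Counter

def embed(text):
    # Bi-encoder style: a representation built from ONE text alone
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bi_encoder_score(query, chunk):
    # Query and chunk are encoded separately, then compared
    return cosine(embed(query), embed(chunk))

def cross_encoder_score(query, chunk):
    # Cross-encoder style: one function that sees BOTH texts together.
    # Here: fraction of query words present in the chunk (a toy rule).
    q_words = query.lower().split()
    c_words = set(chunk.lower().split())
    return sum(w in c_words for w in q_words) / len(q_words)

query = "python error handling"
chunk = "handling an error in python uses try/except"
bi = bi_encoder_score(query, chunk)        # rough, separate encodings
cross = cross_encoder_score(query, chunk)  # joint read of both texts
```

The point is structural: `bi_encoder_score` can only compare two precomputed representations, while `cross_encoder_score` gets to condition on the query and chunk at the same time.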
Think of it like hiring:
- Search = reading 20 CVs in 5 seconds each (skim for keywords)
- Reranking = interviewing the 20 candidates for 1 minute each (actually understanding them)
Why not just rerank everything?
It's slow. Reranking all 804 chunks for every query would take far too long. So the pipeline narrows the set at each step:
804 chunks → hybrid search → 20 candidates → rerank → 5 best → LLM
Each step keeps fewer chunks but picks better ones.
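The funnel above can be sketched as two successive top-k cuts. Both scoring functions here are hypothetical stand-ins, not real search or reranker calls:

```python
# Funnel sketch: each stage keeps fewer candidates, chosen more carefully.
# Both score functions are toy stand-ins for the real scorers.
def top_k(candidates, score_fn, k):
    return sorted(candidates, key=score_fn, reverse=True)[:k]

def cheap_search_score(chunk):
    # stand-in for hybrid (BM25 + vector) scoring: fast, rough
    return sum(map(ord, chunk)) % 97

def reranker_score(chunk):
    # stand-in for a cross-encoder relevance score in [0, 1]: slow, accurate
    return (sum(map(ord, chunk)) % 1000) / 1000

chunks = [f"chunk-{i}" for i in range(804)]         # full corpus
candidates = top_k(chunks, cheap_search_score, 20)  # hybrid search -> 20
best = top_k(candidates, reranker_score, 5)         # rerank -> 5 for the LLM
```

The expensive scorer only ever sees the 20 survivors of the cheap one, which is what makes the slow-but-accurate step affordable.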
Market leaders
- Cohere rerank-english-v3.0 - top tier (what we use)
- Jina jina-reranker-v2 - very good, free tier
- Voyage rerank-2 - very good (Voyage AI; recommended by Anthropic)
- BGE-reranker-v2 (local) - good, runs on your machine
Example
```python
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)
reranked = compressor.compress_documents(documents, query)
# Each doc now has a relevance_score in its metadata
```
Our baseline test returned scores of 0.997, 0.994, 0.984, 0.978, 0.963 - very high confidence.
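Those scores can also be used downstream, for example to drop weak candidates before building the prompt. A minimal sketch over plain dicts (the real objects are LangChain `Document`s with a `metadata` dict; the 0.8 threshold is an assumption to tune per corpus):

```python
# Filter reranked docs by relevance_score before building the LLM context.
# Toy dicts stand in for LangChain Document objects.
reranked = [
    {"text": "chunk A", "metadata": {"relevance_score": 0.997}},
    {"text": "chunk B", "metadata": {"relevance_score": 0.41}},
]

THRESHOLD = 0.8  # assumption: tune per corpus and reranker model
kept = [d for d in reranked if d["metadata"]["relevance_score"] >= THRESHOLD]
context = "\n\n".join(d["text"] for d in kept)
```

With scores as high as the baseline run above (all ≥ 0.96), a threshold like this keeps everything; its value shows up on harder queries where the reranker is less confident.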