
Why Reranking Matters

What reranking does, how cross-encoders work, and why it dramatically improves RAG quality

Question

We already have hybrid search results. Why add a reranking step?

Explanation

The problem with search results

Hybrid search is fast but imprecise. It returns 20 candidates, and some aren't truly relevant. The search scored each chunk on its own - it embedded the chunk, embedded the query, and compared vectors. It never actually "read" the chunk and query together.

What reranking does

A reranker is a cross-encoder: it reads the query AND each document together as a single input, then scores relevance from 0 to 1.

  • Search (bi-encoder - encodes query and chunk separately): embed(query) vs embed(chunk) - fast, rough
  • Reranking (cross-encoder - reads both together): model(query + chunk) - slow, accurate
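The shape of the two APIs can be sketched with toy stand-ins. Everything below is hypothetical illustration code, not a real model: `embed` is a bag-of-words counter and `cross_encoder_score` is a hand-rolled heuristic. The structural point is what matters: the bi-encoder never sees query and chunk in the same call, while the cross-encoder scores them as one joint input.

```python
import math

def embed(text: str) -> dict[str, int]:
    # Toy "embedding": bag-of-words counts (stand-in for a real encoder model).
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def bi_encoder_score(query: str, chunk: str) -> float:
    # Bi-encoder style: encode each side separately, compare vectors afterwards.
    return cosine(embed(query), embed(chunk))

def cross_encoder_score(query: str, chunk: str) -> float:
    # Cross-encoder style: one scoring call over the joint (query, chunk) pair.
    # A real cross-encoder runs a transformer over the pair; this toy heuristic
    # just rewards word overlap plus seeing the exact query phrase in context.
    q_words = query.lower().split()
    text = chunk.lower()
    if not q_words:
        return 0.0
    overlap = sum(w in text for w in q_words) / len(q_words)
    phrase_bonus = 0.5 if query.lower() in text else 0.0
    return min(1.0, 0.5 * overlap + phrase_bonus)
```

Note the asymmetry: `bi_encoder_score` could cache `embed(chunk)` for all 804 chunks ahead of time, which is why search is fast; `cross_encoder_score` must run once per (query, chunk) pair at query time, which is why reranking is slow.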

Think of it like hiring:

  • Search = reading 20 CVs in 5 seconds each (skim for keywords)
  • Reranking = interviewing the 20 candidates for 1 minute each (actually understanding them)

Why not just rerank everything?

It's slow. Reranking all 804 chunks for every query would take too long. So the pipeline narrows at each stage:

804 chunks → hybrid search → 20 candidates → rerank → 5 best → LLM

Each step keeps fewer chunks but picks better ones.
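That funnel can be sketched end to end. The two scoring functions below are hypothetical word-overlap stand-ins (a real pipeline would call a vector store for stage 1 and a reranker API for stage 2); what the sketch shows is the narrowing: score everything cheaply, then spend the expensive scorer only on the survivors.

```python
def cheap_score(query: str, chunk: str) -> float:
    # Stand-in for hybrid search's fast, rough similarity.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def expensive_score(query: str, chunk: str) -> float:
    # Stand-in for a cross-encoder relevance score in [0, 1].
    base = cheap_score(query, chunk)
    return min(1.0, base + (0.5 if query.lower() in chunk.lower() else 0.0))

def retrieve(query: str, chunks: list[str], k: int = 20, top_n: int = 5) -> list[str]:
    # Stage 1: score every chunk cheaply, keep only the top k candidates.
    candidates = sorted(chunks, key=lambda c: cheap_score(query, c), reverse=True)[:k]
    # Stage 2: rerank just those k candidates with the expensive scorer.
    return sorted(candidates, key=lambda c: expensive_score(query, c), reverse=True)[:top_n]
```

With 804 chunks, k=20, and top_n=5, the expensive scorer runs 20 times instead of 804 — the funnel buys cross-encoder accuracy at roughly bi-encoder cost.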

Market leaders

  • Cohere rerank-english-v3.0 - top tier (what we use)
  • Jina jina-reranker-v2 - very good, free tier
  • Voyage rerank-2 - very good (Voyage AI was acquired by MongoDB)
  • BGE-reranker-v2 (local) - good, runs on your machine

Example

from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)
reranked = compressor.compress_documents(documents, query)
# Each doc now has a relevance_score in its metadata

Our baseline test returned scores of 0.997, 0.994, 0.984, 0.978, 0.963 - very high confidence.