
Why Reranking Matters

What reranking does, how cross-encoders work, and why it dramatically improves RAG quality

Question

We already have hybrid search results. Why add a reranking step?

Explanation

The problem with search results

Hybrid search is fast but imprecise. It returns 20 candidates, and some aren't truly relevant. The search scored each chunk on its own - it embedded the chunk, embedded the query, and compared vectors. It never actually "read" the chunk and query together.

What reranking does

A reranker is a cross-encoder: it reads the query AND each document together as a single input, then scores relevance from 0 to 1.

  • Search (bi-encoder - encodes query and chunk separately): embed(query) vs embed(chunk) - fast, rough
  • Reranking (cross-encoder - reads both together): model(query + chunk) - slow, accurate
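The shape of the two APIs can be sketched with toy stand-ins. Everything below is hypothetical illustration code, not a real model: `embed` is a bag-of-words counter and `cross_encoder_score` is a hand-rolled heuristic. The structural point is what matters: the bi-encoder never sees query and chunk in the same call, while the cross-encoder scores them as one joint input.

```python
import math

def embed(text: str) -> dict[str, int]:
    # Toy "embedding": bag-of-words counts (stand-in for a real encoder model).
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def bi_encoder_score(query: str, chunk: str) -> float:
    # Bi-encoder style: encode each side separately, compare vectors afterwards.
    return cosine(embed(query), embed(chunk))

def cross_encoder_score(query: str, chunk: str) -> float:
    # Cross-encoder style: one scoring call over the joint (query, chunk) pair.
    # A real cross-encoder runs a transformer over the pair; this toy heuristic
    # just rewards word overlap plus seeing the exact query phrase in context.
    q_words = query.lower().split()
    text = chunk.lower()
    if not q_words:
        return 0.0
    overlap = sum(w in text for w in q_words) / len(q_words)
    phrase_bonus = 0.5 if query.lower() in text else 0.0
    return min(1.0, 0.5 * overlap + phrase_bonus)
```

Note the asymmetry: `bi_encoder_score` could cache `embed(chunk)` for all 804 chunks ahead of time, which is why search is fast; `cross_encoder_score` must run once per (query, chunk) pair at query time, which is why reranking is slow.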

Think of it like hiring:

  • Search = reading 20 CVs in 5 seconds each (skim for keywords)
  • Reranking = interviewing the 20 candidates for 1 minute each (actually understanding them)

Why not just rerank everything?

It's slow. Reranking all 804 chunks for every query would take too long. So the pipeline narrows at each stage:

804 chunks → hybrid search → 20 candidates → rerank → 5 best → LLM

Each step keeps fewer chunks but picks better ones.
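That funnel can be sketched end to end. The two scoring functions below are hypothetical word-overlap stand-ins (a real pipeline would call a vector store for stage 1 and a reranker API for stage 2); what the sketch shows is the narrowing: score everything cheaply, then spend the expensive scorer only on the survivors.

```python
def cheap_score(query: str, chunk: str) -> float:
    # Stand-in for hybrid search's fast, rough similarity.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def expensive_score(query: str, chunk: str) -> float:
    # Stand-in for a cross-encoder relevance score in [0, 1].
    base = cheap_score(query, chunk)
    return min(1.0, base + (0.5 if query.lower() in chunk.lower() else 0.0))

def retrieve(query: str, chunks: list[str], k: int = 20, top_n: int = 5) -> list[str]:
    # Stage 1: score every chunk cheaply, keep only the top k candidates.
    candidates = sorted(chunks, key=lambda c: cheap_score(query, c), reverse=True)[:k]
    # Stage 2: rerank just those k candidates with the expensive scorer.
    return sorted(candidates, key=lambda c: expensive_score(query, c), reverse=True)[:top_n]
```

With 804 chunks, k=20, and top_n=5, the expensive scorer runs 20 times instead of 804 — the funnel buys cross-encoder accuracy at roughly bi-encoder cost.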

Market leaders

  • Cohere rerank-english-v3.0 - top tier (what we use)
  • Jina jina-reranker-v2 - very good, free tier
  • Voyage rerank-2 - very good (Voyage AI was acquired by MongoDB)
  • BGE-reranker-v2 (local) - good, runs on your machine

Example

from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)
reranked = compressor.compress_documents(documents, query)
# Each doc now has a relevance_score in its metadata

Our baseline test returned scores of 0.997, 0.994, 0.984, 0.978, 0.963 - very high confidence.