Cohere Rerank: when it changes everything and when it doesn't
Reranking improved our search from 'sort of related' to 'exactly what you asked.' Here is how the scores work and when to add it to your pipeline.
You search your vector database for "What is Shabbat?" and get 20 results. Some are about Shabbat prayers. Some are about Shabbat candles. Some are vaguely about rest and holiness but not really about Shabbat.
How do you pick the 5 best ones to send to the LLM?
Vector similarity gives you a rough ranking. But it scores each document independently - the model never "reads" the document and the query together. That is where reranking comes in.
What reranking does
A reranker (like Cohere Rerank) is a cross-encoder. It reads the query and each document together as a single input and scores how relevant the document is to the query, from 0 to 1.
Vector search (bi-encoder):
embed("What is Shabbat?") vs embed("Remember the Sabbath day...")
-> Compare two vectors independently. Fast but rough.
Reranking (cross-encoder):
model("What is Shabbat?" + "Remember the Sabbath day...")
-> Read both together. Slow but accurate.
Think of hiring:
- Vector search = reading 20 resumes in 5 seconds each (scan for keywords)
- Reranking = interviewing the 20 candidates for 1 minute each (actually understanding them)
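The architectural difference can be sketched with toy stand-ins (these scoring functions are illustrative placeholders of my own, not real embedding or reranking models):

```python
from math import sqrt

def embed(text: str) -> list[float]:
    # Fake "embedding": a character-frequency vector. A real bi-encoder
    # would be a neural model, but the key property is the same: each
    # text is mapped to a vector on its own, with no view of the query.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(ch) for ch in alphabet]
    norm = sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def bi_encoder_score(query: str, document: str) -> float:
    # Vectors are computed independently, then compared (cosine similarity).
    # In practice the document vectors are precomputed, which is why this
    # step is fast at query time.
    q, d = embed(query), embed(document)
    return sum(a * b for a, b in zip(q, d))

def cross_encoder_score(query: str, document: str) -> float:
    # The pair is scored together. Here: the fraction of query words the
    # document contains - a crude proxy for "reading both at once".
    q_words = set(query.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words) / len(q_words)
```

The trade-off falls out of the shapes: the bi-encoder lets you embed every document once and reuse the vectors for all future queries, while the cross-encoder must run fresh for every (query, document) pair - which is why reranking is reserved for the shortlist.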
The scores in practice
We tested Cohere Rerank on our Torah Study AI project (Sefaria texts indexed in Weaviate). Here are real scores:
Query: "What is the Shema prayer?"
| Rank | Score | Source |
|---|---|---|
| 1 | 0.833 | Siddur Ashkenaz, Blessings of the Shema |
| 2 | 0.641 | Siddur Ashkenaz, Shema 3 (Deuteronomy 6:4) |
| 3 | 0.490 | Siddur Ashkenaz, Six Remembrances |
| 4 | 0.202 | Siddur Ashkenaz, Post Service |
| 5 | 0.064 | Siddur Ashkenaz, Song of the Day |
The top result at 0.833 is exactly the right text. The last result at 0.064 is barely related.
Query: "Rashi on Deuteronomy 1:1" (text not in our database)
| Rank | Score | Source |
|---|---|---|
| 1 | 0.023 | Rashi on Deuteronomy 11:14 |
| 2 | 0.018 | Rashi on Deuteronomy 7:17 |
| 3 | 0.011 | Likutei Halakhot |
The top score is 0.023. Almost zero. The reranker is telling us: "I read these documents with the query, and none of them are about what you asked."
The relevance threshold
These scores give us a clean signal for decision-making:
RELEVANCE_THRESHOLD = 0.3

if top_score >= RELEVANCE_THRESHOLD:
    # Good retrieval: generate answer from sources
    generate_answer(sources)
else:
    # Bad retrieval: fallback (suggest alternatives, link to source)
    fallback_response()
- Score 0.833 (Shema prayer) -> well above 0.3 -> answer from sources
- Score 0.023 (Rashi not found) -> well below 0.3 -> helpful fallback
The threshold of 0.3 is a starting point. Tune it for your domain:
- Too low (0.1): you might answer from barely-relevant documents
- Too high (0.8): you trigger fallback too often, even on decent results
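As a runnable version of the check above (the function name is mine; `0.833` and `0.023` are the top scores from the two example queries):

```python
RELEVANCE_THRESHOLD = 0.3

def retrieval_decision(top_score: float, threshold: float = RELEVANCE_THRESHOLD) -> str:
    # Above the threshold: trust the sources and generate an answer.
    # Below it: fall back (suggest alternatives, link to the source).
    return "answer" if top_score >= threshold else "fallback"

print(retrieval_decision(0.833))  # Shema prayer query -> answer
print(retrieval_decision(0.023))  # Rashi text not in database -> fallback
```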
When reranking changes everything
Reranking is most valuable when:
- Your queries are in a different language than your documents. Vector similarity might rank an English document about Shabbat below a Hebrew document about something else. The reranker reads both and understands the cross-language relevance.
- You have many similar documents. If 15 of your 20 results are about Shabbat but only 3 are specifically about the prayer, the reranker finds those 3.
- You need a confidence score for decision-making. Without reranking, you have vector distances (0.82, 0.79, 0.77...) which are relative and hard to threshold. Rerank scores (0.833, 0.023) have clear semantic meaning.
When reranking is not worth it
- Very small corpus (< 1,000 documents). Vector search alone is good enough.
- Exact match queries ("Show me Genesis 1:1"). Keyword search (BM25) handles this better.
- Latency-critical applications. Reranking adds 200-500ms per query.
The pipeline
The standard RAG pipeline with reranking:
Question
   |
   v
Vector DB returns 20 candidates (fast, rough)
   |
   v
Cohere Rerank keeps top 5 (slow, accurate)
   |
   v
Score >= 0.3?
   |        |
  YES       NO
   |        |
LLM generates      Fallback
answer from        response
top 5 sources
Each step keeps fewer documents but picks better ones. 20 -> 5 -> answer.
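The flow above can be sketched as one function. The callables and their signatures are my own framing: in production, `retrieve` would wrap your Weaviate query and `rerank` would wrap Cohere's rerank endpoint; here they are injected so the control flow stands on its own.

```python
from typing import Callable

def rag_pipeline(
    question: str,
    retrieve: Callable[[str, int], list[str]],   # vector DB search
    rerank: Callable[[str, list[str]], list[tuple[int, float]]],  # (doc index, score), best first
    generate: Callable[[str, list[str]], str],   # LLM answer from sources
    fallback: Callable[[str], str],              # helpful "not found" response
    candidates: int = 20,
    keep: int = 5,
    threshold: float = 0.3,
) -> str:
    docs = retrieve(question, candidates)        # 20 candidates: fast, rough
    ranked = rerank(question, docs)[:keep]       # top 5: slow, accurate
    if ranked and ranked[0][1] >= threshold:     # score >= 0.3?
        top_docs = [docs[i] for i, _ in ranked]
        return generate(question, top_docs)
    return fallback(question)
```

Each stage narrows the set exactly as the diagram shows: 20 candidates in, 5 reranked out, one decision on the top score.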
Cost
Cohere Rerank has a free tier: 1,000 rerank calls per month. Each call can rerank up to 1,000 documents.
For a Torah study chatbot with maybe 50-100 queries per day, that is roughly 1,500-3,000 rerank calls per month - a bit past the free tier, but at the paid rate of $1 per 1,000 searches the overage comes to only a dollar or two a month.
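Worked out explicitly (one user query = one rerank call, and a 30-day month, are my assumptions):

```python
FREE_CALLS_PER_MONTH = 1_000  # Cohere free tier, from this section
PRICE_PER_1K = 1.0            # $1 per 1,000 searches at scale

def monthly_cost(queries_per_day: int, days: int = 30) -> float:
    # Calls beyond the free tier are billed at $1 per 1,000.
    calls = queries_per_day * days
    billable = max(0, calls - FREE_CALLS_PER_MONTH)
    return billable / 1_000 * PRICE_PER_1K

print(monthly_cost(50))   # 1,500 calls -> 0.5 (i.e. $0.50/month)
print(monthly_cost(100))  # 3,000 calls -> 2.0 (i.e. $2.00/month)
```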
The cost of reranking is almost always worth it. A bad answer from irrelevant sources wastes the user's time and erodes trust. A good answer from well-ranked sources builds it.