
Cohere Rerank: when it changes everything and when it doesn't

Reranking improved our search from 'sort of related' to 'exactly what you asked.' Here is how the scores work and when to add it to your pipeline.

You search your vector database for "What is Shabbat?" and get 20 results. Some are about Shabbat prayers. Some are about Shabbat candles. Some are vaguely about rest and holiness but not really about Shabbat.

How do you pick the 5 best ones to send to the LLM?

Vector similarity gives you a rough ranking. But it scored each document independently - it never "read" the document and the query together. That is where reranking comes in.


What reranking does

A reranker (like Cohere Rerank) is a cross-encoder. It reads the query and each document together as a single input and scores how relevant the document is to the query, from 0 to 1.

Vector search (bi-encoder):
  embed("What is Shabbat?") vs embed("Remember the Sabbath day...")
  -> Compare two vectors independently. Fast but rough.

Reranking (cross-encoder):
  model("What is Shabbat?" + "Remember the Sabbath day...")
  -> Read both together. Slow but accurate.

Think of hiring:

  • Vector search = reading 20 resumes in 5 seconds each (scan for keywords)
  • Reranking = interviewing the 20 candidates for 1 minute each (actually understanding them)
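To make the two call shapes concrete, here is a minimal sketch. The cosine function is the whole of bi-encoder scoring: two vectors produced independently, compared after the fact. The `embed` values are toy numbers and `cross_encoder_score` is a hypothetical stub - a real reranker model fills that role.

```python
from math import sqrt

def cosine(a, b):
    """Bi-encoder scoring: compare two independently produced vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Bi-encoder: the query and the document never meet; only their vectors do.
query_vec = [0.9, 0.1, 0.3]   # embed("What is Shabbat?")           (toy values)
doc_vec   = [0.8, 0.2, 0.4]   # embed("Remember the Sabbath day...") (toy values)
bi_score = cosine(query_vec, doc_vec)

# Cross-encoder: one model reads both texts in a single forward pass.
# Stub only - in practice this is served by a model such as Cohere Rerank.
def cross_encoder_score(query: str, document: str) -> float:
    raise NotImplementedError("served by a reranker model")
```

The stub is the point: there is no cheap local formula for the cross-encoder score, which is exactly why it is slower and more accurate.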

The scores in practice

We tested Cohere Rerank on our Torah Study AI project (Sefaria texts indexed in Weaviate). Here are real scores:

Query: "What is the Shema prayer?"

Rank  Score  Source
1     0.833  Siddur Ashkenaz, Blessings of the Shema
2     0.641  Siddur Ashkenaz, Shema 3 (Deuteronomy 6:4)
3     0.490  Siddur Ashkenaz, Six Remembrances
4     0.202  Siddur Ashkenaz, Post Service
5     0.064  Siddur Ashkenaz, Song of the Day

The top result at 0.833 is exactly the right text. The last result at 0.064 is barely related.

Query: "Rashi on Deuteronomy 1:1" (text not in our database)

Rank  Score  Source
1     0.023  Rashi on Deuteronomy 11:14
2     0.018  Rashi on Deuteronomy 7:17
3     0.011  Likutei Halakhot

The top score is 0.023. Almost zero. The reranker is telling us: "I read these documents with the query, and none of them are about what you asked."
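The call that produces these scores is shaped roughly like this. `client.rerank` mirrors the Cohere Python SDK (the model name is an assumption - check the current docs), and `FakeRerankClient` is a stand-in loaded with the article's real scores so the snippet runs without an API key.

```python
from dataclasses import dataclass

@dataclass
class RerankResult:
    index: int              # position of the document in the input list
    relevance_score: float  # 0..1, higher = more relevant to the query

def top_sources(client, query, documents, top_n=5):
    """Rerank `documents` against `query`; return (score, text) pairs, best first."""
    response = client.rerank(
        model="rerank-english-v3.0",  # assumed model name
        query=query, documents=documents, top_n=top_n,
    )
    return [(r.relevance_score, documents[r.index]) for r in response.results]

# Offline stand-in, preloaded with the scores measured above.
class FakeRerankClient:
    def rerank(self, model, query, documents, top_n):
        scores = [0.833, 0.641, 0.490, 0.202, 0.064]
        results = sorted(
            (RerankResult(i, s) for i, s in enumerate(scores[: len(documents)])),
            key=lambda r: r.relevance_score, reverse=True,
        )[:top_n]
        return type("Response", (), {"results": results})()

docs = ["Blessings of the Shema", "Shema 3", "Six Remembrances",
        "Post Service", "Song of the Day"]
ranked = top_sources(FakeRerankClient(), "What is the Shema prayer?", docs)
```

With a real key, `FakeRerankClient()` would be replaced by the Cohere client; the rest of the code stays the same.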


The relevance threshold

These scores give us a clean signal for decision-making:

RELEVANCE_THRESHOLD = 0.3

if top_score >= RELEVANCE_THRESHOLD:
    # Good retrieval: generate answer from sources
    generate_answer(sources)
else:
    # Bad retrieval: fallback (suggest alternatives, link to source)
    fallback_response()

  • Score 0.833 (Shema prayer) -> well above 0.3 -> answer from sources
  • Score 0.023 (Rashi not found) -> well below 0.3 -> helpful fallback

The threshold of 0.3 is a starting point. Tune it for your domain:

  • Too low (0.1): you might answer from barely-relevant documents
  • Too high (0.8): you trigger fallback too often, even on decent results
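The decision rule fits in one function. This runnable version applies it to the two top scores measured above, then shows what the extreme thresholds do:

```python
RELEVANCE_THRESHOLD = 0.3  # starting point - tune per domain

def should_answer(top_score: float, threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """True -> generate from sources; False -> trigger the fallback."""
    return top_score >= threshold

assert should_answer(0.833)        # "What is the Shema prayer?" -> answer
assert not should_answer(0.023)    # "Rashi on Deuteronomy 1:1"  -> fallback

# The failure modes of a badly tuned threshold:
assert should_answer(0.2, threshold=0.1)       # too low: answers from weak sources
assert not should_answer(0.5, threshold=0.8)   # too high: falls back on decent ones
```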

When reranking changes everything

Reranking is most valuable when:

  1. Your queries are in a different language than your documents. Vector similarity might rank an English document about Shabbat below a Hebrew document about something else. The reranker reads both and understands the cross-language relevance.

  2. You have many similar documents. If 15 of your 20 results are about Shabbat but only 3 are specifically about the prayer, the reranker finds those 3.

  3. You need a confidence score for decision-making. Without reranking, you have vector distances (0.82, 0.79, 0.77...) which are relative and hard to threshold. Rerank scores (0.833, 0.023) have clear semantic meaning.


When reranking is not worth it

  1. Very small corpus (< 1,000 documents). Vector search alone is good enough.
  2. Exact match queries ("Show me Genesis 1:1"). Keyword search (BM25) handles this better.
  3. Latency-critical applications. Reranking adds 200-500ms per query.

The pipeline

The standard RAG pipeline with reranking:

Question
    |
    v
Vector DB returns 20 candidates     (fast, rough)
    |
    v
Cohere Rerank keeps top 5           (slow, accurate)
    |
    v
Score >= 0.3?
    |           |
   YES          NO
    |           |
LLM generates   Fallback
answer from      response
top 5 sources

Each step keeps fewer documents but picks better ones. 20 -> 5 -> answer.
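The diagram above can be sketched end to end. `fake_search` and `fake_rerank` are stubs standing in for Weaviate and Cohere Rerank; only the funnel logic (20 -> 5 -> threshold -> answer or fallback) is the point.

```python
RELEVANCE_THRESHOLD = 0.3

def answer(question, vector_search, rerank, generate, fallback,
           candidates=20, keep=5):
    """RAG pipeline: 20 rough candidates -> top 5 reranked -> answer or fallback."""
    docs = vector_search(question, limit=candidates)   # fast, rough
    scored = rerank(question, docs)[:keep]             # slow, accurate; best first
    if scored and scored[0][0] >= RELEVANCE_THRESHOLD:
        return generate(question, [doc for _, doc in scored])
    return fallback(question)

# Stubs so the sketch runs; real versions would call Weaviate and Cohere.
def fake_search(q, limit):
    return [f"doc-{i}" for i in range(limit)]

def fake_rerank(q, docs):  # pretend only doc-0 is truly relevant
    return sorted(((0.833 if d == "doc-0" else 0.05), d) for d in docs)[::-1]

result = answer("What is Shabbat?", fake_search, fake_rerank,
                generate=lambda q, srcs: f"answer from {len(srcs)} sources",
                fallback=lambda q: "fallback")
```

Swapping `fake_rerank` for one that returns uniformly low scores sends the same pipeline down the fallback branch, with no other changes.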


Cost

Cohere Rerank has a free tier: 1,000 rerank calls per month. Each call can rerank up to 1,000 documents.

For a Torah study chatbot at 50-100 queries per day, that is roughly 1,500-3,000 rerank calls per month - past the free tier, but at $1 per 1,000 searches the bill stays around $2-3 per month.

The cost of reranking is almost always worth it. A bad answer from irrelevant sources wastes the user's time and erodes trust. A good answer from well-ranked sources builds it.