
Cohere Rerank: when it changes everything and when it doesn't

Reranking improved our search from 'sort of related' to 'exactly what you asked.' Here is how the scores work and when to add it to your pipeline.

You search your vector database for "What is Shabbat?" and get 20 results. Some are about Shabbat prayers. Some are about Shabbat candles. Some are vaguely about rest and holiness but not really about Shabbat.

How do you pick the 5 best ones to send to the LLM?

Vector similarity gives you a rough ranking. But it scored each document independently - it never "read" the document and the query together. That is where reranking comes in.


What reranking does

A reranker (like Cohere Rerank) is a cross-encoder. It reads the query and each document together as a single input and scores how relevant the document is to the query, from 0 to 1.

Vector search (bi-encoder):
  embed("What is Shabbat?") vs embed("Remember the Sabbath day...")
  -> Compare two vectors independently. Fast but rough.

Reranking (cross-encoder):
  model("What is Shabbat?" + "Remember the Sabbath day...")
  -> Read both together. Slow but accurate.

Think of hiring:

  • Vector search = reading 20 resumes in 5 seconds each (scan for keywords)
  • Reranking = interviewing the 20 candidates for 1 minute each (actually understanding them)
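To make the two call shapes concrete, here is a minimal sketch. The cosine function is the whole of bi-encoder scoring: two vectors produced independently, compared after the fact. The `embed` values are toy numbers and `cross_encoder_score` is a hypothetical stub - a real reranker model fills that role.

```python
from math import sqrt

def cosine(a, b):
    """Bi-encoder scoring: compare two independently produced vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Bi-encoder: the query and the document never meet; only their vectors do.
query_vec = [0.9, 0.1, 0.3]   # embed("What is Shabbat?")           (toy values)
doc_vec   = [0.8, 0.2, 0.4]   # embed("Remember the Sabbath day...") (toy values)
bi_score = cosine(query_vec, doc_vec)

# Cross-encoder: one model reads both texts in a single forward pass.
# Stub only - in practice this is served by a model such as Cohere Rerank.
def cross_encoder_score(query: str, document: str) -> float:
    raise NotImplementedError("served by a reranker model")
```

The stub is the point: there is no cheap local formula for the cross-encoder score, which is exactly why it is slower and more accurate.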

The scores in practice

We tested Cohere Rerank on our Torah Study AI project (Sefaria texts indexed in Weaviate). Here are real scores:

Query: "What is the Shema prayer?"

Rank  Score  Source
1     0.833  Siddur Ashkenaz, Blessings of the Shema
2     0.641  Siddur Ashkenaz, Shema 3 (Deuteronomy 6:4)
3     0.490  Siddur Ashkenaz, Six Remembrances
4     0.202  Siddur Ashkenaz, Post Service
5     0.064  Siddur Ashkenaz, Song of the Day

The top result at 0.833 is exactly the right text. The last result at 0.064 is barely related.

Query: "Rashi on Deuteronomy 1:1" (text not in our database)

Rank  Score  Source
1     0.023  Rashi on Deuteronomy 11:14
2     0.018  Rashi on Deuteronomy 7:17
3     0.011  Likutei Halakhot

The top score is 0.023. Almost zero. The reranker is telling us: "I read these documents with the query, and none of them are about what you asked."
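The call that produces these scores is shaped roughly like this. `client.rerank` mirrors the Cohere Python SDK (the model name is an assumption - check the current docs), and `FakeRerankClient` is a stand-in loaded with the article's real scores so the snippet runs without an API key.

```python
from dataclasses import dataclass

@dataclass
class RerankResult:
    index: int              # position of the document in the input list
    relevance_score: float  # 0..1, higher = more relevant to the query

def top_sources(client, query, documents, top_n=5):
    """Rerank `documents` against `query`; return (score, text) pairs, best first."""
    response = client.rerank(
        model="rerank-english-v3.0",  # assumed model name
        query=query, documents=documents, top_n=top_n,
    )
    return [(r.relevance_score, documents[r.index]) for r in response.results]

# Offline stand-in, preloaded with the scores measured above.
class FakeRerankClient:
    def rerank(self, model, query, documents, top_n):
        scores = [0.833, 0.641, 0.490, 0.202, 0.064]
        results = sorted(
            (RerankResult(i, s) for i, s in enumerate(scores[: len(documents)])),
            key=lambda r: r.relevance_score, reverse=True,
        )[:top_n]
        return type("Response", (), {"results": results})()

docs = ["Blessings of the Shema", "Shema 3", "Six Remembrances",
        "Post Service", "Song of the Day"]
ranked = top_sources(FakeRerankClient(), "What is the Shema prayer?", docs)
```

With a real key, `FakeRerankClient()` would be replaced by the Cohere client; the rest of the code stays the same.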


The relevance threshold

These scores give us a clean signal for decision-making:

RELEVANCE_THRESHOLD = 0.3

if top_score >= RELEVANCE_THRESHOLD:
    # Good retrieval: generate answer from sources
    generate_answer(sources)
else:
    # Bad retrieval: fallback (suggest alternatives, link to source)
    fallback_response()

  • Score 0.833 (Shema prayer) -> well above 0.3 -> answer from sources
  • Score 0.023 (Rashi not found) -> well below 0.3 -> helpful fallback

The threshold of 0.3 is a starting point. Tune it for your domain:

  • Too low (0.1): you might answer from barely-relevant documents
  • Too high (0.8): you trigger fallback too often, even on decent results
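The decision rule fits in one function. This runnable version applies it to the two top scores measured above, then shows what the extreme thresholds do:

```python
RELEVANCE_THRESHOLD = 0.3  # starting point - tune per domain

def should_answer(top_score: float, threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """True -> generate from sources; False -> trigger the fallback."""
    return top_score >= threshold

assert should_answer(0.833)        # "What is the Shema prayer?" -> answer
assert not should_answer(0.023)    # "Rashi on Deuteronomy 1:1"  -> fallback

# The failure modes of a badly tuned threshold:
assert should_answer(0.2, threshold=0.1)       # too low: answers from weak sources
assert not should_answer(0.5, threshold=0.8)   # too high: falls back on decent ones
```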

When reranking changes everything

Reranking is most valuable when:

  1. Your queries are in a different language than your documents. Vector similarity might rank an English document about Shabbat below a Hebrew document about something else. The reranker reads both and understands the cross-language relevance.

  2. You have many similar documents. If 15 of your 20 results are about Shabbat but only 3 are specifically about the prayer, the reranker finds those 3.

  3. You need a confidence score for decision-making. Without reranking, you have vector distances (0.82, 0.79, 0.77...) which are relative and hard to threshold. Rerank scores (0.833, 0.023) have clear semantic meaning.


When reranking is not worth it

  1. Very small corpus (< 1,000 documents). Vector search alone is good enough.
  2. Exact match queries ("Show me Genesis 1:1"). Keyword search (BM25) handles this better.
  3. Latency-critical applications. Reranking adds 200-500ms per query.

The pipeline

The standard RAG pipeline with reranking:

Question
    |
    v
Vector DB returns 20 candidates     (fast, rough)
    |
    v
Cohere Rerank keeps top 5           (slow, accurate)
    |
    v
Score >= 0.3?
    |           |
   YES          NO
    |           |
LLM generates   Fallback
answer from      response
top 5 sources

Each step keeps fewer documents but picks better ones. 20 -> 5 -> answer.
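The diagram above can be sketched end to end. `fake_search` and `fake_rerank` are stubs standing in for Weaviate and Cohere Rerank; only the funnel logic (20 -> 5 -> threshold -> answer or fallback) is the point.

```python
RELEVANCE_THRESHOLD = 0.3

def answer(question, vector_search, rerank, generate, fallback,
           candidates=20, keep=5):
    """RAG pipeline: 20 rough candidates -> top 5 reranked -> answer or fallback."""
    docs = vector_search(question, limit=candidates)   # fast, rough
    scored = rerank(question, docs)[:keep]             # slow, accurate; best first
    if scored and scored[0][0] >= RELEVANCE_THRESHOLD:
        return generate(question, [doc for _, doc in scored])
    return fallback(question)

# Stubs so the sketch runs; real versions would call Weaviate and Cohere.
def fake_search(q, limit):
    return [f"doc-{i}" for i in range(limit)]

def fake_rerank(q, docs):  # pretend only doc-0 is truly relevant
    return sorted(((0.833 if d == "doc-0" else 0.05), d) for d in docs)[::-1]

result = answer("What is Shabbat?", fake_search, fake_rerank,
                generate=lambda q, srcs: f"answer from {len(srcs)} sources",
                fallback=lambda q: "fallback")
```

Swapping `fake_rerank` for one that returns uniformly low scores sends the same pipeline down the fallback branch, with no other changes.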


Cost

Cohere Rerank has a free tier: 1,000 rerank calls per month. Each call can rerank up to 1,000 documents.

For a Torah study chatbot at 50-100 queries per day, that is roughly 1,500-3,000 rerank calls per month - past the free tier, but at $1 per 1,000 searches the bill stays around $2-3 per month.

The cost of reranking is almost always worth it. A bad answer from irrelevant sources wastes the user's time and erodes trust. A good answer from well-ranked sources builds it.