BM25 vs Vector Search

Question

What's the difference between BM25 keyword search and vector (semantic) search?

Explanation

Two completely different ways to find relevant text.

BM25 (keyword search)

Works like a classic search engine: it matches the words you typed (after tokenization) against the words in each chunk.

For the query "RAG pipeline", BM25 scores each chunk by:

Does "RAG" appear? +points
Does "pipeline" appear? +points
Is "pipeline" a rare word across all chunks? Rare words are worth more points (this is called IDF - the rarer the word, the higher the score)
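How many times does it appear? More = more points, but with a ceiling (term frequency saturates; the parameter k1 controls how quickly)
Is the chunk short or long? Same word in a short chunk = more significant (length normalization; the parameter b controls how strongly)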
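
The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes whitespace tokenization, the commonly cited defaults k1=1.2 and b=0.75, and the non-negative IDF variant used by Lucene. The function name bm25_scores and the sample chunks are made up for this example.

```python
import math
from collections import Counter

def bm25_scores(query, chunks, k1=1.2, b=0.75):
    """Score each chunk against the query with a common form of BM25."""
    docs = [c.lower().split() for c in chunks]  # naive whitespace tokenizer (assumption)
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many chunks does each term appear?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # the term is absent, so it adds no points
            # IDF: the rarer the term across chunks, the larger the weight
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization (b) and term-frequency saturation (k1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

chunks = [
    "a rag pipeline retrieves chunks before generation",
    "the pipeline runs nightly and the pipeline logs every step of the pipeline run",
    "vector databases store embeddings",
]
scores = bm25_scores("RAG pipeline", chunks)
for chunk, score in zip(chunks, scores):
    print(f"{score:.3f}  {chunk}")
```

The first chunk ranks highest because it contains both query words, including the rarer "rag". The second chunk repeats "pipeline" three times, but saturation and its greater length keep it below the first. The third chunk contains neither word and scores zero.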

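BM stands for Best Matching. 25 is the number of the variant in a series of weighting formulas developed for the Okapi retrieval system. It dates from 1994 and is still the default ranking function in Lucene-based engines such as Elasticsearch.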

Vector search (semantic)

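Uses an embedding model to convert both the query and the chunks into vectors (lists of numbers), then finds the chunks whose vectors are closest to the query vector, typically measured by cosine similarity. Because texts with similar meaning end up close together, it matches on meaning, not just words.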
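
A minimal sketch of the "closest vector" idea, using cosine similarity. The three-dimensional vectors below are hand-made stand-ins for real embeddings (an actual embedding model would produce hundreds or thousands of dimensions), and the chunk texts and values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; in practice these come from an embedding model.
chunk_vectors = {
    "retrieval-augmented generation overview": [0.9, 0.1, 0.2],
    "fetching documents before prompting an LLM": [0.8, 0.2, 0.3],
    "cooking pasta at home": [0.1, 0.9, 0.1],
}
query_vector = [0.85, 0.15, 0.25]  # pretend embedding of "RAG pipeline"

# Rank chunks from most to least similar to the query.
ranked = sorted(
    chunk_vectors,
    key=lambda chunk: cosine_similarity(query_vector, chunk_vectors[chunk]),
    reverse=True,
)
for chunk in ranked:
    print(f"{cosine_similarity(query_vector, chunk_vectors[chunk]):.3f}  {chunk}")
```

Neither of the two top-ranked chunks contains the words "RAG" or "pipeline"; they rank highly only because their vectors point in nearly the same direction as the query vector. A real system would use an approximate nearest-neighbor index instead of comparing against every chunk.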

Key differences

BM25 looks for exact words. "RAG pipeline" finds chunks containing those words. Strength: precise for known terms, acronyms, names. Weakness: misses synonyms.
Vector looks for meaning. "RAG pipeline" finds chunks about retrieval-augmented generation even without those words. Strength: understands synonyms and different ways of saying the same thing. Weakness: can return vaguely related results.
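BM25 is very fast and needs no embeddings: it only needs a keyword index.
Vector search costs more: it requires an embedding model, every chunk must be embedded ahead of time, and each query must be embedded before searching.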

Why not just use one?

Neither is perfect alone. BM25 catches exact matches that vectors miss. Vectors catch semantic matches that BM25 misses. Together (hybrid search) they cover each other's blind spots.

Example

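In Weaviate's hybrid search, alpha sets the balance: alpha=0 is pure BM25 and alpha=1 is pure vector search. With alpha=0.5 (50/50 balance), Weaviate runs both in parallel, then merges the results into a single ranked list.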
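
The merge step can be sketched as a weighted sum of normalized scores. This is an illustrative assumption about how such a fusion can work, not Weaviate's actual code: the function names, chunk IDs, and scores are hypothetical, and alpha follows the convention that 0 means keyword only and 1 means vector only.

```python
def min_max(scores):
    """Rescale scores to the 0..1 range so the two result lists are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}

def hybrid_merge(bm25, vector, alpha=0.5):
    """Fuse two non-empty score dicts: alpha=0 is pure BM25, alpha=1 is pure vector."""
    bm25_n, vector_n = min_max(bm25), min_max(vector)
    ids = set(bm25) | set(vector)
    fused = {
        # A chunk missing from one result list contributes 0 from that side.
        i: (1 - alpha) * bm25_n.get(i, 0.0) + alpha * vector_n.get(i, 0.0)
        for i in ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical raw scores from each search.
bm25 = {"chunk_a": 4.2, "chunk_b": 1.1, "chunk_c": 0.3}
vector = {"chunk_b": 0.91, "chunk_d": 0.88, "chunk_a": 0.52}

merged = hybrid_merge(bm25, vector, alpha=0.5)
for chunk_id, score in merged:
    print(f"{score:.3f}  {chunk_id}")
```

Here chunk_b comes out on top because it does reasonably well in both lists, while chunk_a (best keyword match) and chunk_d (found only by vector search) both survive into the final ranking. Normalizing first matters because raw BM25 scores and similarity scores live on different scales.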