Vector Stores
TL;DR
A specialized database that stores embeddings (vectors of numbers) and queries them by similarity. The central building block of a RAG pipeline.
The historical problem
Classic SQL databases cannot run similarity search (cosine, dot product) efficiently at scale. You had to either:
- Load all the vectors into memory and search in Python (does not scale)
- Use a specialized library like FAISS (Facebook, 2017) or Annoy (Spotify)
But FAISS and Annoy are libraries, not databases: they store only the vectors, not the documents or their metadata. For production RAG, you have to maintain the id-to-document mapping and persistence yourself.
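To see why that matters, here is a minimal sketch of the pattern a vector library forces on you. NumPy brute-force search stands in for the FAISS index (same contract: vectors in, integer ids out), and the side table for documents is something you must build and persist yourself; all names here are illustrative:

```python
import numpy as np

# Toy stand-in for a FAISS index: it knows only vectors and row numbers.
vectors = np.random.rand(100, 8).astype("float32")
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# External mapping you must maintain: id -> (document, metadata).
# Without it, a search result is just a meaningless integer.
docs = {i: {"text": f"doc {i}", "source": "wiki"} for i in range(100)}

def search(query: np.ndarray, k: int = 3) -> list[dict]:
    """Brute-force cosine search over unit vectors (what an exact
    inner-product index does), then resolve ids through the side table."""
    scores = vectors @ (query / np.linalg.norm(query))
    top = np.argsort(-scores)[:k]
    return [docs[int(i)] for i in top]

results = search(np.random.rand(8).astype("float32"))
```

A real vector store bundles the index, the side table, and its persistence into one system, which is exactly what the library alone does not give you.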
How it works
A full vector store offers:
- Storage of embeddings + documents + metadata
- Indexing: ANN (Approximate Nearest Neighbors) structures like HNSW, IVF, PQ
- Query: similarity search (cosine, dot product, L2)
- Filtering: restrict search by metadata (type, date, user_id...)
- Scale: distribution, replication, backup
- Hybrid search: combine vector + BM25 / full-text
Typical RAG pipeline:
- Ingestion: doc -> chunk -> [[embeddings|embedding]] -> store
- Query: query -> embedding -> top-k search -> retrieve docs -> context for LLM
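The two pipeline stages can be sketched with a toy in-memory store. Everything here is illustrative: `embed` is a fake stand-in for a real embedding model (deterministic within one process), and chunking is omitted for brevity:

```python
import numpy as np

def embed(text: str, dim: int = 16) -> np.ndarray:
    """Fake embedding: hash-seeded random unit vector (a real pipeline
    would call an embedding model here)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class MiniStore:
    def __init__(self):
        self.vecs, self.rows = [], []

    def ingest(self, doc: str, **metadata):
        # doc -> chunk -> embedding -> store (chunking skipped)
        self.vecs.append(embed(doc))
        self.rows.append({"text": doc, **metadata})

    def query(self, text: str, k: int = 2, **filters):
        # query -> embedding -> metadata filter -> top-k by cosine
        qv = embed(text)
        scored = [
            (float(v @ qv), row)
            for v, row in zip(self.vecs, self.rows)
            if all(row.get(key) == val for key, val in filters.items())
        ]
        scored.sort(key=lambda s: -s[0])  # unit vectors: dot = cosine
        return [row for _, row in scored[:k]]

store = MiniStore()
store.ingest("postgres tuning guide", kind="db")
store.ingest("hnsw index internals", kind="paper")
top = store.query("postgres tuning guide", k=1, kind="db")
```

Note that metadata filtering happens alongside the similarity search; in real stores combining the two efficiently is one of the hard parts (see the critical questions below).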
Relevance today (2026)
The vector store landscape is mature but fragmented:
The 2026 leaders:
- Qdrant: excellent perf/simplicity ratio, Rust, open-source
- Weaviate: native hybrid search, GraphQL, cloud-ready
- Milvus: massive scale, Zilliz ecosystem
- pgvector (Postgres extension): a must if you already have Postgres; performance improved significantly in 2024-2025
- Chroma: simple, local-first, good for prototyping
- Pinecone: managed SaaS, paid but zero ops
The losers / niche options:
- Raw FAISS: still useful as a low-level building block, but not a production store on its own
- Elasticsearch + vectors: workable, but not as optimized as Qdrant/Milvus
Critical question today:
- You already have Postgres? -> pgvector. Simpler ops; performance has caught up with specialized stores for most workloads.
- You want local-first / a prototype? -> Chroma or Qdrant.
- You scale to 100M+ vectors? -> Milvus or a Qdrant cluster.
- You want managed, zero-ops? -> Pinecone (expensive) or Qdrant Cloud.
Trends to watch:
- BM42 (Qdrant): new generation hybrid search
- Matryoshka embeddings: embeddings usable at several truncated resolutions; vector stores are adapting to them
- Semantic cache: vector stores also become caches for LLM answers
- Graph RAG: vector stores integrate with graph DBs (Neo4j + embeddings)
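The Matryoshka idea can be illustrated in a few lines: consuming a high-dimensional embedding at a lower resolution is just truncation plus renormalization. This is a generic sketch of the mechanism, not any specific model's API:

```python
import numpy as np

def truncate(v: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and renormalize -- how a
    Matryoshka-trained embedding is consumed at lower resolution."""
    t = v[:dim]
    return t / np.linalg.norm(t)

rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
full /= np.linalg.norm(full)

small = truncate(full, 256)  # 4x cheaper to store and compare
```

The payoff for a vector store: cheap coarse search on the truncated prefix, optional re-ranking on the full vector.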
Critical questions
- Why cosine similarity rather than L2 distance? In which cases is one better than the other?
- HNSW vs IVF vs PQ: when should you use each?
- With 100M vectors, how do you partition? By hash? Semantically?
- FAISS does not store docs: why is that a problem, and how do you work around it?
- How do you choose the embedding dimensionality? (384, 768, 1024, 3072)
- Hybrid search: how do you fuse BM25 and vector scores? (RRF, alpha blending)
- Rebuilding the index after changing the embedding model: how do you handle it without downtime?
- Metadata filtering + vector search: what does it cost? Why is it hard?
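On the score-fusion question: Reciprocal Rank Fusion is the usual weight-free answer, because it merges ranked lists without having to calibrate BM25 scores against cosine scores. A minimal sketch (k=60 is the constant conventionally used in the RRF literature):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
    Only ranks matter, so BM25 and vector scores never need to be on
    the same scale."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["a", "b", "c"]   # ranked by BM25
vector_hits = ["b", "d", "a"]  # ranked by cosine similarity
fused = rrf_fuse([bm25_hits, vector_hits])  # "b" wins: high in both lists
```

Alpha blending (`alpha * vector_score + (1 - alpha) * bm25_score`) is the alternative, but it requires normalizing both score distributions first.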
Production pitfalls
- Embedding model choice freezes the index: changing it = re-embed EVERYTHING
- Excessive dimensionality: more expensive, and not always better (curse of dimensionality)
- No metadata: impossible to trace, filter, debug
- No backup / restore: rebuilding a 100M vector index = hours / days
- p99 latency ignored: often 10x the p50, breaks UX
- In-process FAISS: not scalable, not multi-worker safe
- No recall monitoring: one day your index degrades and you find out too late
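On the p99 point, a quick simulation shows how a small slow tail dominates even when the median looks healthy. The numbers here are made up for illustration (5% of queries hitting a slow path):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated per-query latencies in ms: 95% fast, 5% hitting a slow path
# (cold cache, metadata-filter scan, GC pause...).
latencies = np.concatenate([
    rng.normal(20, 3, 950),    # fast path around 20 ms
    rng.normal(250, 40, 50),   # slow tail around 250 ms
])

p50, p99 = np.percentile(latencies, [50, 99])
# p50 stays near 20 ms while p99 lands in the hundreds:
# dashboards showing only the median hide exactly what users feel.
```

This is why latency SLOs for vector search are usually written against p95/p99, not the median.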
Alternatives / Comparisons
See /compare pgvector vs qdrant vs chroma vs milvus for the full matrix.
Summary:
| Store | Ease | Scale | Hybrid | Hosted | Best for |
|---|---|---|---|---|---|
| pgvector | +++ | ++ | yes (with tsvector) | Supabase, Neon | If you already have Postgres |
| Qdrant | +++ | +++ | yes (BM42) | Qdrant Cloud | 2026 default |
| Weaviate | ++ | +++ | yes (native) | Weaviate Cloud | Hybrid first |
| Chroma | +++ | + | limited | Chroma Cloud | Prototype, local |
| Milvus | + | ++++ | yes | Zilliz | Massive scale |
| Pinecone | +++ | +++ | recent | SaaS only | Zero ops, expensive |
| FAISS | --- | + | no | N/A | Low-level lib |
Mini-lab
[[labs/03-vector-stores-benchmark/]] - index the same 10k docs in Chroma, Qdrant, pgvector. Measure: ingestion time, query latency, top-k precision.
To create: /lab vector-stores.
Further reading
- Qdrant benchmarks
- pgvectorscale (Timescale): boosted pgvector performance
- HNSW paper (Malkov & Yashunin)
- Matryoshka embeddings (2024)
- Graph RAG (Microsoft Research, 2024)