Vector Stores
TL;DR
A specialized database that stores embeddings (vectors of numbers) and queries them by similarity. The central building block of a RAG pipeline.
The historical problem
Classic SQL databases cannot run similarity search (cosine, dot product) efficiently at scale. You had to either:
- Load all the vectors into memory and search in Python (does not scale)
- Use a specialized library like FAISS (Facebook, 2017) or Annoy (Spotify)
But FAISS and Annoy are libraries, not databases: they store only the vectors, not the documents or their metadata. For production RAG, you have to maintain the id-to-document mapping and persistence yourself.
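To see why that matters, here is a minimal sketch of the pattern a vector library forces on you. NumPy brute-force search stands in for the FAISS index (same contract: vectors in, integer ids out), and the side table for documents is something you must build and persist yourself; all names here are illustrative:

```python
import numpy as np

# Toy stand-in for a FAISS index: it knows only vectors and row numbers.
vectors = np.random.rand(100, 8).astype("float32")
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# External mapping you must maintain: id -> (document, metadata).
# Without it, a search result is just a meaningless integer.
docs = {i: {"text": f"doc {i}", "source": "wiki"} for i in range(100)}

def search(query: np.ndarray, k: int = 3) -> list[dict]:
    """Brute-force cosine search over unit vectors (what an exact
    inner-product index does), then resolve ids through the side table."""
    scores = vectors @ (query / np.linalg.norm(query))
    top = np.argsort(-scores)[:k]
    return [docs[int(i)] for i in top]

results = search(np.random.rand(8).astype("float32"))
```

A real vector store bundles the index, the side table, and its persistence into one system, which is exactly what the library alone does not give you.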
How it works
A full vector store offers:
- Storage of embeddings + documents + metadata
- Indexing: ANN (Approximate Nearest Neighbors) structures like HNSW, IVF, PQ
- Query: similarity search (cosine, dot product, L2)
- Filtering: restrict search by metadata (type, date, user_id...)
- Scale: distribution, replication, backup
- Hybrid search: combine vector + BM25 / full-text
Typical RAG pipeline:
- Ingestion: doc -> chunk -> [[embeddings|embedding]] -> store
- Query: query -> embedding -> top-k search -> retrieve docs -> context for LLM
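The two pipeline stages can be sketched with a toy in-memory store. Everything here is illustrative: `embed` is a fake stand-in for a real embedding model (deterministic within one process), and chunking is omitted for brevity:

```python
import numpy as np

def embed(text: str, dim: int = 16) -> np.ndarray:
    """Fake embedding: hash-seeded random unit vector (a real pipeline
    would call an embedding model here)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class MiniStore:
    def __init__(self):
        self.vecs, self.rows = [], []

    def ingest(self, doc: str, **metadata):
        # doc -> chunk -> embedding -> store (chunking skipped)
        self.vecs.append(embed(doc))
        self.rows.append({"text": doc, **metadata})

    def query(self, text: str, k: int = 2, **filters):
        # query -> embedding -> metadata filter -> top-k by cosine
        qv = embed(text)
        scored = [
            (float(v @ qv), row)
            for v, row in zip(self.vecs, self.rows)
            if all(row.get(key) == val for key, val in filters.items())
        ]
        scored.sort(key=lambda s: -s[0])  # unit vectors: dot = cosine
        return [row for _, row in scored[:k]]

store = MiniStore()
store.ingest("postgres tuning guide", kind="db")
store.ingest("hnsw index internals", kind="paper")
top = store.query("postgres tuning guide", k=1, kind="db")
```

Note that metadata filtering happens alongside the similarity search; in real stores combining the two efficiently is one of the hard parts (see the critical questions below).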
Relevance today (2026)
The vector store landscape is mature but fragmented:
The 2026 leaders:
- Qdrant: excellent perf/simplicity ratio, Rust, open-source
- Weaviate: native hybrid search, GraphQL, cloud-ready
- Milvus: massive scale, Zilliz ecosystem
- pgvector (Postgres extension): a must if you already have Postgres; performance improved significantly in 2024-2025
- Chroma: simple, local-first, good for prototyping
- Pinecone: managed SaaS, paid but zero ops
The losers / niche options:
- Raw FAISS: still useful as a low-level building block, but not a production store on its own
- Elasticsearch + vectors: workable, but not as optimized as Qdrant/Milvus
Critical question today:
- You already have Postgres? -> pgvector. Simpler ops; performance has caught up with specialized stores for most workloads.
- You want local-first / a prototype? -> Chroma or Qdrant.
- You scale to 100M+ vectors? -> Milvus or a Qdrant cluster.
- You want managed, zero-ops? -> Pinecone (expensive) or Qdrant Cloud.
Trends to watch:
- BM42 (Qdrant): new generation hybrid search
- Matryoshka embeddings: embeddings usable at several truncated resolutions; vector stores are adapting to them
- Semantic cache: vector stores also become caches for LLM answers
- Graph RAG: vector stores integrate with graph DBs (Neo4j + embeddings)
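The Matryoshka idea can be illustrated in a few lines: consuming a high-dimensional embedding at a lower resolution is just truncation plus renormalization. This is a generic sketch of the mechanism, not any specific model's API:

```python
import numpy as np

def truncate(v: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and renormalize -- how a
    Matryoshka-trained embedding is consumed at lower resolution."""
    t = v[:dim]
    return t / np.linalg.norm(t)

rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
full /= np.linalg.norm(full)

small = truncate(full, 256)  # 4x cheaper to store and compare
```

The payoff for a vector store: cheap coarse search on the truncated prefix, optional re-ranking on the full vector.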
Critical questions
- Why cosine similarity rather than L2 distance? In which cases is one better than the other?
- HNSW vs IVF vs PQ: when should you use each?
- With 100M vectors, how do you partition? By hash? Semantically?
- FAISS does not store docs: why is that a problem, and how do you work around it?
- How do you choose the embedding dimensionality? (384, 768, 1024, 3072)
- Hybrid search: how do you fuse BM25 and vector scores? (RRF, alpha blending)
- Rebuilding the index after changing the embedding model: how do you handle it without downtime?
- Metadata filtering + vector search: what does it cost? Why is it hard?
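On the score-fusion question: Reciprocal Rank Fusion is the usual weight-free answer, because it merges ranked lists without having to calibrate BM25 scores against cosine scores. A minimal sketch (k=60 is the constant conventionally used in the RRF literature):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
    Only ranks matter, so BM25 and vector scores never need to be on
    the same scale."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["a", "b", "c"]   # ranked by BM25
vector_hits = ["b", "d", "a"]  # ranked by cosine similarity
fused = rrf_fuse([bm25_hits, vector_hits])  # "b" wins: high in both lists
```

Alpha blending (`alpha * vector_score + (1 - alpha) * bm25_score`) is the alternative, but it requires normalizing both score distributions first.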
Production pitfalls
- Embedding model choice freezes the index: changing it = re-embed EVERYTHING
- Excessive dimensionality: more expensive, and not always better (curse of dimensionality)
- No metadata: impossible to trace, filter, debug
- No backup / restore: rebuilding a 100M vector index = hours / days
- p99 latency ignored: often 10x the p50, breaks UX
- In-process FAISS: not scalable, not multi-worker safe
- No recall monitoring: one day your index degrades and you find out too late
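On the p99 point, a quick simulation shows how a small slow tail dominates even when the median looks healthy. The numbers here are made up for illustration (5% of queries hitting a slow path):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated per-query latencies in ms: 95% fast, 5% hitting a slow path
# (cold cache, metadata-filter scan, GC pause...).
latencies = np.concatenate([
    rng.normal(20, 3, 950),    # fast path around 20 ms
    rng.normal(250, 40, 50),   # slow tail around 250 ms
])

p50, p99 = np.percentile(latencies, [50, 99])
# p50 stays near 20 ms while p99 lands in the hundreds:
# dashboards showing only the median hide exactly what users feel.
```

This is why latency SLOs for vector search are usually written against p95/p99, not the median.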
Alternatives / Comparisons
See /compare pgvector vs qdrant vs chroma vs milvus for the full matrix.
Summary:
| Store | Ease | Scale | Hybrid | Hosted | Best for |
|---|---|---|---|---|---|
| pgvector | +++ | ++ | yes (with tsvector) | Supabase, Neon | If you already have Postgres |
| Qdrant | +++ | +++ | yes (BM42) | Qdrant Cloud | 2026 default |
| Weaviate | ++ | +++ | yes (native) | Weaviate Cloud | Hybrid first |
| Chroma | +++ | + | limited | Chroma Cloud | Prototype, local |
| Milvus | + | ++++ | yes | Zilliz | Massive scale |
| Pinecone | +++ | +++ | recent | SaaS only | Zero ops, expensive |
| FAISS | --- | + | no | N/A | Low-level lib |
Mini-lab
[[labs/03-vector-stores-benchmark/]] - index the same 10k docs in Chroma, Qdrant, pgvector. Measure: ingestion time, query latency, top-k precision.
To create: /lab vector-stores.
Further reading
- Qdrant benchmarks
- pgvectorscale (Timescale): boosted pgvector performance
- HNSW paper (Malkov & Yashunin)
- Matryoshka embeddings (2024)
- Graph RAG (Microsoft Research, 2024)