RAG
03·RAG·updated 2026-04-13

Vector Stores

TL;DR

Specialized database to store embeddings (vectors of numbers) and query them by similarity. Central building block of a RAG pipeline.

The historical problem

Classic SQL databases cannot do cosine similarity search at scale. You had to either:

  • Load all the vectors in memory and search in Python (not great at scale)
  • Use specialized libs like FAISS (Facebook, 2017) or Annoy (Spotify)

But FAISS/Annoy are libraries, not databases: they store only the vectors, with no documents and no metadata attached. For production RAG, you have to maintain the ID-to-document mapping and handle persistence yourself.
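To make that workaround concrete, here is a pure-Python sketch (brute-force cosine search stands in for the ANN index; all names are invented): the library-like part knows only vector positions, and the side table you maintain yourself maps positions back to documents and metadata.

```python
import math

# The "index" holds only vectors; the side table maps row position -> document.
# This mirrors what FAISS gives you: search() returns positions, nothing else.
vectors = []    # list of embedding vectors (the FAISS-like part)
doc_store = []  # parallel list of (text, metadata) you must keep yourself

def add(vec, text, metadata):
    vectors.append(vec)
    doc_store.append((text, metadata))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    # Brute-force scan; a real library replaces this with an ANN structure.
    scored = sorted(range(len(vectors)),
                    key=lambda i: cosine(query_vec, vectors[i]),
                    reverse=True)
    # Map positions back to documents: the step FAISS does NOT do for you.
    return [doc_store[i] for i in scored[:k]]

add([1.0, 0.0], "doc about cats", {"type": "animal"})
add([0.0, 1.0], "doc about tax law", {"type": "legal"})
add([0.9, 0.1], "doc about kittens", {"type": "animal"})

print(search([1.0, 0.0], k=2))  # the two cat-related docs come back first
```

A full vector store bundles exactly these two halves (plus persistence) behind one API, which is why the side table disappears from your application code.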

How it works

A full vector store offers:

  1. Storage of embeddings + documents + metadata
  2. Indexing: ANN (Approximate Nearest Neighbors) structures like HNSW, IVF, PQ
  3. Query: similarity search (cosine, dot product, L2)
  4. Filtering: restrict search by metadata (type, date, user_id...)
  5. Scale: distribution, replication, backup
  6. Hybrid search: combine vector + BM25 / full-text

Typical RAG pipeline:

  • Ingestion: doc -> chunk -> [[embeddings|embedding]] -> store
  • Query: query -> embedding -> top-k search -> retrieve docs -> context for LLM
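The two phases above can be sketched end to end in a few lines. The letter-frequency `embed` function is a toy stand-in for a real embedding model, and the in-memory `store` list for a real vector store; all names are made up.

```python
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: 26-dim letter-frequency vector.
    counts = Counter(c for c in text.lower() if c.isalpha())
    return [counts.get(chr(ord("a") + i), 0) for i in range(26)]

store = []  # each entry: {"vec": [...], "chunk": "..."}

# Ingestion: doc -> chunks -> embeddings -> store
def ingest(doc, chunk_size=40):
    for i in range(0, len(doc), chunk_size):
        chunk = doc[i:i + chunk_size]
        store.append({"vec": embed(chunk), "chunk": chunk})

# Query: query -> embedding -> top-k search -> context for the LLM
def retrieve_context(query, k=2):
    qv = embed(query)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    top = sorted(store, key=lambda e: dot(qv, e["vec"]), reverse=True)[:k]
    return "\n---\n".join(e["chunk"] for e in top)

ingest("zebra zebra zebra")
ingest("money money money")
print(retrieve_context("zebra", k=1))  # -> "zebra zebra zebra"
```

A real setup would normalize the vectors (i.e. use cosine similarity) so that longer chunks do not dominate the raw dot product.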

Relevance today (2026)

The vector store landscape is mature but fragmented:

The 2026 leaders:

  • Qdrant: excellent perf/simplicity ratio, Rust, open-source
  • Weaviate: native hybrid search, GraphQL, cloud-ready
  • Milvus: massive scale, Zilliz ecosystem
  • pgvector (Postgres extension): a must if you already have Postgres, performance boosted in 2024-2025
  • Chroma: simple, local-first, good for prototyping
  • Pinecone: managed SaaS, paid but zero ops

The losers / niches:

  • Raw FAISS: still useful as a low-level building block, but not a production store on its own
  • Elasticsearch + vectors: workable, but not as optimized as Qdrant/Milvus

Critical question today:

  • You already have Postgres? -> pgvector. Simpler ops. Performance has caught up with specialized stores for most cases.
  • You want local-first / a prototype? -> Chroma or Qdrant.
  • You scale to 100M+ vectors? -> Milvus or a Qdrant cluster.
  • You want managed zero-ops? -> Pinecone (expensive) or Qdrant Cloud.

Trends to watch:

  • BM42 (Qdrant): new generation hybrid search
  • Matryoshka embeddings: embeddings with variable resolutions, vector stores adapt
  • Semantic cache: vector stores also become caches for LLM answers
  • Graph RAG: vector stores integrate with graph DBs (Neo4j + embeddings)
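The Matryoshka idea in code: if (and only if) the model was trained with Matryoshka representation learning, you can keep a prefix of the embedding and renormalize it, trading some accuracy for cheaper storage and faster search. A minimal sketch:

```python
import math

def truncate_embedding(vec, dims):
    # Matryoshka-style use: keep the first `dims` coordinates, then
    # re-normalize so cosine similarity still behaves sensibly.
    # (Only valid for models trained with Matryoshka representation
    # learning; an ordinary embedding degrades badly when truncated.)
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix)) or 1.0
    return [x / norm for x in prefix]

full = [0.5, 0.5, 0.5, 0.5]          # pretend 4-d "full" embedding
short = truncate_embedding(full, 2)  # cheap 2-d version for coarse search
print(short)  # both components ≈ 0.7071 after renormalization
```

A common pattern is two-stage retrieval: coarse search on the short vectors, then re-rank the candidates with the full-resolution ones.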

Critical questions

  • Why cosine similarity and not L2 distance? Cases where one is better than the other?
  • HNSW vs IVF vs PQ: when to use each?
  • If I have 100M vectors, how do I partition? Hash? Semantic?
  • FAISS does not store docs: why is that a problem and how do you work around it?
  • How do I choose the number of embedding dimensions? (384, 768, 1024, 3072)
  • Hybrid search: how do you fuse BM25 and vector scores? (RRF, alpha blend)
  • Index rebuild when you change the embedding model: how do you handle it without downtime?
  • Metadata filtering + vector search: what is the cost? Why is it hard?
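On the score-fusion question, Reciprocal Rank Fusion (RRF) is the simplest answer: it sidesteps the incomparable raw scores and combines ranks only. A sketch (the doc ids are invented):

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
    # `rankings` is a list of ranked doc-id lists, e.g. one from BM25 and
    # one from vector search; k=60 is the constant from the original RRF paper.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["d3", "d1", "d7"]
vector_top = ["d1", "d9", "d3"]
print(rrf_fuse([bm25_top, vector_top]))  # -> ['d1', 'd3', 'd9', 'd7']
```

The alternative, alpha blending (`alpha * vector_score + (1 - alpha) * bm25_score`), requires normalizing both score distributions first, which is exactly what RRF lets you skip.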

Production pitfalls

  • Embedding model choice freezes the index: changing it = re-embed EVERYTHING
  • Too high dimensionality: more expensive, not always better (curse of dimensionality)
  • No metadata: impossible to trace, filter, debug
  • No backup / restore: rebuilding a 100M vector index = hours / days
  • p99 latency ignored: often 10x the p50, breaks UX
  • In-process FAISS: not scalable, not multi-worker safe
  • No recall monitoring: one day your index degrades and you find out too late
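The recall monitoring from the last point is cheap to implement: periodically run a sample of queries through both the ANN index and an exact brute-force search, and compare the top-k sets. A hypothetical helper:

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    # Fraction of the true top-k (from exact brute-force search) that the
    # ANN index actually returned. Run this periodically on sampled queries:
    # a downward trend signals index degradation long before users complain.
    approx, exact = set(approx_ids[:k]), set(exact_ids[:k])
    return len(approx & exact) / len(exact)

# Example: the ANN index missed one of the five true nearest neighbors.
print(recall_at_k([1, 2, 3, 4, 99], [1, 2, 3, 4, 5], k=5))  # -> 0.8
```

Brute-force ground truth is too slow for live traffic but fine for an offline job over a few hundred sampled queries.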

Alternatives / Comparisons

See /compare pgvector vs qdrant vs chroma vs milvus for the full matrix.

Summary:

| Store | Ease | Scale | Hybrid | Hosted | Best for |
| --- | --- | --- | --- | --- | --- |
| pgvector | +++ | ++ | yes (with ts_vector) | Supabase, Neon | If you already have Postgres |
| Qdrant | +++ | +++ | yes (BM42) | Qdrant Cloud | 2026 default |
| Weaviate | ++ | +++ | yes (native) | Weaviate Cloud | Hybrid first |
| Chroma | +++ | + | limited | Chroma Cloud | Prototype, local |
| Milvus | + | ++++ | yes | Zilliz | Massive scale |
| Pinecone | +++ | +++ | recent | SaaS only | Zero ops, expensive |
| FAISS | --- | + | no | N/A | Low-level lib |

Mini-lab

[[labs/03-vector-stores-benchmark/]] - index the same 10k docs in Chroma, Qdrant, pgvector. Measure: ingestion time, query latency, top-k precision.

To create: /lab vector-stores.

Further reading

rag · vector-database · infrastructure