Production RAG pipeline on 3.5M sacred texts. Hybrid search, Cohere reranking, strict anti-hallucination guardrails. Built with FastAPI, Weaviate, and Gemini.

Production RAG pipeline on 3.5M Jewish sacred texts from the Sefaria library. Every answer is grounded in verifiable sources. No hallucinations.

Key numbers

94,635 English texts indexed with Gemini Embedding 001
93.9% recall on Sefaria's own embedding benchmark (18 models tested)
0.3 relevance threshold with smart fallback when sources are weak
420 texts/sec local embedding throughput (Qwen3 on M1 Max)
16 documented build steps with BDD scenarios and TDD

What makes this different

Strict RAG on sensitive data. On sacred texts, a fabricated source is not an error - it is an offense. The pipeline never falls back to LLM general knowledge. If the retrieval score is below 0.3, it says "I didn't find this text" and suggests where to look.

Data-driven model selection. I benchmarked 18 embedding models using Sefaria's own Rabbinic Embedding Leaderboard before choosing. Gemini Embedding 001 wins at 93.9% recall. OpenAI scores 69.9%. The best open-source model (Qwen3-Embedding-8B) scores 89.4%.

Domain-aware chunking. The Sefaria dataset comes pre-chunked by verse, mishnah, and sugya - natural scholarly units. No generic RecursiveCharacterTextSplitter that would cut a Talmud passage in the middle of an argument.

Cost transparency. I estimated embedding costs at $2.60. The reality was 25x higher ($0.15/1M tokens, not $0.006). I documented the mistake, pivoted to English-only indexing, and planned migration to a free local model. The full cost analysis is in the build log.

Torah Study AI

Key numbers

What makes this different

Stack

Read more

Why

Tech Choices

Instructions

Roadmap