Production RAG pipeline on 3.5M Jewish sacred texts from the Sefaria library. Every answer is grounded in verifiable sources. No hallucinations.
Key numbers
- 94,635 English texts indexed with Gemini Embedding 001
- 93.9% recall on Sefaria's own embedding benchmark (18 models tested)
- 0.3 relevance threshold: below it, the pipeline says it did not find the text and suggests where to look
- 420 texts/sec local embedding throughput (Qwen3 on M1 Max)
- 16 documented build steps with BDD scenarios and TDD
What makes this different
Strict RAG on sensitive data. On sacred texts, a fabricated source is not an error - it is an offense. The pipeline never falls back to LLM general knowledge. If the retrieval score is below 0.3, it says "I didn't find this text" and suggests where to look.
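The refusal rule above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Hit` type, the `generate` callable (standing in for the LLM call), and the `hint` wording are all hypothetical, and it assumes retrieval scores where higher means more relevant.

```python
from dataclasses import dataclass
from typing import Callable, List

RELEVANCE_THRESHOLD = 0.3  # threshold stated in the text above


@dataclass
class Hit:
    """One retrieved passage (hypothetical shape, for illustration only)."""
    ref: str      # citation label for the passage
    text: str     # passage content
    score: float  # retrieval relevance, assumed higher = more relevant


def select_sources(hits: List[Hit], threshold: float = RELEVANCE_THRESHOLD) -> List[Hit]:
    """Keep only hits at or above the threshold, best first."""
    kept = [h for h in hits if h.score >= threshold]
    return sorted(kept, key=lambda h: h.score, reverse=True)


def answer(
    question: str,
    hits: List[Hit],
    generate: Callable[[str, List[Hit]], str],
    hint: str = "Try searching the library directly by book and chapter.",
) -> str:
    """Answer from retrieved sources only; never fall back to model knowledge."""
    sources = select_sources(hits)
    if not sources:
        # No source clears the threshold: refuse and point elsewhere,
        # without calling the generator at all.
        return f"I didn't find this text. {hint}"
    return generate(question, sources)
```

The point of the design is that the generator is never invoked without at least one source above the threshold, so there is no code path in which the model answers from general knowledge.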
Data-driven model selection. I benchmarked 18 embedding models on Sefaria's own Rabbinic Embedding Leaderboard before choosing one. Gemini Embedding 001 ranks first at 93.9% recall. OpenAI scores 69.9%. The best open-source model (Qwen3-Embedding-8B) scores 89.4%.
Domain-aware chunking. The Sefaria dataset comes pre-chunked by verse, mishnah, and sugya - natural scholarly units. No generic RecursiveCharacterTextSplitter that would cut a Talmud passage in the middle of an argument.
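As a rough sketch of what this means in code: each pre-segmented unit becomes exactly one chunk, with its citation kept alongside the text, and no splitter ever runs. The record shape (`ref` and `text` keys) and the `Chunk` type are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Chunk:
    """One scholarly unit (verse, mishnah, or sugya) kept whole."""
    ref: str   # citation for the unit (hypothetical field name)
    text: str  # full text of the unit, never split


def chunks_from_units(records: Iterable[dict]) -> List[Chunk]:
    """Map each pre-segmented record to one chunk, skipping empty texts."""
    chunks = []
    for record in records:
        text = record["text"].strip()
        if text:
            # One record in, one chunk out: length is never used to split.
            chunks.append(Chunk(ref=record["ref"], text=text))
    return chunks


if __name__ == "__main__":
    # Placeholder records; real refs and texts would come from the dataset.
    sample = [
        {"ref": "Example 1:1", "text": "first unit of text"},
        {"ref": "Example 1:2", "text": "a much longer unit " * 200},
        {"ref": "Example 1:3", "text": "   "},
    ]
    for chunk in chunks_from_units(sample):
        print(chunk.ref, len(chunk.text))
```

The trade-off is that chunk sizes vary widely, so a long unit has to fit within the embedding model's input limit on its own.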
Cost transparency. I estimated embedding costs at $2.60. The real cost was 25x higher, because the price is $0.15 per 1M tokens, not the $0.006 I had assumed. I documented the mistake, pivoted to English-only indexing, and planned a migration to a free local model. The full cost analysis is in the build log.
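The arithmetic behind the 25x figure can be checked with a few lines. The two prices and the $2.60 estimate come from the text above; the token count and the corrected total are only implied by those numbers (assuming the estimate and the real run covered the same token volume), so treat them as a back-of-the-envelope reconstruction rather than figures from the build log.

```python
ASSUMED_PRICE_PER_M = 0.006  # USD per 1M tokens, the mistaken assumption
ACTUAL_PRICE_PER_M = 0.15    # USD per 1M tokens, the real price
ESTIMATED_COST = 2.60        # USD, the original estimate


def embedding_cost(tokens_millions: float, price_per_million: float) -> float:
    """Cost in USD of embedding a given number of tokens (in millions)."""
    return tokens_millions * price_per_million


def implied_tokens_millions(cost: float, price_per_million: float) -> float:
    """Token volume (in millions) implied by a cost at a given price."""
    return cost / price_per_million


if __name__ == "__main__":
    ratio = ACTUAL_PRICE_PER_M / ASSUMED_PRICE_PER_M
    tokens_m = implied_tokens_millions(ESTIMATED_COST, ASSUMED_PRICE_PER_M)
    corrected = embedding_cost(tokens_m, ACTUAL_PRICE_PER_M)
    print(f"price ratio: {ratio:.1f}x")
    print(f"implied volume: {tokens_m:.0f}M tokens")
    print(f"corrected cost at the same volume: ${corrected:.2f}")
```

Because the error was in the unit price, the cost scales by the same factor as the price, which is why cutting the token volume (English-only indexing) was the available lever.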