
Torah Study AI

in progress

A production RAG pipeline over 3.5M sacred texts: hybrid search, Cohere reranking, and strict anti-hallucination guardrails. Built with FastAPI, Weaviate, and Gemini.

Python · FastAPI · Weaviate · Gemini 2.5 Flash · Cohere Rerank · Next.js · shadcn/ui · Docker

Timeline

Block 1 - Chat with Gemini (1 week)

  • Step 1: First LLM call (Python + Google GenAI SDK)
  • Step 2: Next.js + shadcn/ui frontend (Cal AI design)
  • Step 3: Prompt engineering (chavruta system prompt)

Result: A chat that answers Torah questions using Gemini's general knowledge.
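Steps 1 and 3 can be sketched in a few lines. The prompt text and helper names below are illustrative, not the project's actual prompt; the SDK call follows the Google GenAI Python SDK, and the client reads its API key from the environment.

```python
# Illustrative sketch of Steps 1 + 3: one Gemini call with a
# chavruta-style system prompt. Prompt wording is hypothetical.

CHAVRUTA_SYSTEM_PROMPT = (
    "You are a chavruta (Torah study partner). Answer questions about "
    "Torah clearly, cite sources when you know them, and say so plainly "
    "when you are unsure."
)

def build_contents(question: str) -> str:
    """Combine the system instruction and the user question into one prompt."""
    return f"{CHAVRUTA_SYSTEM_PROMPT}\n\nQuestion: {question}"

def ask_gemini(question: str) -> str:
    # Imported lazily so the prompt helper works without the SDK installed.
    from google import genai  # pip install google-genai

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=build_contents(question),
    )
    return response.text
```

The SDK also supports passing the system prompt via a generation config instead of concatenating it into the contents.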


Block 2 - Make it a real app (2 weeks)

  • Step 4: FastAPI backend (REST API + SSE streaming)
  • Step 5: Auth (JWT + bcrypt) + Save conversations (SQLite)
  • Step 6: Conversation history in frontend (sidebar + sessions)
  • Step 7: Docker (Dockerfiles + docker-compose)
  • Step 8: Deploy to Elestio (deferred to after Block 3)

Result: Working app with auth, persistence, Docker-ready. Deploy after RAG is added.
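Step 5's persistence layer can be sketched with the standard library alone. The schema and function names below are illustrative, not the project's actual code; only SQLite itself is taken from the step description.

```python
# Minimal sketch of conversation persistence in SQLite (Step 5).
import sqlite3

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the messages table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
               content TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def save_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def load_history(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for one session, in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [(role, content) for role, content in rows]
```

Keying messages by `session_id` is what lets the frontend sidebar (Step 6) list and reopen past conversations.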


Block 3 - Add Sefaria knowledge (3 weeks)

  • Step 9: Load Sefaria HuggingFace datasets (already chunked with metadata)
  • Step 10: NER enrichment (skipped for MVP)
  • Step 11: Embed with Gemini Embedding 001 + index in Weaviate (94K texts, English only)
  • Step 12: RAG pipeline with LangChain LCEL (retrieval + rerank + generation)
  • Step 13: Hybrid search (vector + BM25, alpha=0.5)

Result: Answers grounded in real texts from the 3.5M-text Sefaria corpus, with verifiable sources. Weekly Parashah feature.
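The alpha blending in Step 13 can be illustrated with a toy score-fusion function. In the real pipeline the fusion happens inside Weaviate's hybrid query; the code below only mimics the idea (alpha = 1.0 is pure vector search, alpha = 0.0 is pure BM25), and all names are hypothetical.

```python
# Toy illustration of hybrid search score fusion (Step 13).

def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores to [0, 1] so both signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(
    vector_scores: dict[str, float],
    bm25_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[str]:
    """Rank documents by a weighted blend of vector and keyword scores."""
    vec, kw = normalize(vector_scores), normalize(bm25_scores)
    docs = set(vec) | set(kw)
    fused = {d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)
```

With alpha=0.5, a document that scores well on both semantic similarity and exact keyword match (e.g. a verse reference) outranks one that wins on only a single signal.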


Block 4 - Make it reliable (2 weeks)

  • Step 14: LangFuse observability
  • Step 15: Ragas evaluation (50 test questions)
  • Step 16: Full integration + deploy

Result: Production-ready with quality tracking and guardrails against hallucination.
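One guardrail from Block 4 can be sketched as a citation check: reject or flag any answer that cites a source the retriever never returned. The citation format (`[Genesis 1:1]`) and function name below are hypothetical, not the project's actual convention.

```python
# Illustrative anti-hallucination check: every citation in the answer
# must correspond to a retrieved Sefaria source.
import re

CITATION_RE = re.compile(r"\[([^\]]+)\]")  # assumed [Book chap:verse] format

def unsupported_citations(answer: str, retrieved_refs: set[str]) -> list[str]:
    """Return citations in the answer that were NOT among retrieved sources."""
    cited = CITATION_RE.findall(answer)
    return [ref for ref in cited if ref not in retrieved_refs]
```

A non-empty return value would trigger a retry or a warning in the UI; the same signal can feed the Ragas evaluation run as a faithfulness proxy.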


V2 - RAG improvements

  • CRAG (Corrective RAG) - reformulate query and retry when retrieval fails. Handles transliteration (Bereshit/Bereishit/Genesis)
  • HyDE (Hypothetical Document Embeddings) - generate a fake answer first, embed it, then search. Improves cross-language retrieval (Hebrew question -> English sources)
  • Hierarchical chunking - add book/chapter level vectors alongside verse-level chunks for better retrieval on broad questions
  • Few-shot examples in system prompt - add 2-3 ideal response examples to align LLM output format
  • LLM-as-Judge evaluation - automated rubric-based scoring instead of manual testing only

V2 - Product features

  • Level system (beginner / intermediate / advanced)
  • User progress memory
  • AI chavruta mode (Socratic method - the AI asks YOU questions)
  • Graph RAG using Sefaria's 3.74M links