Block 1 - Chat with Gemini
Build a working Torah chatbot that talks to Gemini directly.
Step 1: First LLM call [DONE]
Python script that sends a Torah question to Gemini and displays the answer. Google GenAI SDK, system prompts, input validation. 6 tests green.
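A minimal sketch of this step, assuming the google-genai SDK and a placeholder model name (`gemini-2.0-flash`); the SDK import is kept inside the function so the validation helper stands on its own:

```python
import os

def validate_question(text: str, max_len: int = 2000) -> str:
    """Basic input validation: reject empty or oversized questions."""
    text = text.strip()
    if not text:
        raise ValueError("Question is empty")
    if len(text) > max_len:
        raise ValueError(f"Question exceeds {max_len} characters")
    return text

# Placeholder system prompt; the real one is the chavruta prompt from Step 3.
SYSTEM_PROMPT = "You are a chavruta study partner. Answer Torah questions with sources."

def ask_gemini(question: str) -> str:
    """Send the question to Gemini with a system prompt (needs GEMINI_API_KEY)."""
    from google import genai              # third-party SDK, imported lazily
    from google.genai import types
    client = genai.Client()               # picks up the API key from the env
    response = client.models.generate_content(
        model="gemini-2.0-flash",         # model name is an assumption
        contents=validate_question(question),
        config=types.GenerateContentConfig(system_instruction=SYSTEM_PROMPT),
    )
    return response.text

if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    print(ask_gemini("What is the first word of the Torah?"))
```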
Step 2: Next.js frontend [DONE]
Chat interface built with Next.js + shadcn/ui. Cal AI-inspired design (Bricolage Grotesque + Inter fonts, monochrome with an orange accent). Answers stream in word by word over SSE.
Step 3: Prompt engineering [DONE]
Chavruta system prompt with structured output format (TL;DR, Sources, Explanation). Guardrails: never invent sources, halakhic disclaimer rendered as amber warning box. SSE chunks encoded as JSON to preserve Markdown newlines. 5 prompt-specific tests green.
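The newline problem is easy to reproduce: SSE uses blank lines as event delimiters, so a raw Markdown chunk containing `\n\n` would be split mid-message. A small sketch of the JSON encoding described above:

```python
import json

def encode_sse_chunk(text: str) -> str:
    """Wrap a token chunk as a JSON-encoded SSE event.

    JSON-escaping the chunk ("\n" becomes "\\n") keeps Markdown line
    breaks intact instead of letting SSE treat them as event boundaries.
    """
    return f"data: {json.dumps({'text': text})}\n\n"

def decode_sse_chunk(event: str) -> str:
    """Client-side inverse: strip the 'data: ' prefix and parse the JSON."""
    payload = event.removeprefix("data: ").strip()
    return json.loads(payload)["text"]
```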
Block 2 - Make it a real app
Add a backend, auth, persistence, and ship it live.
Step 4: FastAPI backend [DONE]
REST API with POST /chat (sync) and POST /chat/stream (SSE). Health check endpoint. Error handling. CORS configured for Next.js. 4 API tests green.
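A hedged sketch of the endpoint shapes, assuming FastAPI + pydantic; the answer/stream helpers are stubs standing in for the real pipeline, and the CORS origin is the Next.js dev default:

```python
def build_app():
    """Assemble the API. Requires fastapi; imports are lazy so the
    module itself stays stdlib-only."""
    from fastapi import FastAPI, HTTPException
    from fastapi.middleware.cors import CORSMiddleware
    from fastapi.responses import StreamingResponse
    from pydantic import BaseModel

    app = FastAPI()
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["http://localhost:3000"],  # Next.js dev server
        allow_methods=["*"],
        allow_headers=["*"],
    )

    class ChatRequest(BaseModel):
        question: str

    def answer_question(q: str) -> str:       # stub: real Gemini call goes here
        return f"(stub answer to: {q})"

    def stream_answer(q: str):                # stub: real token stream goes here
        yield from ("(stub ", "stream)")

    @app.get("/health")
    def health():
        return {"status": "ok"}

    @app.post("/chat")
    def chat(req: ChatRequest):
        if not req.question.strip():
            raise HTTPException(status_code=422, detail="Empty question")
        return {"answer": answer_question(req.question)}

    @app.post("/chat/stream")
    def chat_stream(req: ChatRequest):
        def events():
            for chunk in stream_answer(req.question):
                yield f"data: {chunk}\n\n"    # see Step 3 for the JSON encoding
        return StreamingResponse(events(), media_type="text/event-stream")

    return app
```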
Step 5: Auth + Save conversations [DONE]
JWT authentication (bcrypt + token). SQLite database with users, sessions, messages tables. CRUD endpoints for sessions. 7 auth tests green. MFA deferred to production deploy.
Step 6: History in frontend [DONE]
Login/register screen. Sidebar with past sessions, new chat button, sign out. Messages saved automatically. Click to resume a conversation. Token persisted in localStorage.
Step 7: Docker [DONE]
Dockerfiles for FastAPI (Python 3.14-slim) and Next.js (Node 22-alpine). docker-compose.yml with frontend + api. Weaviate already on Elestio, not in compose.
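A hedged compose sketch; service names, build paths, ports, and env var names are assumptions:

```yaml
services:
  api:
    build: ./backend          # FastAPI image, python:3.14-slim base
    ports:
      - "8000:8000"
    environment:
      - GEMINI_API_KEY=${GEMINI_API_KEY}
  frontend:
    build: ./frontend         # Next.js image, node:22-alpine base
    ports:
      - "3000:3000"
    depends_on:
      - api
# Weaviate is hosted on Elestio, so it is deliberately absent here.
```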
Step 8: Deploy to Elestio [SKIPPED]
Skipped for now. Will deploy after Block 4 is complete.
Block 3 - Add Sefaria knowledge
Build the RAG pipeline. Answers based on real texts.
Step 9: Load Sefaria datasets [DONE]
Downloaded 886K English texts + 3.55M Hebrew texts from HuggingFace. Pre-chunked by Sefaria (verse, mishna, sugya). 17 categories: Commentary (191K), Tanakh (153K), Talmud (117K), Liturgy (100K), Halakhah (74K), and more.
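A loading sketch, assuming the HuggingFace `datasets` library; the dataset identifier is a placeholder (not the real Hub name), and record fields other than `ref` are assumptions:

```python
def load_sefaria_corpus(split: str = "english"):
    """Stream the pre-chunked Sefaria texts from HuggingFace.
    Requires the `datasets` package, imported lazily."""
    from datasets import load_dataset
    # "sefaria/texts" is a placeholder identifier, not the actual dataset name.
    return load_dataset("sefaria/texts", split=split, streaming=True)

def to_chunk(record: dict) -> dict:
    """Normalize one record to the fields the index needs."""
    return {
        "ref": record["ref"],    # Sefaria's canonical reference, e.g. "Genesis 1:1"
        "text": record["text"],
        "category": record.get("category", "Unknown"),
    }
```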
Step 10: NER enrichment [SKIPPED]
Skipped for MVP. Sefaria metadata already includes ref field for each text. NER would detect inline citations (when one text references another) - useful for V2 Graph RAG but not blocking.
Step 11: Embeddings + Weaviate [DONE - partial]
94,635 English texts indexed with Gemini Embedding 001 (3072d) in Weaviate. Hit spending cap at ~20 shekels. Plan to migrate to Qwen3-Embedding-0.6B local (free, 89.4% recall on Sefaria benchmark).
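An indexing sketch, assuming weaviate-client v4 with precomputed 3072-d vectors; the collection name and connection helper are placeholders. The pure batching helper is what keeps each embedding/insert call bounded (and the bill visible batch by batch):

```python
from itertools import islice

def batched(items, size=100):
    """Yield fixed-size batches from any iterable."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def index_texts(chunks, collection_name="SefariaText"):
    """Insert pre-embedded chunks into Weaviate (requires weaviate-client v4).
    Each chunk dict carries 'ref', 'text', and a precomputed 'vector'."""
    import weaviate
    client = weaviate.connect_to_local()   # or connect_to_custom(...) for Elestio
    try:
        col = client.collections.get(collection_name)
        for batch in batched(chunks, 100):
            with col.batch.dynamic() as writer:
                for chunk in batch:
                    writer.add_object(
                        properties={"ref": chunk["ref"], "text": chunk["text"]},
                        vector=chunk["vector"],
                    )
    finally:
        client.close()
```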
Read the full walkthrough (includes the cost disaster) ->
Step 12: RAG pipeline [DONE]
Refactored to LangChain LCEL: ChatPromptTemplate | ChatGoogleGenerativeAI | StrOutputParser. RAGPipeline class exposing .invoke() and .stream(). Hybrid search + Cohere rerank kept as raw SDK calls (LangChain limitation). Relevance threshold of 0.3 with a smart fallback.
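A sketch of the LCEL chain plus one plausible reading of the "smart fallback" (keep hits above 0.3; if none qualify, fall back to the single best hit rather than answering without context). Model name, prompt text, and the fallback behavior are assumptions:

```python
def build_chain():
    """LCEL pipeline: prompt | model | parser (requires langchain packages)."""
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from langchain_google_genai import ChatGoogleGenerativeAI

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer from the provided Sefaria context only."),
        ("human", "Context:\n{context}\n\nQuestion: {question}"),
    ])
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")  # model is an assumption
    return prompt | llm | StrOutputParser()

def select_context(results, threshold=0.3):
    """Relevance gate with fallback. Each result is a dict with a 'score'."""
    kept = [r for r in results if r["score"] >= threshold]
    if kept:
        return kept
    return [max(results, key=lambda r: r["score"])] if results else []
```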
Step 13: Hybrid search + Parashah [TODO]
Add BM25 keyword search alongside vector search (alpha=0.5 in Weaviate). Metadata filtering by book/category. Parashah of the week via Sefaria API /calendars.
Block 4 - Make it reliable
Track quality, measure performance, ship the final product.
Step 14: LangFuse observability [TODO]
Track every RAG request: search results, reranking scores, generation. See exactly why an answer is good or bad. LangFuse deployed on Elestio.
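A tracing sketch, assuming the Langfuse Python SDK's `@observe` decorator (v3 import path; host and keys come from `LANGFUSE_*` env vars); the stage bodies are placeholders for the real search, rerank, and generate calls:

```python
def build_traced_pipeline():
    """Wrap each RAG stage with @observe so retrieval, reranking, and
    generation appear as nested spans in a single Langfuse trace."""
    from langfuse import observe  # requires the langfuse package

    @observe()
    def retrieve(question):
        return ["...hits..."]              # Weaviate hybrid search here

    @observe()
    def rerank(question, hits):
        return hits                        # Cohere rerank here

    @observe()
    def generate(question, context):
        return "...answer..."              # Gemini generation here

    @observe()
    def answer(question):
        return generate(question, rerank(question, retrieve(question)))

    return answer
```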
Step 15: Ragas evaluation [TODO]
Measure RAG quality on 50 test questions. Metrics: faithfulness, relevancy, context precision. LLM-as-Judge with Torah-specific rubric. On sacred texts, a hallucination is not just an error - it is an offense.
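An evaluation sketch, assuming the `ragas` and `datasets` packages; the three metrics match the ones named above, and the column names are the ones these metrics expect:

```python
def run_eval(questions, answers, contexts, ground_truths):
    """Score RAG output with Ragas. `contexts` is a list of retrieved
    passages per question; needs an LLM judge configured in the env."""
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import faithfulness, answer_relevancy, context_precision

    ds = Dataset.from_dict({
        "question": questions,
        "answer": answers,
        "contexts": contexts,
        "ground_truth": ground_truths,
    })
    return evaluate(ds, metrics=[faithfulness, answer_relevancy, context_precision])
```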
Step 16: Integration + deploy [TODO]
Deploy the full stack on Elestio (Frontend + Backend + Weaviate + LangFuse). Every answer has verifiable Sefaria sources with clickable links.