David Alimi — AI engineer writing about what I actually build.

Building in public. Documenting what I learn as an AI Engineer.

Currently building

Torah Study AI

Production RAG pipeline on 3.5M sacred texts. Hybrid search, Cohere reranking, strict anti-hallucination guardrails. Built with FastAPI, Weaviate, and Gemini.

Python · FastAPI · Weaviate · Gemini 2.5 Flash · Cohere Rerank

Knowledge base

Browse all →

AI Engineering, one notion at a time

38 notions across 13 domains: LLMs, RAG, agents, MCP, prompt caching, evaluations, and more. Each page covers a TL;DR, the problem, how it works, and its 2026 relevance.

Foundations · LLMs · Prompt Engineering · RAG · Context Engineering · AI Agents · +7 more

Recent writing

All 15 →

Apr 12, 2026

5 techniques to make your RAG system actually work

A vanilla RAG system retrieves documents and hopes for the best. Here are 5 techniques that production RAG systems use to go from 'it works sometimes' to 'it works reliably'.

6 min · RAG · AI Engineering · Production · Retrieval

Apr 10, 2026

How a RAG Server Works, Step by Step

A RAG server has two phases: prepare the knowledge base once, then answer questions forever. Here is what happens at each step.

10 min · RAG · AI Engineering · Architecture · Fundamentals

Apr 09, 2026

Why strict RAG matters on sensitive data

When your LLM can fall back to general knowledge, it will. On religious texts, legal docs, or medical data, that is not acceptable. Here is why.

4 min · RAG · AI Engineering · Guardrails · Production

Apr 08, 2026

How to handle bad RAG results gracefully

Your RAG system found nothing relevant. Now what? The industry patterns for fallback strategies, relevance thresholds, and honest abstention.

6 min · RAG · AI Engineering · Retrieval · Production

A short dispatch when I ship something or break it. No hype. Unsubscribe anytime.