David Alimi
Building in public. Documenting what I learn as an AI Engineer.
Current projects
Recent articles
April 12, 2026 · 6 min read
5 techniques to make your RAG system actually work
A vanilla RAG retrieves documents and hopes for the best. Here are 5 techniques that production RAG systems use to go from 'it works sometimes' to 'it works reliably'.
RAG · AI Engineering · Production · Retrieval

April 10, 2026 · 10 min read
How a RAG Server Works, Step by Step
A RAG server has two phases: prepare the knowledge base once, then answer questions forever. Here is what happens at each step.
RAG · AI Engineering · Architecture · Fundamentals

April 9, 2026 · 4 min read
Why strict RAG matters on sensitive data
When your LLM can fall back to general knowledge, it will. On religious texts, legal docs, or medical data, that is not acceptable. Here is why.
RAG · AI Engineering · Guardrails · Production

April 8, 2026 · 6 min read
How to handle bad RAG results gracefully
Your RAG system found nothing relevant. Now what? The industry patterns for fallback strategies, relevance thresholds, and honest abstention.
RAG · AI Engineering · Retrieval · Production

April 6, 2026 · 4 min read
Cohere Rerank: When to Use It (and When Not)
Reranking improved our search from 'sort of related' to 'exactly what you asked.' Here is how the scores work and when to add it to your pipeline.
RAG · Reranking · Cohere · AI Engineering

April 4, 2026 · 5 min read
Choosing an Embedding Model: Benchmarks Over Brand
OpenAI is not always the best choice. How Sefaria's benchmark proved Gemini is 40% more accurate on Rabbinic texts, at 3x less cost.
RAG · Embeddings · AI Engineering · Benchmarks

March 21, 2026 · 6 min read
RAG vs Long Context: do you still need a vector database?
Context windows now hold millions of tokens. So why not just dump everything in? Here's when RAG still wins, when long context is better, and how to choose.
RAG · Long Context · LLM · Architecture · AI Engineering

March 12, 2026 · 2 min read
SHA-256: How It Works
What SHA-256 is, its key properties, and why we use it to track file changes
Security · hashing · SHA-256 · integrity

March 5, 2026 · 2 min read
Why Reranking Matters
What reranking does, how cross-encoders work, and why it dramatically improves RAG quality
RAG · reranking · cross-encoder · Cohere · retrieval

February 24, 2026 · 2 min read
Hybrid Search Explained
What hybrid search is, how the alpha parameter works, and when to adjust it
RAG · search · hybrid · BM25 · vectors · Weaviate

February 17, 2026 · 2 min read
What Are Embedding Dimensions?
What dimensions mean in embedding vectors, whether more is better, and when it matters
RAG · embeddings · vectors · models

February 10, 2026 · 2 min read
BM25 vs Vector Search
The difference between keyword search (BM25) and semantic search (vectors), and why you need both
RAG · search · BM25 · vectors · retrieval

February 2, 2026 · 8 min read
What are tokens (and why they cost you money)
Tokens are the currency of LLMs. Understand how they work and you'll understand your bill, the limits, and the quirks of AI.
Tokens · LLM · Costs · Fundamentals

January 29, 2026 · 9 min read
What is an LLM?
Large Language Models explained simply. What they are, how they work, and what they can do.
LLM · AI Engineering · Fundamentals

January 26, 2026 · 7 min read
CLAUDE.md and AGENTS.md: giving your AI agent a memory
Your agent forgets everything between conversations. CLAUDE.md and AGENTS.md fix that. Here's what works everywhere, and what Claude Code adds on top.
Claude Code · CLAUDE.md · AGENTS.md · AI Engineering · Configuration