Writing about what I learn.

AI engineering in production: RAG, agents, evaluations. One post when I ship something or break it. No newsletter speak.

2026 · 15 posts
Apr 12

5 techniques to make your RAG system actually work

A vanilla RAG pipeline retrieves documents and hopes for the best. Here are 5 techniques production RAG systems use to go from 'it works sometimes' to 'it works reliably'.

6 min · RAG · AI Engineering · Production · Retrieval
Apr 10

How a RAG Server Works, Step by Step

A RAG server has two phases: prepare the knowledge base once, then answer questions forever. Here is what happens at each step.

10 min · RAG · AI Engineering · Architecture · Fundamentals
Apr 09

Why strict RAG matters on sensitive data

When your LLM can fall back to general knowledge, it will. On religious texts, legal docs, or medical data, that is not acceptable. Here is why.

4 min · RAG · AI Engineering · Guardrails · Production
Apr 08

How to handle bad RAG results gracefully

Your RAG system found nothing relevant. Now what? The industry patterns for fallback strategies, relevance thresholds, and honest abstention.

6 min · RAG · AI Engineering · Retrieval · Production
Apr 06

Cohere Rerank: When to Use It (and When Not)

Reranking improved our search from 'sort of related' to 'exactly what you asked.' Here is how the scores work and when to add it to your pipeline.

4 min · RAG · Reranking · Cohere · AI Engineering
Apr 04

Choosing an Embedding Model: Benchmarks Over Brand

OpenAI is not always the best choice. How Sefaria's benchmark showed Gemini is 40% more accurate on Rabbinic texts, at a third of the cost.

5 min · RAG · Embeddings · AI Engineering · Benchmarks
Mar 21

RAG vs Long Context: do you still need a vector database?

Context windows now hold millions of tokens. So why not just dump everything in? Here's when RAG still wins, when long context is better, and how to choose.

6 min · RAG · Long Context · LLM · Architecture · AI Engineering
Mar 12

SHA-256: How It Works

What SHA-256 is, its key properties, and why we use it to track file changes.

2 min · Security · hashing · SHA-256 · integrity
Mar 05

Why Reranking Matters

What reranking does, how cross-encoders work, and why it dramatically improves RAG quality.

2 min · RAG · reranking · cross-encoder · Cohere · retrieval
Feb 24

Hybrid Search Explained

What hybrid search is, how the alpha parameter works, and when to adjust it.

2 min · RAG · search · hybrid · BM25 · vectors · Weaviate
Feb 17

What Are Embedding Dimensions?

What dimensions mean in embedding vectors, whether more is better, and when it matters.

2 min · RAG · embeddings · vectors · models
Feb 10

BM25 vs Vector Search

The difference between keyword search (BM25) and semantic search (vectors), and why you need both.

2 min · RAG · search · BM25 · vectors · retrieval
Feb 02

What are tokens (and why they cost you money)

Tokens are the currency of LLMs. Understand how they work and you'll understand your bill, the limits, and the quirks of AI.

8 min · Tokens · LLM · Costs · Fundamentals
Jan 29

What is an LLM?

Large Language Models explained simply. What they are, how they work, and what they can do.

9 min · LLM · AI Engineering · Fundamentals
Jan 26

CLAUDE.md and AGENTS.md: giving your AI agent a memory

Your agent forgets everything between conversations. CLAUDE.md and AGENTS.md fix that. Here's what works everywhere, and what Claude Code adds on top.

7 min · Claude Code · CLAUDE.md · AGENTS.md · AI Engineering · Configuration