Writing about what I learn.
AI engineering in production: RAG, agents, evaluations. One post when I ship something or break it. No newsletter speak.
5 techniques to make your RAG system actually work
A vanilla RAG retrieves documents and hopes for the best. Here are 5 techniques that production RAG systems use to go from 'it works sometimes' to 'it works reliably'.
How a RAG Server Works, Step by Step
A RAG server has two phases: prepare the knowledge base once, then answer questions forever. Here is what happens at each step.
Why strict RAG matters on sensitive data
When your LLM can fall back to general knowledge, it will. On religious texts, legal docs, or medical data, that is not acceptable. Here is why.
How to handle bad RAG results gracefully
Your RAG system found nothing relevant. Now what? The industry patterns for fallback strategies, relevance thresholds, and honest abstention.
Cohere Rerank: When to Use It (and When Not)
Reranking improved our search from 'sort of related' to 'exactly what you asked for.' Here is how the scores work and when to add it to your pipeline.
Choosing an Embedding Model: Benchmarks Over Brand
OpenAI is not always the best choice. How Sefaria's benchmark showed Gemini is 40% more accurate on Rabbinic texts, at a third of the cost.
RAG vs Long Context: do you still need a vector database?
Context windows now hold millions of tokens. So why not just dump everything in? Here's when RAG still wins, when long context is better, and how to choose.
SHA-256: How It Works
What SHA-256 is, its key properties, and why we use it to track file changes.
Why Reranking Matters
What reranking does, how cross-encoders work, and why it dramatically improves RAG quality.
Hybrid Search Explained
What hybrid search is, how the alpha parameter works, and when to adjust it.
What Are Embedding Dimensions?
What dimensions mean in embedding vectors, whether more is better, and when it matters.
BM25 vs Vector Search
The difference between keyword search (BM25) and semantic search (vectors), and why you need both.
What are tokens (and why they cost you money)
Tokens are the currency of LLMs. Understand how they work and you'll understand your bill, the limits, and the quirks of AI.
What is an LLM?
Large Language Models explained simply. What they are, how they work, and what they can do.
CLAUDE.md and AGENTS.md: giving your AI agent a memory
Your agent forgets everything between conversations. CLAUDE.md and AGENTS.md fix that. Here's what works everywhere, and what Claude Code adds on top.