AI Engineering, one notion at a time.
13 domains, 12 notions and counting. Each page is a single concept with a TL;DR, the problem it solves, how it works, and a 2026 relevance check.
Foundations
The foundational building blocks: tokenization, embeddings, attention, transformers.
LLMs
The models themselves: loading, serving, quantization, inference-time optimization.
Prompt Engineering
Techniques to design effective prompts: structured output, chain of thought, XML tags.
RAG
From naive RAG to production: embeddings, chunking, vector stores, hybrid search, reranking.
Context Engineering
Managing the context window: compression, memory, prompt caching, budgeting.
AI Agents
Autonomous agents: ReAct, planning, tool use, function calling, multi-agent systems.
MCP
Model Context Protocol: a standardized way to plug tools and resources into LLMs.
LLM Optimization
Inference servers, KV cache, batching, paged attention. Serving LLMs fast and cheap.
Evaluations
Measuring LLM/agent/RAG quality: golden sets, LLM-as-judge, RAGAS, regression tests.
Observability
Tracing, logging, metrics for LLM apps: Langfuse, LangSmith, Arize, Helicone.
Fine-tuning
LoRA, QLoRA, RLHF, DPO, synthetic data. Specializing a model for a use case.
Infrastructure
Kubernetes for AI, GPU autoscaling, inference gateway, multi-cluster ops.
Safety & Guardrails
Red teaming, jailbreak defense, content filtering, PII redaction, alignment.