AI Engineering, one notion at a time.
13 domains, 12 notions and counting. Each page is a single concept with a TL;DR, the problem it solves, how it works, and a 2026 relevance check.
Foundations
The foundational building blocks: tokenization, embeddings, attention, transformers.
LLMs
The models themselves: loading, serving, quantization, inference-time optimization.
Prompt Engineering
Techniques to design effective prompts: structured output, chain of thought, XML tags.
RAG
From naive RAG to production: embeddings, chunking, vector stores, hybrid search, reranking.
Context Engineering
Managing the context window: compression, memory, prompt caching, budgeting.
AI Agents
Autonomous agents: ReAct, planning, tool use, function calling, multi-agent systems.
MCP
Model Context Protocol: a standardized way to plug tools and resources into LLMs.
LLM Optimization
Inference servers, KV cache, batching, paged attention. Serving LLMs fast and cheap.
Evaluations
Measuring LLM/agent/RAG quality: golden sets, LLM-as-judge, RAGAS, regression tests.
Observability
Tracing, logging, metrics for LLM apps: Langfuse, LangSmith, Arize, Helicone.
Fine-tuning
LoRA, QLoRA, RLHF, DPO, synthetic data. Specializing a model for a use case.
Infrastructure
Kubernetes for AI, GPU autoscaling, inference gateway, multi-cluster ops.
Safety & Guardrails
Red teaming, jailbreak defense, content filtering, PII redaction, alignment.