00·8 notions

Foundations

The foundational building blocks: tokenization, embeddings, attention, transformers.

AI Engineering as a Discipline

AI engineering is the process of building applications on top of foundation models. It is distinct from ML engineering because you adapt existing models instead of training your own, you work with much larger and more expensive models, and you deal with open-ended outputs that are harder to evaluate.

Foundation Models

A foundation model is a large, general-purpose model trained on vast amounts of data with self-supervision and adaptable to many tasks. The word "foundation" captures both the importance of these models and the fact that you build applications on top of them. The term covers both LLMs (large language models, text) and LMMs (large multimodal models).

Language Models

A language model encodes statistical information about one or more languages. It predicts the next token given a context. Self-supervision let language models scale from toy experiments in the 1950s to the LLMs that power ChatGPT today.
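
To make "predicts the next token given a context" concrete, here is a minimal sketch of a bigram model, the simplest possible language model. It is an illustration only: the toy corpus and the function names `train_bigram` and `next_token_probs` are made up for this example, and real LLMs use neural networks over much longer contexts. Note the self-supervision: the "labels" are just the next tokens in the text itself, so no manual annotation is needed.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1  # the next token is the training label
    return counts


def next_token_probs(counts, context_token):
    """Normalize follow-counts into a probability distribution."""
    followers = counts.get(context_token, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # context never seen during training
    return {tok: n / total for tok, n in followers.items()}


# Toy corpus (hypothetical), tokenized by whitespace for simplicity.
corpus = "the cat sat on the mat . the cat ate .".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
print(probs)  # "cat" is twice as likely as "mat" after "the"
```

The "statistical information" here is just a table of counts. Larger models replace the table with learned parameters, but the training signal is the same.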

Attention Mechanism (Q, K, V deep dive)

Attention is a weighted lookup. Given a **Query** vector (what you are looking for), you compare it against **Key** vectors (what each item advertises), and retrieve a weighted combination of **Value** vectors (what each item actually contains). In transformers, Q, K, V are learned linear projections of the same input.
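
The weighted lookup can be sketched in a few lines of plain Python. This is a simplified illustration for a single query, not a full transformer layer. It assumes the standard scaled dot-product form (dot-product scores divided by the square root of the key dimension, then a softmax), which the summary above does not spell out, and it takes Q, K, V vectors directly instead of computing them with learned projections. The example vectors are made up.

```python
import math


def softmax(xs):
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query scaled dot-product attention.

    query: list of floats (what you are looking for)
    keys: list of vectors (what each item advertises)
    values: list of vectors (what each item contains)
    """
    d_k = len(query)
    # Compare the query against every key (scaled dot product).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    weights = softmax(scores)
    # Retrieve a weighted combination of the value vectors.
    dim = len(values[0])
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return output, weights


# Hypothetical toy data: the query points in the direction of the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
output, weights = attention([4.0, 0.0], keys, values)
print(weights)  # most of the weight goes to the first item
print(output)   # so the output is dominated by the first value vector
```

Unlike a dictionary lookup, nothing is matched exactly: every value contributes, in proportion to how well its key matches the query.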

Transformer Architecture

The transformer (Vaswani et al., 2017) is the dominant architecture for language foundation models. It displaced RNNs by using the attention mechanism to process all input tokens in parallel rather than one at a time. The major LLMs in use today (GPT, Claude, Gemini, Llama) are all transformer-based.

Scaling Laws

Scaling laws describe how model quality improves with three inputs: model size (parameters), dataset size (tokens), and compute (FLOPs). The Chinchilla scaling law (DeepMind, 2022) showed you need roughly **20 tokens per parameter** for compute-optimal training. In 2026, over-training (training on far more than 20 tokens per parameter) is the norm, and test-time compute is a new scaling dimension that Huyen did not cover.
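
A quick back-of-the-envelope sketch of what the 20-tokens-per-parameter rule implies. The 7B model size is a made-up example. The compute estimate uses the common approximation of about 6 FLOPs per parameter per training token, which is an assumption added here and not part of the summary above. Treat the numbers as rough orders of magnitude.

```python
TOKENS_PER_PARAM = 20  # Chinchilla rule of thumb quoted above


def compute_optimal_tokens(n_params):
    """Training tokens suggested by the ~20 tokens/parameter rule."""
    return TOKENS_PER_PARAM * n_params


def approx_training_flops(n_params, n_tokens):
    """Rough training compute.

    Assumption (not from the text): C ~= 6 * N * D, a widely used
    approximation for dense transformer training.
    """
    return 6 * n_params * n_tokens


n_params = 7_000_000_000  # hypothetical 7B-parameter model
tokens = compute_optimal_tokens(n_params)
flops = approx_training_flops(n_params, tokens)

print(f"tokens: {tokens:.2e}")  # 1.40e+11, i.e. 140B tokens
print(f"FLOPs:  {flops:.2e}")   # 5.88e+21
```

An over-trained model uses the same arithmetic with a much higher tokens-per-parameter ratio: it spends more training compute than is "optimal" in exchange for a smaller model that is cheaper to serve.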

The AI Engineering Stack (3 Layers)

Every AI application runs on a 3-layer stack: application development (top), model development (middle), infrastructure (bottom). You typically start at the top and move down only when you need more control or performance.

Planning AI Applications

Before building an AI application, answer three questions: why should it exist, what role does AI play vs humans, and what milestone gets you from demo to production? It is easy to build a cool demo with foundation models. It is hard to create a profitable product.