Foundations
The foundational building blocks: tokenization, embeddings, attention, transformers.
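Two of these building blocks, tokenization and embeddings, can be illustrated in a few lines. This is a toy sketch, not any real tokenizer: it splits on whitespace, assigns integer ids, and looks each id up in a randomly initialized embedding table (the vocabulary, `dim`, and all names here are made up for illustration).

```python
# Toy illustration of tokenization and embeddings.
# Real tokenizers (e.g. BPE) split into subwords, and real embedding
# tables are learned during training; this sketch only shows the shape
# of the idea: text -> token ids -> dense vectors.

import random

def tokenize(text, vocab):
    """Split text on whitespace and map each token to an integer id,
    growing the vocabulary as new tokens appear."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

random.seed(0)
vocab = {}
ids = tokenize("the cat sat on the mat", vocab)
print(ids)  # "the" appears twice, so its id repeats

# An embedding table maps each token id to a dense vector.
dim = 4
embeddings = [[random.random() for _ in range(dim)] for _ in vocab]
vectors = [embeddings[i] for i in ids]
```

Note that both occurrences of "the" get the same id and therefore the same vector; context-dependent meaning comes later, from attention.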
AI Engineering as a Discipline
AI engineering is the process of building applications on top of foundation models. It is distinct from ML engineering because you adapt existing models instead of training your own, you work with much larger and more expensive models, and you deal with open-ended outputs that are harder to evaluate.
The AI Engineering Stack (3 Layers)
Every AI application runs on a 3-layer stack: application development (top), model development (middle), infrastructure (bottom). You typically start at the top and move down only when you need more control or performance.
Foundation Models
A foundation model is a large, general-purpose model trained on huge amounts of data with self-supervision, which can then be adapted to many tasks. The word "foundation" captures both their importance and the fact that you build applications on top of them. The term covers both LLMs (text-only) and LMMs (large multimodal models).
Language Models
A language model encodes statistical information about one or more languages. It predicts the next token given a context. Self-supervision let language models scale from toy experiments in the 1950s to the LLMs that power ChatGPT today.
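The "predict the next token" idea can be shown with the simplest possible language model: a bigram model that estimates P(next token | current token) from counts in a tiny made-up corpus. Real LLMs do the same job with neural networks over far longer contexts and vastly more data.

```python
# A toy bigram language model: count which token follows which,
# then normalize the counts into conditional probabilities.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# counts[cur][nxt] = how many times `nxt` followed `cur`
counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

def next_token_probs(token):
    """Return P(next | token) as a dict, estimated from the counts."""
    total = sum(counts[token].values())
    return {t: c / total for t, c in counts[token].items()}

print(next_token_probs("the"))  # {'cat': 0.666..., 'mat': 0.333...}
```

Self-supervision means the training signal (the next token) comes for free from the text itself, with no human labeling; that is what made scaling to web-sized corpora possible.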
Planning AI Applications
Before building an AI application, answer three questions: why should it exist, what role does AI play vs humans, and what milestone gets you from demo to production? It is easy to build a cool demo with foundation models. It is hard to create a profitable product.
Transformer Architecture
The transformer (Vaswani et al., 2017) is the dominant architecture for language foundation models. It replaced RNNs by relying entirely on the attention mechanism, which lets it process all input tokens in parallel rather than sequentially. Every LLM you use today (GPT, Claude, Gemini, Llama) is transformer-based.
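The core operation is scaled dot-product attention: softmax(QK^T / sqrt(d)) V. A minimal NumPy sketch (shapes and random inputs are illustrative; real transformers add multiple heads, projections, and masking):

```python
# Minimal scaled dot-product attention, the heart of the transformer.
# Q, K, V each have shape (n, d): n tokens, d dimensions per token.

import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (n, n): each query scored against each key
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V             # each output is a weighted sum of values

rng = np.random.default_rng(0)
n, d = 3, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # one output vector per input token
```

Because every token's output depends on a matrix product over all tokens at once, the whole sequence is processed in parallel, which is exactly what made transformers so much faster to train than RNNs.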