AI Engineering as a Discipline
TL;DR
AI engineering is the process of building applications on top of foundation models. It is distinct from ML engineering because you adapt existing models instead of training your own, you work with much larger and more expensive models, and you deal with open-ended outputs that are harder to evaluate.
The historical problem
From ~2010 to ~2022, building AI applications meant ML engineering:
- Collect labeled data
- Train your own model (or fine-tune a small pretrained one)
- Optimize hyperparameters
- Deploy to production
This required specialized ML expertise: loss functions, gradient descent, model architectures, evaluation metrics. It was an expensive skill set, so few teams could do it, and projects were long (3-12 months typical).
Then foundation models arrived. Model-as-a-service APIs (OpenAI, Anthropic, Google) made a powerful model available with one API call. Suddenly any software engineer could build an AI feature in a weekend.
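A minimal sketch of what "one API call" amounts to. This builds the request body in the widely used chat-completions shape; the field names follow the common OpenAI-style convention and the model id is a placeholder, so check your provider's docs before relying on them:

```python
def build_chat_request(system: str, user: str, model: str = "model-id-here") -> dict:
    """Assemble the JSON body that chat-completion-style APIs typically expect.

    Field names mirror the common OpenAI-style schema (an assumption here,
    not a spec); the actual HTTP call is one POST with this body.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

req = build_chat_request("You are a helpful assistant.", "Summarize RAG in one line.")
```

The point is that the entire "training pipeline" of the old world collapses into assembling one request like this.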
But this created a gap: the old ML engineering vocabulary (train/eval/deploy) did not fully fit the new world (prompt/adapt/evaluate). A new discipline emerged.
How it works
Huyen's definition
AI engineering refers to the process of building applications on top of foundation models.
Three factors created the conditions for this discipline to explode:
- General-purpose AI capabilities. Foundation models can do many tasks out of the box, including tasks not possible before (code generation, image synthesis, multimodal reasoning).
- Increased AI investments. Post-ChatGPT, VC and enterprise money flooded in. Goldman Sachs estimated $200B global AI investment by 2025. One in three S&P 500 companies mentioned AI in earnings calls in Q2 2023, 3x more than the year before.
- Low entrance barrier. Model-as-a-service APIs remove the infrastructure burden. You do not need GPUs, ML expertise, or training pipelines. Even non-programmers can build AI products with natural-language instructions.
AI engineering vs ML engineering
| Dimension | ML engineering | AI engineering |
|---|---|---|
| Model creation | Train your own | Use someone else's |
| Focus | Model development | Model adaptation |
| Scale | Small to medium models | Very large models |
| Compute | CPU to single GPU | Multi-GPU clusters, API calls |
| Outputs | Closed (labels, numbers) | Open-ended (text, images, code) |
| Evaluation | Standardized metrics (accuracy, F1) | Much harder, needs LLM-as-judge, golden sets |
| Iteration speed | Days to weeks | Minutes to hours |
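The evaluation row is where most of the new difficulty lives. A tiny golden-set harness makes the idea concrete; exact-match scoring is a deliberate simplification here, and real systems often swap in an LLM-as-judge as the scorer:

```python
from typing import Callable

def run_eval(model_fn: Callable[[str], str],
             golden_set: list[tuple[str, str]]) -> float:
    """Score a model function against (input, expected) pairs; return accuracy.

    Exact string match is the simplest possible scorer; open-ended outputs
    usually need a fuzzier judge (embedding similarity, LLM-as-judge).
    """
    hits = sum(1 for prompt, expected in golden_set
               if model_fn(prompt).strip() == expected)
    return hits / len(golden_set)

# Stub standing in for a real model API call.
stub = lambda prompt: "Paris" if "France" in prompt else "unknown"
score = run_eval(stub, [("Capital of France?", "Paris"),
                        ("Capital of Mars?", "Olympus Mons")])
# score == 0.5
```

Even a harness this small lets you tell whether a prompt change helped or hurt, which is the core discipline the table's "much harder" cell points at.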
Model adaptation is the heart of the job
Instead of training, you adapt. Two families of techniques:
- Prompt-based techniques (weights unchanged): prompt engineering, few-shot, RAG, context management. Easier, faster, less data.
- Weight-changing techniques: fine-tuning (LoRA, QLoRA, full fine-tune), RLHF, DPO. More complex, more data, better for strict performance.
Huyen's observation: prompt-based first, fine-tune only if needed. Many successful apps ship with prompt engineering alone.
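The prompt-based path can be sketched as a few-shot prompt builder. The Input/Output template below is illustrative only; real templates vary by model and task:

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Build a few-shot prompt: instruction, worked examples, then the new input."""
    parts = [instruction]
    for x, y in examples:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Classify sentiment as positive or negative.",
    [("Great product!", "positive"), ("Broke in a day.", "negative")],
    "Exceeded expectations.",
)
```

Note that the model's weights never change: all the "adaptation" lives in the string you send, which is why this family is cheaper and faster to iterate on than fine-tuning.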
Why "AI engineering" and not "MLOps" or "LLMOps"?
Huyen considered ML engineering, MLOps, AIOps, LLMOps, and others. She chose AI engineering because:
- ML engineering does not capture what's new
- "Ops" terms focus on operations, but the job is more about engineering (tweaking) models to do what you want
- She surveyed 20 practitioners, and most preferred "AI engineering"
Relevance today (2026)
Huyen's framing is still the right one, but the details have shifted:
- The barrier is even lower. Tools like Cursor, Claude Code, Windsurf mean AI engineers now work WITH AI to build AI. The discipline is meta.
- Small models changed the economics. In 2024 Huyen assumed "foundation models are expensive". In 2026, Phi-4, Llama 3.2 1B, Gemma 3 4B run on laptops. Self-hosting for simple tasks became pragmatic.
- MCP (Model Context Protocol) is the new plumbing. Anthropic's MCP (2024-2025) standardized how tools and context plug into models. An AI engineer in 2026 needs to know MCP the way a web engineer knows HTTP. Huyen does not cover this; it post-dates the book.
- Reasoning models split the stack. OpenAI o3, Claude Opus 4.5 thinking mode, and DeepSeek R1 introduce a "thinking budget" concept. AI engineers now tune reasoning budget the way they used to tune temperature.
- Evaluation is bigger than Huyen suggests. She calls it the "hardest challenge", but the eval stack (Braintrust, Langfuse, LangSmith, Arize, RAGAS, OpenAI Evals, LLM-as-judge with bias correction) is the fastest-growing sub-discipline. See 08-evaluations/.
- The job market caught up. In 2024, "AI engineer" was a new title. By 2026, it is well established and often pays more than a generic ML engineer role, especially for frontier work. In Israel and the US, mid-senior AI engineers earn more than generic senior backend engineers.
Question: is AI engineering a permanent discipline, or a temporary title that will merge back into software engineering? Huyen bets permanent. In 2026 the market says yes, but the skill set is still moving fast.
Critical questions
- If every software engineer can integrate a Claude API in an afternoon, what makes AI engineering actually specialized?
- What is the minimum ML background an AI engineer needs? (Huyen says you can do without, but recommends probability, ML basics, and neural net architectures.)
- Prompt engineering vs fine-tuning: when does one win? (Data availability, performance requirements, cost constraints all matter.)
- Is the hardest skill in AI engineering technical (eval, retrieval, infra) or product (knowing what NOT to build)?
- How do you demonstrate AI engineering competence in an interview? (Answer probably: a portfolio of working AI apps with thoughtful evals, not certifications.)
Production pitfalls
- Demo-to-prod gap. A foundation model + a cool prompt gives a 70% working demo in a weekend. The jump to 99% reliability for production takes months and is where most projects die.
- No eval, no prod. Teams that ship without systematic evals cannot tell if a prompt change helped or hurt. See 08-evaluations/.
- Underestimating latency and cost. A chat app with 10 RAG calls per turn and a 50K-token context window can easily cost $1 per user turn. Model it early.
- Confusing prototypes for products. A streamlit demo with GPT-4 is not a product. Users expect reliability, privacy, observability, fallbacks.
- Skipping the human layer. The hardest part is often designing the feedback loop and UX around uncertainty (confidence indicators, refusal messages, escape hatches to humans).
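The latency-and-cost pitfall is easy to model up front. A back-of-envelope estimator, using the numbers from the pitfall above; the per-million-token prices are placeholder assumptions, so plug in your provider's current rates:

```python
def cost_per_turn(rag_calls: int, context_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the USD cost of one user turn.

    Simplifying assumption: each RAG call resends the full context window
    (no prompt caching), which is the worst case worth modeling first.
    """
    input_cost = rag_calls * context_tokens * usd_per_m_input / 1_000_000
    output_cost = rag_calls * output_tokens * usd_per_m_output / 1_000_000
    return input_cost + output_cost

# 10 RAG calls x 50K-token context, with assumed rates of $3/M input
# and $15/M output tokens:
turn_cost = cost_per_turn(10, 50_000, 500,
                          usd_per_m_input=3.0, usd_per_m_output=15.0)
# 1.50 input + 0.075 output = 1.575, i.e. roughly the $1+/turn in the pitfall
```

Running this before you ship, rather than after the first invoice, is the whole point of "model it early".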
Alternatives / Comparisons
| Role | What they build | Typical skills |
|---|---|---|
| ML engineer | Train and deploy ML models | Python, PyTorch, gradient descent, loss functions |
| AI engineer | Adapt foundation models in products | Prompt engineering, RAG, eval, LLM APIs, MCP |
| MLOps / LLMOps | Infrastructure for ML/LLM at scale | K8s, GPU management, observability, feature stores |
| Research engineer | Help researchers push model frontiers | Distributed training, ML fundamentals |
| Prompt engineer | Specialized prompting (rare standalone role) | Prompt patterns, eval, specific domains |
Overlap is high. Job titles vary wildly by company. What matters is what you ship, not what your business card says.
Mini-lab
Your own Torah Study AI project is itself the mini-lab for this notion. Map your current work to Huyen's framework:
- Which adaptation technique are you using (prompt, RAG, finetune)?
- Which stack layer are you in (application, model dev, infra)?
- What's your eval strategy?
Goal: use Huyen's vocabulary to describe your own project cleanly. It's a good interview rehearsal.
Further reading
- Huyen, Chapter 1 of AI Engineering (foundational text for this notion)
- Huyen, "Machine Learning Systems Design" (her prior book, ML engineering side)
- Swyx and Alessio Fanelli, Latent Space podcast - industry-level "what is AI engineering" discussions
- Andrej Karpathy, "Software 2.0" (2017) - early prediction of this shift
- AI Engineer Summit talks (2024-2026) - where the discipline defines itself publicly