AI Engineering as a Discipline
TL;DR
AI engineering is the process of building applications on top of foundation models. It is distinct from ML engineering because you adapt existing models instead of training your own, you work with much larger and more expensive models, and you deal with open-ended outputs that are harder to evaluate.
The historical problem
From ~2010 to ~2022, building AI applications meant ML engineering:
- Collect labeled data
- Train your own model (or fine-tune a small pretrained one)
- Optimize hyperparameters
- Deploy to production
This required specialized ML expertise: loss functions, gradient descent, model architectures, evaluation metrics. It was an expensive skill set, so few teams could do it, and projects were long (3-12 months typical).
Then foundation models arrived. Model-as-a-service APIs (OpenAI, Anthropic, Google) made a powerful model available with one API call. Suddenly any software engineer could build an AI feature in a weekend.
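A minimal sketch of what "one API call" amounts to. This builds the request body in the widely used chat-completions shape; the field names follow the common OpenAI-style convention and the model id is a placeholder, so check your provider's docs before relying on them:

```python
def build_chat_request(system: str, user: str, model: str = "model-id-here") -> dict:
    """Assemble the JSON body that chat-completion-style APIs typically expect.

    Field names mirror the common OpenAI-style schema (an assumption here,
    not a spec); the actual HTTP call is one POST with this body.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

req = build_chat_request("You are a helpful assistant.", "Summarize RAG in one line.")
```

The point is that the entire "training pipeline" of the old world collapses into assembling one request like this.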
But this created a gap: the old ML engineering vocabulary (train/eval/deploy) did not fully fit the new world (prompt/adapt/evaluate). A new discipline emerged.
How it works
Huyen's definition
AI engineering refers to the process of building applications on top of foundation models.
Three factors created the conditions for this discipline to explode:
- General-purpose AI capabilities. Foundation models can do many tasks out of the box, including tasks not possible before (code generation, image synthesis, multimodal reasoning).
- Increased AI investments. Post-ChatGPT, VC and enterprise money flooded in. Goldman Sachs estimated $200B global AI investment by 2025. One in three S&P 500 companies mentioned AI in earnings calls in Q2 2023, 3x more than the year before.
- Low entrance barrier. Model-as-a-service APIs remove the infrastructure burden. You do not need GPUs, ML expertise, or training pipelines. Even non-programmers can build AI products with natural-language instructions.
AI engineering vs ML engineering
| Dimension | ML engineering | AI engineering |
|---|---|---|
| Model creation | Train your own | Use someone else's |
| Focus | Model development | Model adaptation |
| Scale | Small to medium models | Very large models |
| Compute | CPU to single GPU | Multi-GPU clusters, API calls |
| Outputs | Closed (labels, numbers) | Open-ended (text, images, code) |
| Evaluation | Standardized metrics (accuracy, F1) | Much harder, needs LLM-as-judge, golden sets |
| Iteration speed | Days to weeks | Minutes to hours |
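The evaluation row is where most of the new difficulty lives. A tiny golden-set harness makes the idea concrete; exact-match scoring is a deliberate simplification here, and real systems often swap in an LLM-as-judge as the scorer:

```python
from typing import Callable

def run_eval(model_fn: Callable[[str], str],
             golden_set: list[tuple[str, str]]) -> float:
    """Score a model function against (input, expected) pairs; return accuracy.

    Exact string match is the simplest possible scorer; open-ended outputs
    usually need a fuzzier judge (embedding similarity, LLM-as-judge).
    """
    hits = sum(1 for prompt, expected in golden_set
               if model_fn(prompt).strip() == expected)
    return hits / len(golden_set)

# Stub standing in for a real model API call.
stub = lambda prompt: "Paris" if "France" in prompt else "unknown"
score = run_eval(stub, [("Capital of France?", "Paris"),
                        ("Capital of Mars?", "Olympus Mons")])
# score == 0.5
```

Even a harness this small lets you tell whether a prompt change helped or hurt, which is the core discipline the table's "much harder" cell points at.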
Model adaptation is the heart of the job
Instead of training, you adapt. Two families of techniques:
- Prompt-based techniques (weights unchanged): prompt engineering, few-shot, RAG, context management. Easier, faster, less data.
- Weight-changing techniques: fine-tuning (LoRA, QLoRA, full fine-tune), RLHF, DPO. More complex, more data, better for strict performance.
Huyen's observation: prompt-based first, fine-tune only if needed. Many successful apps ship with prompt engineering alone.
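The prompt-based path can be sketched as a few-shot prompt builder. The Input/Output template below is illustrative only; real templates vary by model and task:

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Build a few-shot prompt: instruction, worked examples, then the new input."""
    parts = [instruction]
    for x, y in examples:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Classify sentiment as positive or negative.",
    [("Great product!", "positive"), ("Broke in a day.", "negative")],
    "Exceeded expectations.",
)
```

Note that the model's weights never change: all the "adaptation" lives in the string you send, which is why this family is cheaper and faster to iterate on than fine-tuning.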
Why "AI engineering" and not "MLOps" or "LLMOps"?
Huyen considered ML engineering, MLOps, AIOps, LLMOps, and others. She chose AI engineering because:
- ML engineering does not capture what's new
- "Ops" terms focus on operations, but the job is more about engineering (tweaking) models to do what you want
- She surveyed 20 practitioners, and most preferred "AI engineering"
Relevance today (2026)
Huyen's framing is still the right one, but the details have shifted:
- The barrier is even lower. Tools like Cursor, Claude Code, Windsurf mean AI engineers now work WITH AI to build AI. The discipline is meta.
- Small models changed the economics. In 2024 Huyen assumed "foundation models are expensive". In 2026, Phi-4, Llama 3.2 1B, Gemma 3 4B run on laptops. Self-hosting for simple tasks became pragmatic.
- MCP (Model Context Protocol) is the new plumbing. Anthropic's MCP (2024-2025) standardized how tools and context plug into models. An AI engineer in 2026 needs to know MCP the way a web engineer knows HTTP. Huyen does not cover this; it post-dates the book.
- Reasoning models split the stack. OpenAI o3, Claude Opus 4.5 thinking mode, and DeepSeek R1 introduce a "thinking budget" concept. AI engineers now tune reasoning budget the way they used to tune temperature.
- Evaluation is bigger than Huyen suggests. She calls it the "hardest challenge", but the eval stack (Braintrust, Langfuse, LangSmith, Arize, RAGAS, OpenAI Evals, LLM-as-judge with bias correction) is the fastest-growing sub-discipline. See 08-evaluations/.
- The job market caught up. In 2024, "AI engineer" was a new title. By 2026, it is well established and often pays more than a generic ML engineer role, especially for frontier work. In Israel and the US, mid-senior AI engineers earn more than generic senior backend engineers.
Question: is AI engineering a permanent discipline, or a temporary title that will merge back into software engineering? Huyen bets permanent. In 2026 the market says yes, but the skill set is still moving fast.
Critical questions
- If every software engineer can integrate a Claude API in an afternoon, what makes AI engineering actually specialized?
- What is the minimum ML background an AI engineer needs? (Huyen says you can do without, but recommends probability, ML basics, and neural net architectures.)
- Prompt engineering vs fine-tuning: when does one win? (Data availability, performance requirements, cost constraints all matter.)
- Is the hardest skill in AI engineering technical (eval, retrieval, infra) or product (knowing what NOT to build)?
- How do you demonstrate AI engineering competence in an interview? (Answer probably: a portfolio of working AI apps with thoughtful evals, not certifications.)
Production pitfalls
- Demo-to-prod gap. A foundation model + a cool prompt gives a 70% working demo in a weekend. The jump to 99% reliability for production takes months and is where most projects die.
- No eval, no prod. Teams that ship without systematic evals cannot tell if a prompt change helped or hurt. See 08-evaluations/.
- Underestimating latency and cost. A chat app with 10 RAG calls per turn and a 50K-token context window can easily cost $1 per user turn. Model it early.
- Confusing prototypes for products. A streamlit demo with GPT-4 is not a product. Users expect reliability, privacy, observability, fallbacks.
- Skipping the human layer. The hardest part is often designing the feedback loop and UX around uncertainty (confidence indicators, refusal messages, escape hatches to humans).
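The latency-and-cost pitfall is easy to model up front. A back-of-envelope estimator, using the numbers from the pitfall above; the per-million-token prices are placeholder assumptions, so plug in your provider's current rates:

```python
def cost_per_turn(rag_calls: int, context_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the USD cost of one user turn.

    Simplifying assumption: each RAG call resends the full context window
    (no prompt caching), which is the worst case worth modeling first.
    """
    input_cost = rag_calls * context_tokens * usd_per_m_input / 1_000_000
    output_cost = rag_calls * output_tokens * usd_per_m_output / 1_000_000
    return input_cost + output_cost

# 10 RAG calls x 50K-token context, with assumed rates of $3/M input
# and $15/M output tokens:
turn_cost = cost_per_turn(10, 50_000, 500,
                          usd_per_m_input=3.0, usd_per_m_output=15.0)
# 1.50 input + 0.075 output = 1.575, i.e. roughly the $1+/turn in the pitfall
```

Running this before you ship, rather than after the first invoice, is the whole point of "model it early".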
Alternatives / Comparisons
| Role | What they build | Typical skills |
|---|---|---|
| ML engineer | Train and deploy ML models | Python, PyTorch, gradient descent, loss functions |
| AI engineer | Adapt foundation models in products | Prompt engineering, RAG, eval, LLM APIs, MCP |
| MLOps / LLMOps | Infrastructure for ML/LLM at scale | K8s, GPU management, observability, feature stores |
| Research engineer | Help researchers push model frontiers | Distributed training, ML fundamentals |
| Prompt engineer | Specialized prompting (rare standalone role) | Prompt patterns, eval, specific domains |
Overlap is high. Job titles vary wildly by company. What matters is what you ship, not what your business card says.
Mini-lab
Your own Torah Study AI project is itself the mini-lab for this notion. Map your current work to Huyen's framework:
- Which adaptation technique are you using (prompt, RAG, finetune)?
- Which stack layer are you in (application, model dev, infra)?
- What's your eval strategy?
Goal: use Huyen's vocabulary to describe your own project cleanly. It's a good interview rehearsal.
Further reading
- Huyen, Chapter 1 of AI Engineering (foundational text for this notion)
- Huyen, "Machine Learning Systems Design" (her prior book, ML engineering side)
- Swyx and Alessio Fanelli, Latent Space podcast - industry-level "what is AI engineering" discussions
- Andrej Karpathy, "Software 2.0" (2017) - early prediction of this shift
- AI Engineer Summit talks (2024-2026) - where the discipline defines itself publicly