Planning AI Applications
TL;DR
Before building an AI application, answer three questions: why should it exist, what role does AI play vs humans, and what milestone gets you from demo to production? It is easy to build a cool demo with foundation models. It is hard to create a profitable product.
The historical problem
In 2023-2024, most companies felt FOMO. "We need AI" became a directive. Teams were told to "integrate AI" without a clear use case. The result was thousands of demos, most of which never reached production.
Huyen saw this across the industry and wrote Chapter 1 to give AI engineers a framework to answer the uncomfortable question: should this app even exist?
How it works
Step 1: Use case evaluation (why build it)
Three levels of motivation, from urgent to speculative:
- Existential risk: if we do not build AI, competitors will kill us. Common in document processing (insurance, finance), creative work (advertising, design), and information-heavy industries. In a 2023 Gartner poll, 7% of AI adopters cited business continuity as their motivation. Reference: OpenAI's "GPTs are GPTs" (Eloundou et al., 2023) ranks industry exposure.
- Opportunity: AI boosts profit or productivity. Customer support, sales lead generation, content creation, internal knowledge search.
- Strategic learning: not sure where AI fits yet, but do not want to be left behind. R&D budget, optional.
If the motivation is existential risk, build in-house. If it is opportunity or strategic learning, buy first and build only if buying fails.
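The build-versus-buy rule above can be written down as a small function. This is only a sketch of the rule of thumb in these notes, not something taken from the book; the names `Motivation`, `sourcing_strategy`, and `buy_option_failed` are hypothetical.

```python
from enum import Enum


class Motivation(Enum):
    """The three motivation levels from Step 1, most urgent first."""
    EXISTENTIAL_RISK = "existential risk"
    OPPORTUNITY = "opportunity"
    STRATEGIC_LEARNING = "strategic learning"


def sourcing_strategy(motivation: Motivation, buy_option_failed: bool = False) -> str:
    """Return "build" or "buy" following the rule of thumb in these notes."""
    if motivation is Motivation.EXISTENTIAL_RISK:
        return "build"
    # Opportunity and strategic learning: buy first, build only if buying failed.
    return "build" if buy_option_failed else "buy"


print(sourcing_strategy(Motivation.EXISTENTIAL_RISK))
print(sourcing_strategy(Motivation.OPPORTUNITY))
print(sourcing_strategy(Motivation.STRATEGIC_LEARNING, buy_option_failed=True))
```

In practice "buying failed" is a judgment call (missing features, cost, compliance), so the boolean flag here stands in for a real evaluation of the vendor option.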
Step 2: The role of AI and humans
Three axes to classify the AI feature:
Critical vs complementary
- Critical: app cannot work without AI (Face ID, DALL-E, Cursor autocomplete).
- Complementary: app works without AI, AI enhances it (Gmail Smart Compose, Google Maps traffic prediction).
Rule: the more critical AI is, the higher the reliability bar. Users forgive a wrong suggestion in Smart Compose. They do not forgive a failed Face ID unlock.
Reactive vs proactive
- Reactive: AI responds to user action (chatbot, search, completion).
- Proactive: AI acts without being asked (traffic alerts, recommendations, scheduled summaries).
Reactive needs low latency (users wait). Proactive needs high quality (users did not ask, so mistakes feel intrusive).
Dynamic vs static
- Static: model updates rarely, one model per user segment (default ChatGPT for everyone).
- Dynamic: model adapts continuously per user (Face ID updates as your face ages, ChatGPT's memory feature, personalized fine-tunes).
Dynamic is harder. You need per-user state, drift detection, and privacy guarantees.
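One way to make the three axes concrete is to record them per feature and derive the engineering concerns each position implies. The sketch below is illustrative: `AIFeatureProfile` and `requirements` are hypothetical names, and where the notes do not classify an example on an axis (for instance, treating Face ID and Smart Compose as reactive), the value is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIFeatureProfile:
    name: str
    critical: bool   # True: the app cannot work without the AI piece
    proactive: bool  # True: the AI acts without being asked
    dynamic: bool    # True: the model adapts per user over time

    def requirements(self) -> list[str]:
        """Map each axis to the concern the notes attach to it."""
        reqs = [
            "high reliability bar" if self.critical
            else "mistakes tolerable; AI is an enhancement",
            "high quality (unprompted mistakes feel intrusive)" if self.proactive
            else "low latency (the user is waiting)",
        ]
        if self.dynamic:
            reqs += ["per-user state", "drift detection", "privacy guarantees"]
        return reqs


# Axis values not stated in the notes are assumptions made for illustration.
smart_compose = AIFeatureProfile("Gmail Smart Compose", critical=False, proactive=False, dynamic=False)
face_id = AIFeatureProfile("Face ID", critical=True, proactive=False, dynamic=True)

for feature in (smart_compose, face_id):
    print(feature.name, "->", feature.requirements())
```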
Step 3: AI product defensibility
Huyen asks: if your app is just a GPT wrapper, why should you exist?
Three common moats:
- Proprietary data - unique training data or retrieval corpus (medical records, legal history)
- Distribution - owning the user (Microsoft integrates Copilot into Office, Salesforce into CRM)
- Workflow and UX - deep integration into a specific job (Cursor for devs, Harvey for lawyers)
"Wrapper" apps without a moat got replicated by Microsoft in two weeks in 2023. Defensibility is a must.
Step 4: Setting expectations and milestones
Demo-to-production is the AI engineering graveyard. Huyen's advice:
- Quality bar depends on criticality: medical diagnosis needs 99.99%, content generation can ship at 80%.
- Set user-facing expectations: tell users the system is AI-powered and can be wrong. Confidence indicators, escape hatches to humans.
- Iterative milestones: plan an alpha (5% works), beta (70% works), GA (95% works) path. Plan for the evaluation framework BEFORE shipping alpha, not after.
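The milestone path above can be turned into a simple gate, assuming the percentages are read as pass rates on a fixed eval set (the notes do not say exactly how "works" is measured, so that reading is an assumption). `MILESTONE_BARS`, `pass_rate`, and `highest_milestone` are hypothetical names.

```python
from typing import Optional

# Thresholds taken from the alpha / beta / GA path in the notes,
# interpreted here as the fraction of eval cases that pass.
MILESTONE_BARS = {"alpha": 0.05, "beta": 0.70, "GA": 0.95}


def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        raise ValueError("eval set is empty; build it before shipping alpha")
    return sum(results) / len(results)


def highest_milestone(results: list[bool]) -> Optional[str]:
    """Return the most advanced milestone whose bar is met, or None."""
    rate = pass_rate(results)
    reached = None
    for name, bar in MILESTONE_BARS.items():  # dicts keep insertion order
        if rate >= bar:
            reached = name
    return reached


eval_results = [True] * 8 + [False] * 2  # 80% of cases pass
print(highest_milestone(eval_results))
```

The empty-set check mirrors the advice to have the evaluation framework in place before alpha: without results there is nothing to gate on.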
Relevance today (2026)
Huyen's framework from 2024 holds up well. Adjustments for 2026:
- Defensibility pressure is higher. In 2024, a GPT-4 wrapper could get VC funding. In 2026, wrappers get cloned in a weekend by someone with Cursor. Defensibility is not optional.
- Dynamic features are easier to build. Vector DBs + per-user memory (ChatGPT Memory, Claude Projects) are now cheap. Huyen's "dynamic is hard" is less true in 2026.
- "Proactive AI" is hot. Agent-based apps (scheduled summaries, inbox triage, workflow automation) exploded in 2025-2026 with MCP. Huyen's proactive category deserves more weight today.
- Quality bar shifted up. Users experienced Claude Opus 4.x and GPT-5 in ChatGPT. They will not tolerate a 70%-works product in 2026 unless the value is extraordinary.
- Regulatory pressure. The EU AI Act entered into force in August 2024, with obligations phasing in from 2025 onward, and US states are passing their own AI laws. Planning must now include: is this a high-risk AI system under the regulation? What compliance does it trigger?
Question: in 2026, is "should we build this?" the wrong question? Maybe it is "should we buy, integrate, or build?". Most answers are "integrate existing SaaS, do not build from scratch".
Critical questions
- If your use case is "opportunity" (not existential), and a SaaS already exists (e.g., Zendesk AI for support), what justifies building your own?
- What is your quality bar? What would cause a user to churn after a bad AI response?
- Who is responsible when the AI fails? Do you have a human-in-the-loop fallback?
- How will you measure success? Business metric (conversion, NPS, retention) or AI-specific metric (accuracy, helpfulness)?
- What is your cost per successful interaction? Can you sustain it at 10x your current scale?
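The cost question above is simple arithmetic once you define "successful". A minimal sketch, assuming cost scales linearly with volume and that success is measured as a fraction of interactions; the function name and all numbers are made up for illustration.

```python
def cost_per_successful_interaction(
    total_cost: float, interactions: int, success_rate: float
) -> float:
    """Total spend divided by the number of interactions that succeeded."""
    successes = interactions * success_rate
    if successes <= 0:
        raise ValueError("no successful interactions to divide by")
    return total_cost / successes


# Hypothetical month: $500 of model spend, 10,000 interactions, 80% successful.
current = cost_per_successful_interaction(500.0, 10_000, 0.80)
print(f"cost per success today: ${current:.4f}")

# At 10x volume with linear pricing the unit cost is unchanged,
# but the monthly bill is 10x larger and has to fit the budget.
bill_at_10x = 500.0 * 10
print(f"monthly bill at 10x scale: ${bill_at_10x:,.2f}")
```

Note that failed interactions still cost money, which is why the denominator counts only successes: a drop in success rate raises the unit cost even when the bill stays flat.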
Production pitfalls
- Skipping the "why" question. Teams get a leadership mandate and skip use-case evaluation. Result: a showpiece demo with no business value.
- Mistaking novelty for moat. First-mover advantage in AI may last only 2-6 weeks before a competitor clones you. Only durable moats (data, distribution, workflow) survive.
- Over-promising accuracy. Telling users "AI can do X" when it does X 70% of the time. Better: "AI suggests X, you confirm" (human-in-the-loop framing).
- Missing the eval. Without an eval set, you cannot tell if your prompt change helped or hurt. See 08-evaluations/.
- Static when you needed dynamic. One model for all users, serving wildly different use cases, gets mediocre across the board. Consider personalization early.
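To illustrate the "missing the eval" pitfall: even a tiny fixed eval set lets you compare two prompt versions on the same inputs. The sketch below uses exact-match scoring and two toy functions standing in for "the app with prompt v1" and "the app with prompt v2"; all names and data are hypothetical, and real evals usually need fuzzier scoring than string equality.

```python
from typing import Callable

EvalCase = tuple[str, str]  # (input, expected output)


def accuracy(system: Callable[[str], str], eval_set: list[EvalCase]) -> float:
    """Exact-match accuracy of a system over a fixed eval set."""
    correct = sum(1 for text, expected in eval_set if system(text) == expected)
    return correct / len(eval_set)


# Toy stand-ins for the same app running two different prompt versions.
def prompt_v1(text: str) -> str:
    return text.upper()


def prompt_v2(text: str) -> str:
    return text.strip().upper()


eval_set = [("hi", "HI"), (" ok ", "OK"), ("yes", "YES"), ("no ", "NO")]

delta = accuracy(prompt_v2, eval_set) - accuracy(prompt_v1, eval_set)
print(f"v1={accuracy(prompt_v1, eval_set):.2f} "
      f"v2={accuracy(prompt_v2, eval_set):.2f} delta={delta:+.2f}")
```

Keeping the eval set fixed across versions is the point: the delta is only meaningful when both prompts are scored on identical cases.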
Alternatives / Comparisons
Alternatives to building a custom AI app:
| Option | When to prefer | Downside |
|---|---|---|
| Use an existing SaaS with AI (Notion AI, Intercom Fin) | Off-the-shelf fits your workflow | Limited customization, shared moat |
| Embed Claude/GPT API into existing product | Your product is the moat, AI is a feature | Vendor dependency |
| Fine-tune a smaller model for your task | Privacy, cost, scale requires it | Ops burden, slower iteration |
| Build a full vertical AI product | You have proprietary data or workflow insight | 6-12 month build, risky |
Mini-lab
Pick a real AI product (existing or an idea you would like to build) and answer Huyen's framework in writing. For example: a customer support chatbot, a legal document summarizer, a language tutor, a code reviewer.
For the product you picked, answer each in 1-2 sentences:
- Use case motivation: existential threat, profit/productivity opportunity, or strategic learning?
- Critical or complementary: does the app still work without the AI piece?
- Reactive or proactive: does the AI respond to a user action, or push suggestions unprompted?
- Static or dynamic: same model for everyone, or personalized per user over time?
- Defensibility: what is the moat - proprietary data, distribution, workflow integration, or none?
- Milestones: what quality bar for alpha (5% useful), beta (70%), GA (95%)?
The output is a one-pager that would fit in a product brief. Goal: be able to pitch any AI product idea to a senior AI PM in 2 minutes using Huyen's vocabulary.
Further reading
- Huyen, Chapter 1 of "AI Engineering" (O'Reilly 2025, paid), "Planning AI Applications" section: https://www.oreilly.com/library/view/ai-engineering/9781098166298/
- Apple, "Human Interface Guidelines for Machine Learning" - deep on human-AI interaction patterns: https://developer.apple.com/design/human-interface-guidelines/machine-learning
- Andrew Ng, "AI for Everyone" (Coursera) - non-technical framing that complements Huyen: https://www.coursera.org/learn/ai-for-everyone
- Eloundou et al., "GPTs are GPTs" (OpenAI, 2023) - industry exposure analysis: https://arxiv.org/abs/2303.10130
- a16z, Enterprise AI playbooks (annual) - defensibility case studies: https://a16z.com/ai/