Planning AI Applications
TL;DR
Before building an AI application, answer three questions: why should it exist, what role does AI play vs humans, and what milestone gets you from demo to production? It is easy to build a cool demo with foundation models. It is hard to create a profitable product.
The historical problem
In 2023-2024, most companies felt FOMO. "We need AI" became a directive. Teams were told to "integrate AI" without a clear use case. The result was thousands of demos, most of which never reached production.
Huyen saw this across the industry and wrote Chapter 1 to give AI engineers a framework to answer the uncomfortable question: should this app even exist?
How it works
Step 1: Use case evaluation (why build it)
Three levels of motivation, from urgent to speculative:
- Existential risk: if we do not build AI, competitors will kill us. Common in document processing (insurance, finance), creative work (advertising, design), and information-heavy industries. A 2023 Gartner survey found 7% of AI adopters cited business continuity as a driver. Reference: OpenAI's "GPTs are GPTs" (Eloundou et al., 2023) ranks industries by exposure.
- Opportunity: AI boosts profit or productivity. Customer support, sales lead generation, content creation, internal knowledge search.
- Strategic learning: not sure where AI fits yet, but do not want to be left behind. R&D budget, optional.
If the motivation is existential, build in-house. If it is opportunity or strategic learning, buy first, and build only if buying fails.
Step 2: The role of AI and humans
Three axes to classify the AI feature:
Critical vs complementary
- Critical: app cannot work without AI (Face ID, DALL-E, Cursor autocomplete).
- Complementary: app works without AI, AI enhances it (Gmail Smart Compose, Google Maps traffic prediction).
Rule: the more critical AI is, the higher the reliability bar. Users forgive a wrong suggestion in Smart Compose. They do not forgive a failed Face ID unlock.
Reactive vs proactive
- Reactive: AI responds to user action (chatbot, search, completion).
- Proactive: AI acts without being asked (traffic alerts, recommendations, scheduled summaries).
Reactive needs low latency (users wait). Proactive needs high quality (users did not ask, so mistakes feel intrusive).
Dynamic vs static
- Static: model updates rarely, one model per user segment (default ChatGPT for everyone).
- Dynamic: model adapts continuously per user (Face ID updates as your face ages, ChatGPT's memory feature, personalized fine-tunes).
Dynamic is harder. You need per-user state, drift detection, and privacy guarantees.
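To make the static/dynamic distinction concrete, here is a minimal sketch of the per-user state a dynamic feature requires (all names are hypothetical; the staleness check is a crude stand-in for real drift detection):

```python
from dataclasses import dataclass, field
import time

# Hypothetical sketch: per-user state for a "dynamic" feature.
# A static feature would serve one shared config instead of this store.

@dataclass
class UserProfile:
    preferences: dict = field(default_factory=dict)
    last_updated: float = field(default_factory=time.time)

class ProfileStore:
    def __init__(self, staleness_days: float = 30.0):
        self._profiles: dict[str, UserProfile] = {}
        self._staleness_s = staleness_days * 86400

    def update(self, user_id: str, key: str, value: str) -> None:
        profile = self._profiles.setdefault(user_id, UserProfile())
        profile.preferences[key] = value
        profile.last_updated = time.time()

    def is_stale(self, user_id: str) -> bool:
        # Crude drift proxy: flag profiles not refreshed recently.
        profile = self._profiles.get(user_id)
        if profile is None:
            return True
        return (time.time() - profile.last_updated) > self._staleness_s

store = ProfileStore()
store.update("u1", "tone", "concise")
print(store.is_stale("u1"))  # just updated → False
```

Even this toy version surfaces the extra obligations: storage per user, a freshness policy, and (in a real system) deletion paths for privacy compliance.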
Step 3: AI product defensibility
Huyen asks: if your app is just a GPT wrapper, why should you exist?
Three common moats:
- Proprietary data - unique training data or retrieval corpus (medical records, legal history)
- Distribution - owning the user (Microsoft integrates Copilot into Office, Salesforce into CRM)
- Workflow and UX - deep integration into a specific job (Cursor for devs, Harvey for lawyers)
"Wrapper" apps without a moat were replicated by Microsoft within two weeks in 2023. Defensibility is a must.
Step 4: Setting expectations and milestones
Demo-to-production is the AI engineering graveyard. Huyen's advice:
- Quality bar depends on criticality: medical diagnosis needs 99.99%, content generation can ship at 80%.
- Set user-facing expectations: tell users the system is AI-powered and can be wrong. Confidence indicators, escape hatches to humans.
- Iterative milestones: plan an alpha (5% works), beta (70% works), GA (95% works) path. Plan for the evaluation framework BEFORE shipping alpha, not after.
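The alpha/beta/GA path can be enforced mechanically. A minimal sketch of a promotion gate tied to an eval-set pass rate (function names and the exact thresholds are illustrative, not from the book):

```python
# Illustrative sketch: gate each release stage on eval pass rate.
# Thresholds mirror the alpha/beta/GA targets above; tune for your product.

MILESTONES = {"alpha": 0.05, "beta": 0.70, "ga": 0.95}

def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases the system handled correctly."""
    return sum(results) / len(results) if results else 0.0

def can_promote(results: list[bool], stage: str) -> bool:
    """True if the eval pass rate meets the bar for the target stage."""
    return pass_rate(results) >= MILESTONES[stage]

eval_results = [True] * 72 + [False] * 28  # e.g. 72 of 100 cases pass
print(can_promote(eval_results, "beta"))   # 0.72 >= 0.70 → True
print(can_promote(eval_results, "ga"))     # 0.72 <  0.95 → False
```

The point of the gate is the ordering: the eval set and thresholds must exist before the alpha ships, or "beta-quality" is just an opinion.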
Relevance today (2026)
Huyen's framework from 2024 holds up well. Adjustments for 2026:
- Defensibility pressure is higher. In 2024, a GPT-4 wrapper could get VC funding. In 2026, wrappers get cloned in a weekend by someone with Cursor. Defensibility is not optional.
- Dynamic features easier to build. Vector DBs + per-user memory (ChatGPT Memory, Claude Projects) are now cheap. Huyen's "dynamic is hard" is less true in 2026.
- "Proactive AI" is hot. Agent-based apps (scheduled summaries, inbox triage, workflow automation) exploded in 2025-2026 with MCP. Huyen's proactive category deserves more weight today.
- Quality bar shifted up. Users experienced Claude Opus 4.x and GPT-5 in ChatGPT. They will not tolerate a 70%-works product in 2026 unless the value is extraordinary.
- Regulatory pressure. EU AI Act (2024-2025 enforcement), US state-level AI laws. Planning must now include: is this a high-risk AI system under the regulation? What compliance does it trigger?
Question: in 2026, is "should we build this?" the wrong question? Maybe it is "should we buy, integrate, or build?". Most answers are "integrate existing SaaS, do not build from scratch".
Critical questions
- If your use case is "opportunity" (not existential), and a SaaS already exists (e.g., Zendesk AI for support), what justifies building your own?
- What is your quality bar? What would cause a user to churn after a bad AI response?
- Who is responsible when the AI fails? Do you have a human-in-the-loop fallback?
- How will you measure success? Business metric (conversion, NPS, retention) or AI-specific metric (accuracy, helpfulness)?
- What is your cost per successful interaction? Can you sustain it at 10x your current scale?
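To make the last question concrete (all numbers invented for illustration), cost per successful interaction divides total serving cost by only the interactions that succeeded, which is why a 70%-works product is more expensive per unit of value than its per-call price suggests:

```python
# Illustrative numbers only: cost per *successful* interaction, not per call.

def cost_per_success(total_cost_usd: float, interactions: int,
                     success_rate: float) -> float:
    successes = interactions * success_rate
    if successes == 0:
        raise ValueError("no successful interactions")
    return total_cost_usd / successes

# 10,000 interactions at $0.03/call in inference cost, 70% judged successful:
unit = cost_per_success(total_cost_usd=10_000 * 0.03,
                        interactions=10_000,
                        success_rate=0.70)
print(round(unit, 4))  # ≈ 0.0429, ~43% above the $0.03 per-call price
```

Running the same arithmetic at 10x scale shows whether the margin survives growth, since failed calls are paid for but produce no value.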
Production pitfalls
- Skipping the "why" question. Teams get a leadership mandate and skip use-case evaluation. Result: a showpiece demo with no business value.
- Mistaking novelty for moat. First-mover advantage in AI is 2-6 weeks before a competitor clones you. Only durable moats (data, distribution, workflow) survive.
- Over-promising accuracy. Telling users "AI can do X" when it does X 70% of the time. Better: "AI suggests X, you confirm" (human-in-the-loop framing).
- Missing the eval. Without an eval set, you cannot tell if your prompt change helped or hurt. See 08-evaluations/.
- Static when you needed dynamic. One model for all users, serving wildly different use cases, ends up mediocre across the board. Consider personalization early.
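A minimal sketch of the "did my prompt change help?" check the eval pitfall describes. The model and grader below are toy stand-ins (assumed names, not a real API); the structure that matters is scoring both variants against the same fixed eval set:

```python
# Sketch: compare two prompt variants on one fixed eval set.
# `run_model` and `is_correct` are placeholders for your system and grader.

def evaluate(run_model, eval_set: list[dict], is_correct) -> float:
    """Pass rate of one prompt variant over a fixed eval set."""
    passed = sum(is_correct(run_model(case["input"]), case["expected"])
                 for case in eval_set)
    return passed / len(eval_set)

# Toy stand-ins so the sketch runs end to end:
eval_set = [{"input": "2+2", "expected": "4"},
            {"input": "3+3", "expected": "6"}]
old_prompt = lambda q: str(sum(int(x) for x in q.split("+")))  # answers both
new_prompt = lambda q: "4"                                     # always says "4"
grade = lambda out, exp: out == exp

old_score = evaluate(old_prompt, eval_set, grade)  # 1.0
new_score = evaluate(new_prompt, eval_set, grade)  # 0.5
print("helped" if new_score > old_score else "hurt or neutral")
```

Without the fixed eval set, the regression in `new_prompt` would only surface as anecdotes from users.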
Alternatives / Comparisons
Alternatives to building a custom AI app:
| Option | When to prefer | Downside |
|---|---|---|
| Use an existing SaaS with AI (Notion AI, Intercom Fin) | Off-the-shelf fits your workflow | Limited customization, shared moat |
| Embed Claude/GPT API into existing product | Your product is the moat, AI is a feature | Vendor dependency |
| Fine-tune a smaller model for your task | Privacy, cost, or scale requires it | Ops burden, slower iteration |
| Build a full vertical AI product | You have proprietary data or workflow insight | 6-12 month build, risky |
Mini-lab
For your Torah Study AI project, answer Huyen's framework explicitly:
- Use case motivation: existential, opportunity, or strategic learning?
- Critical or complementary: does the app work without AI?
- Reactive or proactive: does it answer when asked or push insights?
- Static or dynamic: personalized per user or not?
- Defensibility: proprietary Torah interpretation data? Integration with existing tools (Sefaria)? Unique UX for havruta mode?
- Milestones: what is alpha / beta / GA quality?
Write a short brief (1-2 pages) in outputs/reports/torah-study-huyen-brief.md. Goal: you can pitch the project to a senior AI product manager in 2 minutes using Huyen's vocabulary.
Further reading
- Huyen, Chapter 1 of AI Engineering, "Planning AI Applications" section
- Apple, "Human Interface Guidelines for Machine Learning" - deep on human-AI interaction patterns
- Andrew Ng, "AI for Everyone" (Coursera) - non-technical framing that complements Huyen
- Eloundou et al., "GPTs are GPTs" (OpenAI, 2023) - industry exposure analysis
- a16z, Enterprise AI playbooks (annual) - defensibility case studies