00·Foundations·updated 2026-04-21

Planning AI Applications

TL;DR

Before building an AI application, answer three questions: why should it exist, what role does AI play vs humans, and what milestone gets you from demo to production? It is easy to build a cool demo with foundation models. It is hard to create a profitable product.

The historical problem

In 2023-2024, many companies were driven by fear of missing out (FOMO). "We need AI" became a directive, and teams were told to "integrate AI" without a clear use case. The result was thousands of demos, most of which never reached production.

Chip Huyen saw this across the industry and wrote Chapter 1 of AI Engineering to give AI engineers a framework for answering the uncomfortable question: should this app even exist?

How it works

Step 1: Use case evaluation (why build it)

Three levels of motivation, from urgent to speculative:

  1. Existential risk: if we do not build AI, competitors with AI will make us obsolete. Common in document processing (insurance, finance), creative work (advertising, design), and information-heavy industries. Gartner (2023): 7% of AI adopters cited business continuity as their motivation. Reference: OpenAI's "GPTs are GPTs" (Eloundou et al., 2023) ranks industries by exposure.
  2. Opportunity: AI boosts profit or productivity. Customer support, sales lead generation, content creation, internal knowledge search.
  3. Strategic learning: not sure where AI fits yet, but do not want to be left behind. R&D budget, optional.

If motivation is 1, build in-house. If motivation is 2 or 3, buy first, build only if buy fails.
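The rule of thumb above can be written down as a small lookup. This is a minimal sketch, not a universal policy: the names `Motivation` and `sourcing_strategy` and the `buy_option_failed` flag are hypothetical, introduced only to encode the rule as stated in these notes.

```python
from enum import Enum


class Motivation(Enum):
    EXISTENTIAL = 1  # competitors with AI could make us obsolete
    OPPORTUNITY = 2  # AI boosts profit or productivity
    STRATEGIC = 3    # exploratory: not sure where AI fits yet


def sourcing_strategy(motivation: Motivation, buy_option_failed: bool = False) -> str:
    """Return 'build' or 'buy' following the rule of thumb in the text."""
    if motivation is Motivation.EXISTENTIAL:
        return "build"
    # Levels 2 and 3: buy first, build only if buying was tried and failed.
    return "build" if buy_option_failed else "buy"


print(sourcing_strategy(Motivation.EXISTENTIAL))
print(sourcing_strategy(Motivation.OPPORTUNITY))
print(sourcing_strategy(Motivation.STRATEGIC, buy_option_failed=True))
```

Making `buy_option_failed` an explicit argument forces the team to record that an off-the-shelf option was actually evaluated before a build is approved.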

Step 2: The role of AI and humans

Three axes to classify the AI feature:

Critical vs complementary

  • Critical: app cannot work without AI (Face ID, DALL-E, Cursor autocomplete).
  • Complementary: app works without AI, AI enhances it (Gmail Smart Compose, Google Maps traffic prediction).

Rule: the more critical AI is, the higher the reliability bar. Users forgive a wrong suggestion in Smart Compose. They do not forgive a failed Face ID unlock.

Reactive vs proactive

  • Reactive: AI responds to user action (chatbot, search, completion).
  • Proactive: AI acts without being asked (traffic alerts, recommendations, scheduled summaries).

Reactive needs low latency (users wait). Proactive needs high quality (users did not ask, so mistakes feel intrusive).

Dynamic vs static

  • Static: model updates rarely, one model per user segment (default ChatGPT for everyone).
  • Dynamic: model adapts continuously per user (Face ID updates as your face ages, ChatGPT's memory feature, personalized fine-tunes).

Dynamic is harder. You need per-user state, drift detection, and privacy guarantees.
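One way to make the three axes actionable is to record them per feature and derive the requirements the text attaches to each axis. The sketch below assumes a hypothetical `AIFeature` record and `requirements` helper; the example classifications are partly assumptions and are marked as such in the comments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIFeature:
    name: str
    critical: bool   # True: the app cannot work without the AI
    proactive: bool  # True: the AI acts without being asked
    dynamic: bool    # True: the model adapts continuously per user


def requirements(feature: AIFeature) -> list[str]:
    """Collect the requirements the text attaches to each axis."""
    reqs = [
        "high reliability bar" if feature.critical else "lower reliability bar",
        "high quality" if feature.proactive else "low latency",
    ]
    if feature.dynamic:
        reqs += ["per-user state", "drift detection", "privacy guarantees"]
    return reqs


# Face ID is critical and dynamic per the text; treating it as reactive
# (it responds to an unlock attempt) is an assumption of this sketch.
face_id = AIFeature("Face ID", critical=True, proactive=False, dynamic=True)

# A hypothetical scheduled-summary feature: proactive per the text,
# complementary and static by assumption.
weekly_digest = AIFeature("Weekly digest", critical=False, proactive=True, dynamic=False)

for feature in (face_id, weekly_digest):
    print(feature.name, "->", ", ".join(requirements(feature)))
```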

Step 3: AI product defensibility

Huyen asks: if your app is just a GPT wrapper, why should you exist?

Three common moats:

  1. Proprietary data - unique training data or retrieval corpus (medical records, legal history)
  2. Distribution - owning the user (Microsoft integrates Copilot into Office, Salesforce into CRM)
  3. Workflow and UX - deep integration into a specific job (Cursor for devs, Harvey for lawyers)

"Wrapper" apps without a moat were replicated by Microsoft within two weeks in 2023. Defensibility is a must.

Step 4: Setting expectations and milestones

Demo-to-production is the AI engineering graveyard. Huyen's advice:

  • Quality bar depends on criticality: medical diagnosis needs 99.99% accuracy; content generation can ship at 80%.
  • Set user-facing expectations: tell users the system is AI-powered and can be wrong. Provide confidence indicators and escape hatches to humans.
  • Iterative milestones: plan an alpha (5% works), beta (70% works), GA (95% works) path. Plan for the evaluation framework BEFORE shipping alpha, not after.
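The milestone path above can be turned into a simple gate over eval results. This is a sketch under stated assumptions: the thresholds are copied from the milestone bullet, "works" is read as the pass rate on an eval set, and the names `MILESTONES`, `pass_rate`, and `highest_milestone` are hypothetical.

```python
from typing import Optional

# Pass-rate thresholds copied from the milestone bullet above.
MILESTONES = [("alpha", 0.05), ("beta", 0.70), ("GA", 0.95)]


def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        raise ValueError("eval set is empty; build it before shipping alpha")
    return sum(results) / len(results)


def highest_milestone(results: list[bool]) -> Optional[str]:
    """Return the most advanced milestone whose threshold is met, if any."""
    rate = pass_rate(results)
    reached = None
    for name, threshold in MILESTONES:
        if rate >= threshold:
            reached = name
    return reached


# Hypothetical eval run: 72 of 100 cases pass.
run = [True] * 72 + [False] * 28
print(highest_milestone(run))
```

Raising an error on an empty eval set mirrors the advice to plan the evaluation framework before alpha: without results, no milestone can be claimed.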

Relevance today (2026)

Huyen's framework from 2024 holds up well. Adjustments for 2026:

  • Defensibility pressure is higher. In 2024, a GPT-4 wrapper could get VC funding. In 2026, wrappers get cloned in a weekend by someone with Cursor. Defensibility is not optional.
  • Dynamic features are easier to build. Vector DBs + per-user memory (ChatGPT Memory, Claude Projects) are now cheap. Huyen's "dynamic is hard" is less true in 2026.
  • "Proactive AI" is hot. Agent-based apps (scheduled summaries, inbox triage, workflow automation) exploded in 2025-2026 with MCP (Model Context Protocol). Huyen's proactive category deserves more weight today.
  • The quality bar has shifted up. Users have experienced Claude Opus 4.x and GPT-5 in ChatGPT. They will not tolerate a 70%-works product in 2026 unless the value is extraordinary.
  • Regulatory pressure. The EU AI Act entered into force in August 2024, with obligations phasing in from 2025, and US states are passing their own AI laws. Planning must now include: is this a high-risk AI system under the regulation? What compliance does it trigger?

Question: in 2026, is "should we build this?" the wrong question? Maybe the right one is "should we buy, integrate, or build?", and most of the time the answer is "integrate existing SaaS, do not build from scratch".

Critical questions

  • If your use case is "opportunity" (not existential), and a SaaS already exists (e.g., Zendesk AI for support), what justifies building your own?
  • What is your quality bar? What would cause a user to churn after a bad AI response?
  • Who is responsible when the AI fails? Do you have a human-in-the-loop fallback?
  • How will you measure success? Business metric (conversion, NPS, retention) or AI-specific metric (accuracy, helpfulness)?
  • What is your cost per successful interaction? Can you sustain it at 10x your current scale?
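The cost question above reduces to simple arithmetic. The sketch below assumes every attempt costs the same and that failed attempts are still paid for, so cost per successful interaction is cost per call divided by success rate. All figures are hypothetical placeholders, as are the function names.

```python
def cost_per_success(cost_per_call: float, success_rate: float) -> float:
    """Average spend needed to obtain one successful interaction."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_call / success_rate


def monthly_cost(calls_per_month: int, cost_per_call: float) -> float:
    """Total monthly spend, counting successful and failed calls alike."""
    return calls_per_month * cost_per_call


# Hypothetical figures: $0.02 per call, 80% success, 50,000 calls per month.
COST_PER_CALL = 0.02
SUCCESS_RATE = 0.80
CALLS_PER_MONTH = 50_000

print(f"cost per success: ${cost_per_success(COST_PER_CALL, SUCCESS_RATE):.4f}")
print(f"monthly cost now: ${monthly_cost(CALLS_PER_MONTH, COST_PER_CALL):,.2f}")
print(f"monthly cost at 10x: ${monthly_cost(CALLS_PER_MONTH * 10, COST_PER_CALL):,.2f}")
```

Comparing the 10x figure against the revenue or savings each successful interaction generates gives a first answer to whether the feature is sustainable at scale.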

Production pitfalls

  • Skipping the "why" question. Teams get a leadership mandate and skip use-case evaluation. Result: a showpiece demo with no business value.
  • Mistaking novelty for a moat. First-mover advantage in AI lasts 2-6 weeks before a competitor clones you. Only durable moats (data, distribution, workflow) survive.
  • Over-promising accuracy. Telling users "AI can do X" when it does X 70% of the time. Better: "AI suggests X, you confirm" (human-in-the-loop framing).
  • Missing the eval. Without an eval set, you cannot tell if your prompt change helped or hurt. See 08-evaluations/.
  • Static when you needed dynamic. One model for all users, serving wildly different use cases, gets mediocre across the board. Consider personalization early.
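To illustrate the "missing the eval" pitfall, here is a minimal sketch of comparing two prompt variants against a fixed eval set. The eval cases and canned outputs are hypothetical stand-ins for real model calls, and substring matching is only one simple scoring choice among many.

```python
from typing import Callable

# Each case pairs an input with a substring the answer must contain.
EVAL_SET = [
    ("What is the capital of France?", "paris"),
    ("What is 2 + 2?", "4"),
    ("What is the opposite of hot?", "cold"),
]

# Canned outputs standing in for two prompt variants (hypothetical data).
PROMPT_A_OUTPUTS = {
    "What is the capital of France?": "The capital is Paris.",
    "What is 2 + 2?": "The answer is 5.",
    "What is the opposite of hot?": "Cold.",
}
PROMPT_B_OUTPUTS = {**PROMPT_A_OUTPUTS, "What is 2 + 2?": "The answer is 4."}


def score(generate: Callable[[str], str], eval_set: list[tuple[str, str]]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    hits = sum(expected in generate(question).lower() for question, expected in eval_set)
    return hits / len(eval_set)


rate_a = score(PROMPT_A_OUTPUTS.__getitem__, EVAL_SET)
rate_b = score(PROMPT_B_OUTPUTS.__getitem__, EVAL_SET)
print(f"prompt A: {rate_a:.2f}, prompt B: {rate_b:.2f}")
print("change helped" if rate_b > rate_a else "change did not help")
```

In a real project, the two dictionaries would be replaced by functions that call the model with each prompt variant; the eval set and the scoring function stay fixed so that results are comparable across changes.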

Alternatives / Comparisons

Alternatives to building a custom AI app:

| Option | When to prefer | Downside |
|---|---|---|
| Use an existing SaaS with AI (Notion AI, Intercom Fin) | Off-the-shelf fits your workflow | Limited customization, shared moat |
| Embed Claude/GPT API into existing product | Your product is the moat, AI is a feature | Vendor dependency |
| Fine-tune a smaller model for your task | Privacy, cost, or scale requires it | Ops burden, slower iteration |
| Build a full vertical AI product | You have proprietary data or workflow insight | 6-12 month build, risky |

Mini-lab

For your Torah Study AI project, answer Huyen's framework explicitly:

  • Use case motivation: existential, opportunity, or strategic learning?
  • Critical or complementary: does the app work without AI?
  • Reactive or proactive: does it answer when asked or push insights?
  • Static or dynamic: personalized per user or not?
  • Defensibility: proprietary Torah interpretation data? Integration with existing tools (Sefaria)? Unique UX for havruta mode?
  • Milestones: what is alpha / beta / GA quality?

Write a short brief (1-2 pages) in outputs/reports/torah-study-huyen-brief.md. Goal: you can pitch the project to a senior AI product manager in 2 minutes using the framework's vocabulary.

Further reading

  • Huyen, Chapter 1 of AI Engineering, "Planning AI Applications" section
  • Apple, "Human Interface Guidelines for Machine Learning" - deep on human-AI interaction patterns
  • Andrew Ng, "AI for Everyone" (Coursera) - non-technical framing that complements Huyen
  • Eloundou et al., "GPTs are GPTs" (OpenAI, 2023) - industry exposure analysis
  • a16z, Enterprise AI playbooks (annual) - defensibility case studies
Tags: planning, product, use-case, decision-making, milestones