AI Agents
05·AI Agents·updated 2026-04-19

What is an AI Agent?

An AI agent is an LLM-powered system that can autonomously **perceive, reason, plan, use tools, and act** to reach a goal, adjusting its strategy based on intermediate results. LLM = brain. RAG = brain with reference library. Agent = brain + reference library + hands + decision loop.


The historical problem

A raw LLM is a text-in, text-out black box. It does not:

  • Access the live web
  • Call APIs or your databases
  • Run code
  • Remember across sessions
  • Decide "let me try another approach if this failed"

Users still wanted those capabilities. In 2022-2023 they worked around it by manually orchestrating: "generate a summary, then I'll find sources, then I'll ask you to rewrite". Every step needed a human in the loop.

Agents automate that loop. The LLM becomes the decision-maker inside a structured system that can:

  • Invoke tools (search, SQL, code, APIs)
  • Observe results
  • Decide the next action
  • Iterate until the goal is reached
  • Ask for human input if stuck

How it works

The core agent loop (ReAct-style)

Goal arrives
  |
  v
[Think]: agent reasons about what to do next
  |
  v
[Act]:   agent calls a tool
  |
  v
[Observe]: tool returns a result
  |
  v
Is goal reached?
  |           \
  no          yes
  |             \
back to Think   return answer

See react pattern for the deep dive and from-scratch implementation.
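The loop above fits in a few dozen lines. Here is a toy sketch of the control flow: the "model" is a scripted stand-in and the single tool is a stub (names like `scripted_model` and `run_agent` are illustrative, not a real API), so this shows the Think/Act/Observe cycle without any LLM calls.

```python
# Minimal ReAct-style loop with stub components to show the control flow.
# The "model" here is scripted; a real agent would send history to an LLM.

def scripted_model(goal, history):
    """Decide the next step. Returns ("act", tool, arg) or ("finish", answer)."""
    if not history:                       # Think: no observations yet -> search
        return ("act", "search", goal)
    last_obs = history[-1][1]             # Observe: read the last tool result
    return ("finish", f"Answer based on: {last_obs}")

TOOLS = {
    "search": lambda q: f"top result for '{q}'",   # stub tool
}

def run_agent(goal, max_steps=5):
    history = []                          # list of (tool, observation) pairs
    for _ in range(max_steps):            # Loop: think -> act -> observe
        decision = scripted_model(goal, history)
        if decision[0] == "finish":       # goal reached -> return answer
            return decision[1]
        _, tool, arg = decision           # Act: call the chosen tool
        observation = TOOLS[tool](arg)
        history.append((tool, observation))
    return "gave up: step budget exhausted"

print(run_agent("capital of France"))
```

Swapping `scripted_model` for a real LLM call (with the history serialized into the prompt) turns this skeleton into the from-scratch ReAct agent.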

Agent vs LLM vs RAG (Daily Dose DS analogy)

LLM is the brain. RAG is feeding that brain with fresh information. An agent is the decision-maker that plans and acts using the brain and the tools.

LLM:
  Text in --> Model --> Text out
  Static knowledge (training data), no tools, no loop.

RAG:
  Text in --> [retrieve relevant docs] --> Model --> Text out
  Same model, richer context per call. Still one shot.

Agent:
  Goal in --> Loop:
                plan --> tool call --> observe --> re-plan
                until goal reached --> Text out
  Model + RAG + tools + planning + memory.

Example: research agent

A "Research Agent" example from Daily Dose DS:

  • Research Agent searches arXiv, Semantic Scholar, Google Scholar
  • Filtering Agent picks top papers by citation, date, keywords
  • Summarization Agent condenses into key insights
  • Formatting Agent structures the final report

Each agent has a focused role. They collaborate (multi-agent). The final output is a structured research report, end-to-end, without a human in every iteration.
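The four-role pipeline can be sketched as a sequential chain of specialized agents. Each "agent" below is a stub function (hypothetical names; no real arXiv or Scholar calls); in practice each would be an LLM call with its own focused system prompt and tools.

```python
# Sketch of the research pipeline as a sequential multi-agent chain.
# Every function is a stand-in for a specialized LLM-backed agent.

def research_agent(topic):
    # Would query arXiv, Semantic Scholar, Google Scholar.
    return [f"paper-{i} on {topic}" for i in range(1, 6)]

def filtering_agent(papers, top_k=2):
    # Would rank by citations, date, keyword match.
    return papers[:top_k]

def summarization_agent(papers):
    # Would condense each paper into key insights.
    return [f"key insight from {p}" for p in papers]

def formatting_agent(insights):
    # Would structure the final report.
    return "REPORT\n" + "\n".join(f"- {s}" for s in insights)

def run_pipeline(topic):
    papers = research_agent(topic)
    shortlist = filtering_agent(papers)
    insights = summarization_agent(shortlist)
    return formatting_agent(insights)

print(run_pipeline("agent evaluation"))
```

The hand-off here is a fixed sequence; a level-4 system would put a manager agent in charge of deciding which specialist to invoke next.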

Formal definition

AI agents are autonomous systems that can reason, plan, identify relevant sources, extract information, take action, and self-correct when something goes wrong.

Four properties distinguish an agent from a plain LLM call:

  1. Autonomy: it decides the next step, within guardrails.
  2. Tools: it can act on the world (read, write, compute).
  3. Loop: it iterates until the goal is reached.
  4. Self-correction: it recognizes failure and adapts.

Levels of agent autonomy (Daily Dose DS)

Covered in full in agent levels and deployment. Summary:

| Level | Who controls flow | Who makes decisions |
| --- | --- | --- |
| 1. Basic responder | Human | Human, LLM just answers |
| 2. Router | Human (defines paths) | LLM picks which path |
| 3. Tool calling | Human (defines tools) | LLM picks when and how |
| 4. Multi-agent | Manager agent | LLM orchestrates sub-agents |
| 5. Autonomous | Nobody | LLM writes and runs code |

Most production systems in 2026 are at levels 3-4. Level 5 exists in research and advanced products (Devin, Replit Agent, Cursor Composer) but still carries risk.

Relevance today (2026)

Agents are no longer a demo

In 2022, "AI agents" meant AutoGPT running in circles and burning tokens. In 2026:

  • Cursor and Claude Code ship production-grade coding agents
  • Perplexity does agentic search at scale
  • Replit Agent builds full apps from a spec
  • Devin (Cognition), Manus, Claude Agent SDK push multi-hour autonomous tasks
  • Enterprise tools (Glean, Harvey, Hebbia) embed agents in workflows

What changed

  1. Tool calling went mainstream. OpenAI, Anthropic, Gemini all support function calling with reliable JSON. Agents finally have hands.
  2. MCP (Model Context Protocol, Anthropic 2024-11) standardized how agents connect to tools. See [[../06-mcp/README]] and agent protocols.
  3. Reasoning models (o1, R1, Opus 4.5 thinking) dramatically improved planning capability.
  4. Long context (1M tokens) lets agents hold more state.
  5. Cheaper inference. Price/1M tokens dropped 10x in 2 years. Agent loops became affordable.
  6. Prompt caching (Anthropic, OpenAI, Google) made long system prompts for agents economical.
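Point 1 is worth making concrete. A tool definition for Anthropic's Messages API is just a name, a description the model reads, and a JSON Schema for the arguments; OpenAI and Gemini use near-identical shapes under different key names. The `web_search` tool below is an illustrative example, not a built-in.

```python
import json

# A tool definition in the shape Anthropic's Messages API expects.
# The model sees the name, description, and input_schema, and emits
# JSON arguments matching the schema when it decides to call the tool.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web and return the top results for a query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "max_results": {"type": "integer", "description": "How many results to return."},
        },
        "required": ["query"],
    },
}

# The definition is plain JSON, so it serializes cleanly into the
# `tools` parameter of an API request.
print(json.dumps(web_search_tool, indent=2))
```

"Reliable JSON" means the model's tool-call arguments validate against `input_schema` on the vast majority of calls, which is what makes unattended agent loops viable.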

What did not change

  • Agents still hallucinate.
  • Agents still loop if unbounded.
  • Agents still need good evals.
  • Agents still need observability (LangSmith, Langfuse, Arize).

Where "agent" is overhyped in 2026

Many products labeled "agent" in 2026 are just LLMs with a single tool call. That is not an agent, it is a function-calling app. The Daily Dose DS 5-level scale is helpful: if your system is at level 2 or below, calling it an "agent" is marketing.

The agent vs workflow question

Anthropic's "Building effective agents" (2024) draws a sharp line:

  • Workflow: LLM calls orchestrated by predefined code.
  • Agent: LLM decides the flow itself.

Workflows are more predictable, cheaper, easier to debug. Agents are more flexible. For most business logic, workflows win. Save agent loops for genuinely open-ended tasks.
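The distinction is easiest to see in code. Below, the same two-step task is framed both ways; `llm` is a stub, and the agent's "plan" is scripted here to stand in for the model choosing its own steps (in a real agent, the LLM would produce that plan).

```python
# Workflow vs agent on the same task. The difference is who controls the flow.

def llm(prompt):
    return f"<answer to: {prompt}>"      # stand-in for a model call

# Workflow: the flow is fixed in code. Predictable, cheap, easy to debug.
def summarize_then_translate(text):
    summary = llm(f"Summarize: {text}")
    return llm(f"Translate to French: {summary}")

# Agent: the model decides the flow. Here the plan is scripted for
# illustration; a real agent would ask the LLM "what next?" each step.
def agent(goal):
    plan = ["Summarize", "Translate to French"]
    state = goal
    for step in plan:
        state = llm(f"{step}: {state}")
    return state
```

When the procedure is known in advance, as above, the workflow version is strictly better: same output, no planning overhead, deterministic behavior.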

Critical questions

  • Is ChatGPT an agent? (With web search and code interpreter, roughly level 3-4. The core model alone is not.)
  • Why is an agent better than a well-prompted LLM? (Only when the task truly needs iteration and tools. For simple Q&A, a plain LLM is faster and cheaper.)
  • Can you build an agent without a framework? (Yes, see the ReAct from-scratch in react pattern. A few hundred lines of Python.)
  • How do you stop an agent from looping? (Max iterations, token budget, confidence thresholds, human-in-the-loop gates.)
  • When does a multi-agent system beat a single-agent system? (When tasks are genuinely parallelizable or need distinct expertise. Not by default.)
  • How is agentic RAG different from "agent"? (Agentic RAG is an agent specialized on retrieval. An agent is the general case.)

Production pitfalls

  • Calling it an agent when it is not. This creates confusion and inflates expectations. Be honest about the level (1-5).
  • No loop bounds. Agents with unlimited iterations burn money and time. Always cap.
  • Zero observability. Agents fail silently or in weird ways. Without traces (LangSmith, Langfuse, Arize, Helicone) you cannot debug.
  • Single monolithic prompt. Trying to fit "you are X, you can use Y, here are Z rules" in one prompt explodes when you add tools. Modularize.
  • Latency. Agent loops typically take 3-30 seconds per task. Bad for interactive chat unless you stream intermediate thoughts.
  • Security. Agents with tool access are attack surfaces. Prompt injection through retrieved content can hijack actions. Guardrails matter. See [[../12-safety-guardrails/README]].
  • Cost spikes. One complex agent task can cost $1-$10 in tokens. Monitor per-task cost.
  • Over-automation. Full autonomy is often worse than human-in-the-loop for trust and quality.
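The "always cap" and "monitor per-task cost" pitfalls above can be enforced with hard stops around the loop. A minimal sketch, with illustrative names (`run_bounded`, `BudgetExceeded`) and a stubbed step function, assuming the step reports its own token usage:

```python
import time

# Hard stops for an agent loop: iteration cap, token budget, wall-clock limit.
# `step(i)` is a stub returning (result_or_None, tokens_used_this_step).

class BudgetExceeded(Exception):
    pass

def run_bounded(step, max_iters=10, max_tokens=50_000, max_seconds=60):
    tokens_used = 0
    start = time.monotonic()
    for i in range(max_iters):                       # cap iterations
        if time.monotonic() - start > max_seconds:   # cap wall-clock time
            raise BudgetExceeded("time limit")
        result, tokens = step(i)
        tokens_used += tokens
        if tokens_used > max_tokens:                 # cap token spend
            raise BudgetExceeded("token budget")
        if result is not None:                       # goal reached
            return result
    raise BudgetExceeded("iteration cap")

# Example: a step that finishes on the third iteration.
done = run_bounded(lambda i: ("done", 100) if i == 2 else (None, 100))
print(done)  # -> done
```

Raising rather than silently returning forces the caller to decide what an exhausted budget means: retry, escalate to a human, or fail the task.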

Alternatives / Comparisons

| Pattern | Complexity | When |
| --- | --- | --- |
| Plain LLM call | None | One-shot Q&A |
| LLM + structured output | Low | Extraction, classification |
| LLM + RAG | Low-medium | Knowledge grounding |
| LLM + tool calling (1 tool) | Medium | Simple automation |
| Workflow (code-orchestrated) | Medium | Known procedure, high reliability |
| Agent (ReAct-style) | High | Open-ended tasks |
| Multi-agent | Very high | Specialized roles, parallelism |
| Autonomous agent (level 5) | Very high + risk | Research, code generation |

Default for 2026: prefer the lowest level of autonomy that solves the problem.

Mental parallels (non-AI)

  • Employee levels:
    • Level 1 (basic responder): intern who answers exactly what you ask
    • Level 2 (router): receptionist routing calls to the right department
    • Level 3 (tool calling): analyst using Excel, SQL, web when needed
    • Level 4 (multi-agent): team lead delegating to specialists
    • Level 5 (autonomous): senior engineer who writes their own code end-to-end
  • Robot vacuum cleaners: Roomba = agent. Senses (dirt, obstacles), plans (path), acts (move, suck), corrects (bounces off walls). Simple, but structurally an agent.
  • Air traffic controller + planes: the ATC is a multi-agent orchestrator. Each pilot is an agent with their own goals and tools. Communication protocols (radio) = agent protocols.

Mini-lab

labs/first-agent/ (to create):

  1. Build a ReAct agent from scratch (no framework) with two tools: web_search (use Tavily API) and calculator.
  2. Give it a goal: "What is the current market cap of Apple divided by the population of France?"
  3. Log every thought, action, observation.
  4. Instrument with Langfuse.
  5. Compare with:
    • Plain LLM (no tools)
    • Plain LLM + RAG over Wikipedia dumps
    • Your agent

Stack: uv, anthropic, tavily-python, langfuse. ~200 lines.

Further reading

Canonical

Related in this KB

Frameworks

Tags: agent · llm · rag · autonomy · definition · levels