What is an AI Agent?
Watch or read first
- Daily Dose DS, "What is an AI Agent?" and "Agent vs LLM vs RAG" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
- Andrew Ng, "Agentic AI" series (DeepLearning.AI, 2024): https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/
TL;DR
An AI agent is an LLM-powered system that can autonomously perceive, reason, plan, use tools, and act to reach a goal, adjusting its strategy based on intermediate results. LLM = brain. RAG = brain with reference library. Agent = brain + reference library + hands + decision loop.
The historical problem
A raw LLM is a text-in, text-out black box. It does not:
- Access the live web
- Call APIs or your databases
- Run code
- Remember across sessions
- Decide "let me try another approach if this failed"
Users still wanted those capabilities. In 2022-2023 they worked around it by manually orchestrating: "generate a summary, then I'll find sources, then I'll ask you to rewrite". Every step needed a human in the loop.
Agents automate that loop. The LLM becomes the decision-maker inside a structured system that can:
- Invoke tools (search, SQL, code, APIs)
- Observe results
- Decide the next action
- Iterate until the goal is reached
- Ask for human input if stuck
How it works
The core agent loop (ReAct-style)
Goal arrives
|
v
[Think]: agent reasons about what to do next
|
v
[Act]: agent calls a tool
|
v
[Observe]: tool returns a result
|
v
Is goal reached?
  |          \
  no          yes
  |            \
back to [Think]    return answer
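The loop above can be sketched in a few dozen lines of framework-free Python. The model is replaced here by a scripted stub so the sketch runs standalone; a real implementation would swap `call_llm` for an actual chat-completions API call.

```python
# Minimal ReAct-style loop, no framework. `call_llm` is a scripted stub
# standing in for a real model, so this runs as-is.

def calculator(expression: str) -> str:
    """Toy tool: evaluate an arithmetic expression (trusted input only)."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

# Scripted "model" turns: first think + act, then answer.
SCRIPT = iter([
    {"thought": "I need 17 * 23.", "action": ("calculator", "17 * 23")},
    {"thought": "I have the result.", "answer": "17 * 23 = 391"},
])

def call_llm(history):
    return next(SCRIPT)

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [("goal", goal)]
    for _ in range(max_steps):                       # always bound the loop
        step = call_llm(history)                     # Think
        if "answer" in step:                         # goal reached
            return step["answer"]
        tool_name, tool_input = step["action"]       # Act
        observation = TOOLS[tool_name](tool_input)   # Observe
        history.append(("observation", observation)) # feed back, re-Think
    raise RuntimeError("max_steps exceeded")

result = run_agent("What is 17 * 23?")  # -> "17 * 23 = 391"
```

The shape is the whole point: Think, Act, Observe, repeat, with a hard iteration cap.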
See react pattern for the deep dive and from-scratch implementation.
Agent vs LLM vs RAG (Daily Dose DS analogy)
LLM is the brain. RAG is feeding that brain with fresh information. An agent is the decision-maker that plans and acts using the brain and the tools.
LLM:
Text in --> Model --> Text out
Static knowledge (training data), no tools, no loop.
RAG:
Text in --> [retrieve relevant docs] --> Model --> Text out
Same model, richer context per call. Still one shot.
Agent:
Goal in --> Loop:
plan --> tool call --> observe --> re-plan
until goal reached --> Text out
Model + RAG + tools + planning + memory.
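To make the "still one shot" distinction concrete, here is the RAG shape in code: retrieval enriches a single model call, then stops. No loop, no tools. `retrieve` and `call_llm` are toy stand-ins, not a real library API.

```python
# One-shot RAG sketch: retrieve once, call the model once, done.

DOCS = [
    "Paris is the capital of France.",
    "The Eiffel Tower is 330 metres tall.",
]
STOPWORDS = {"how", "is", "the", "of", "a"}

def words(text: str) -> set[str]:
    return {w.strip("?.").lower() for w in text.split()} - STOPWORDS

def retrieve(query: str) -> list[str]:
    """Toy keyword retriever: keep docs sharing a content word with the query."""
    return [d for d in DOCS if words(query) & words(d)]

def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completions call.
    return f"<model answer grounded in: {prompt}>"

query = "How tall is the Eiffel Tower?"
context = "\n".join(retrieve(query))          # retrieve once
answer = call_llm(context + "\nQ: " + query)  # single shot: one call, no loop
```

Compare with the agent loop: same model, but here nothing observes the output or decides a next step.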
Example: research agent
A multi-agent research pipeline from Daily Dose DS:
- Research Agent searches arXiv, Semantic Scholar, Google Scholar
- Filtering Agent picks top papers by citation, date, keywords
- Summarization Agent condenses into key insights
- Formatting Agent structures the final report
Each agent has a focused role. They collaborate (multi-agent). The final output is a structured research report, end-to-end, without a human in every iteration.
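The four-role pipeline above can be sketched as a sequential chain where each "agent" is a function wrapping a (stubbed) model call. All names and data here are illustrative, not an API from Daily Dose DS.

```python
# Sequential multi-agent pipeline: research -> filter -> summarize -> format.

def research_agent(topic: str) -> list[dict]:
    # Would search arXiv / Semantic Scholar / Google Scholar; stubbed here.
    return [
        {"title": f"Paper on {topic}", "citations": 120, "year": 2024},
        {"title": f"Old survey of {topic}", "citations": 15, "year": 2019},
    ]

def filtering_agent(papers: list[dict], min_citations: int = 50) -> list[dict]:
    # Keep only well-cited papers.
    return [p for p in papers if p["citations"] >= min_citations]

def summarization_agent(papers: list[dict]) -> list[str]:
    # Would condense each paper with a model call; stubbed here.
    return [f"{p['title']} ({p['year']}): key insight (stub)" for p in papers]

def formatting_agent(summaries: list[str]) -> str:
    return "# Research report\n" + "\n".join(f"- {s}" for s in summaries)

report = formatting_agent(
    summarization_agent(filtering_agent(research_agent("AI agents")))
)
```

In a real system each stage would be its own prompted model (or sub-agent), but the handoff structure is exactly this.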
Formal definition
AI agents are autonomous systems that can reason, plan, identify relevant sources, extract information, take actions, and self-correct when something goes wrong.
Four properties distinguish an agent from a plain LLM call:
- Autonomy: it decides the next step, within guardrails.
- Tools: it can act on the world (read, write, compute).
- Loop: it iterates until the goal is reached.
- Self-correction: it recognizes failure and adapts.
Levels of agent autonomy (Daily Dose DS)
Covered in full in agent levels and deployment. Summary:
| Level | Who controls flow | Who makes decisions |
|---|---|---|
| 1. Basic responder | Human | Human, LLM just answers |
| 2. Router | Human (defines paths) | LLM picks which path |
| 3. Tool calling | Human (defines tools) | LLM picks when and how |
| 4. Multi-agent | Manager agent | LLM orchestrates sub-agents |
| 5. Autonomous | Nobody | LLM writes and runs code |
Most production systems in 2026 are at levels 3-4. Level 5 exists in research and advanced products (Devin, Replit Agent, Cursor Composer) but still carries risk.
Relevance today (2026)
Agents are no longer a demo
In 2022, "AI agents" meant AutoGPT running in circles and burning tokens. In 2026:
- Cursor and Claude Code ship production-grade coding agents
- Perplexity does agentic search at scale
- Replit Agent builds full apps from a spec
- Devin (Cognition), Manus, Claude Agent SDK push multi-hour autonomous tasks
- Enterprise tools (Glean, Harvey, Hebbia) embed agents in workflows
What changed
- Tool calling went mainstream. OpenAI, Anthropic, Gemini all support function calling with reliable JSON. Agents finally have hands.
- MCP (Model Context Protocol, Anthropic, November 2024) standardized how agents connect to tools. See [[../06-mcp/README]] and agent protocols.
- Reasoning models (o1, R1, Opus 4.5 thinking) dramatically improved planning capability.
- Long context (1M tokens) lets agents hold more state.
- Cheaper inference. Price/1M tokens dropped 10x in 2 years. Agent loops became affordable.
- Prompt caching (Anthropic, OpenAI, Google) made long system prompts for agents economical.
What did not change
- Agents still hallucinate.
- Agents still loop if unbounded.
- Agents still need good evals.
- Agents still need observability (LangSmith, Langfuse, Arize).
Where "agent" is overhyped in 2026
Many products labeled "agent" in 2026 are just LLMs with a single tool call. That is not an agent, it is a function-calling app. The Daily Dose DS 5-level scale is helpful: if your system is at level 2 or below, calling it an "agent" is marketing.
The agent vs workflow question
Anthropic's "Building effective agents" (2024) draws a sharp line:
- Workflow: LLM calls orchestrated by predefined code.
- Agent: LLM decides the flow itself.
Workflows are more predictable, cheaper, easier to debug. Agents are more flexible. For most business logic, workflows win. Save agent loops for genuinely open-ended tasks.
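Anthropic's distinction shows up directly in code. In a workflow, your code fixes the step order and the model only fills in each step; in an agent, the model chooses the next step. A minimal workflow sketch, with `call_llm` as a stub for any chat-completions API:

```python
# Workflow: predefined sequence classify -> draft -> polish.
# The model never decides the flow, so the path is predictable and debuggable.

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; echoes a truncated prompt.
    return f"<output for: {prompt[:40]}>"

def handle_ticket_workflow(ticket: str) -> str:
    category = call_llm(f"Classify: {ticket}")
    draft = call_llm(f"Draft a {category} reply to: {ticket}")
    return call_llm(f"Polish: {draft}")

reply = handle_ticket_workflow("My invoice is wrong")
```

The agentic version would instead hand `call_llm` the list of available steps and let it pick, which buys flexibility at the cost of predictability.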
Critical questions
- Is ChatGPT an agent? (With web search and code interpreter, roughly level 3-4. The core model alone is not.)
- Why is an agent better than a well-prompted LLM? (Only when the task truly needs iteration and tools. For simple Q&A, a plain LLM is faster and cheaper.)
- Can you build an agent without a framework? (Yes, see the ReAct from-scratch in react pattern. A few hundred lines of Python.)
- How do you stop an agent from looping? (Max iterations, token budget, confidence thresholds, human-in-the-loop gates.)
- When does a multi-agent system beat a single-agent system? (When tasks are genuinely parallelizable or need distinct expertise. Not by default.)
- How is agentic RAG different from "agent"? (Agentic RAG is an agent specialized on retrieval. An agent is the general case.)
Production pitfalls
- Calling it an agent when it is not. Adds confusion and expectations. Be honest about level (1-5).
- No loop bounds. Agents with unlimited iterations burn money and time. Always cap.
- Zero observability. Agents fail silently or in weird ways. Without traces (LangSmith, Langfuse, Arize, Helicone) you cannot debug.
- Single monolithic prompt. Trying to fit "you are X, you can use Y, here are Z rules" in one prompt explodes when you add tools. Modularize.
- Latency gaps. Agent loops typically take 3-30 seconds per task. Bad for interactive chat unless you stream intermediate thoughts.
- Security. Agents with tool access are attack surfaces. Prompt injection through retrieved content can hijack actions. Guardrails matter. See [[../12-safety-guardrails/README]].
- Cost spikes. One complex agent task can cost $1-$10 in tokens. Monitor per-task cost.
- Over-automation. Full autonomy is often worse than human-in-the-loop for trust and quality.
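Two of the pitfalls above (unbounded loops, cost spikes) reduce to the same guardrail: cap both iterations and token spend, whichever trips first. A reusable sketch with illustrative budget numbers:

```python
# Bound an agent loop by step count AND token budget.

class BudgetExceeded(Exception):
    pass

def run_bounded(step_fn, max_steps: int = 10, token_budget: int = 50_000):
    """step_fn(i) -> (result_or_None, tokens_used_this_step)."""
    spent = 0
    for i in range(max_steps):
        result, tokens = step_fn(i)
        spent += tokens
        if spent > token_budget:
            raise BudgetExceeded(f"{spent} tokens after {i + 1} steps")
        if result is not None:          # agent signalled "goal reached"
            return result, spent
    raise BudgetExceeded(f"no answer after {max_steps} steps")

# Demo: a fake agent that finishes on its third step, 1200 tokens per step.
answer, spent = run_bounded(lambda i: ("done" if i == 2 else None, 1200))
```

Per-task cost monitoring then falls out for free: log `spent` on every run and alert on outliers.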
Alternatives / Comparisons
| Pattern | Complexity | When |
|---|---|---|
| Plain LLM call | None | One-shot Q&A |
| LLM + structured output | Low | Extraction, classification |
| LLM + RAG | Low-medium | Knowledge grounding |
| LLM + tool calling (1 tool) | Medium | Simple automation |
| Workflow (code-orchestrated) | Medium | Known procedure, high reliability |
| Agent (ReAct-style) | High | Open-ended tasks |
| Multi-agent | Very high | Specialized roles, parallelism |
| Autonomous agent (level 5) | Very high + risk | Research, code generation |
Default for 2026: prefer the lowest level of autonomy that solves the problem.
Mental parallels (non-AI)
- Employee levels:
- Level 1 (basic responder): intern who answers exactly what you ask
- Level 2 (router): receptionist routing calls to the right department
- Level 3 (tool calling): analyst using Excel, SQL, web when needed
- Level 4 (multi-agent): team lead delegating to specialists
- Level 5 (autonomous): senior engineer who writes their own code end-to-end
- Robot vacuum cleaners: Roomba = agent. Senses (dirt, obstacles), plans (path), acts (move, suck), corrects (bounces off walls). Simple, but structurally an agent.
- Air traffic controller + planes: the ATC is a multi-agent orchestrator. Each pilot is an agent with their own goals and tools. Communication protocols (radio) = agent protocols.
Mini-lab
labs/first-agent/ (to create):
- Build a ReAct agent from scratch (no framework) with two tools: web_search (using the Tavily API) and calculator.
- Give it a goal: "What is the current market cap of Apple divided by the population of France?"
- Log every thought, action, observation.
- Instrument with Langfuse.
- Compare with:
- Plain LLM (no tools)
- Plain LLM + RAG over Wikipedia dumps
- Your agent
Stack: uv, anthropic, tavily-python, langfuse. ~200 lines.
Further reading
Canonical
- Daily Dose DS, "What is an AI Agent?" and "Agent vs LLM vs RAG" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- Anthropic, "Building effective agents" (2024) - https://www.anthropic.com/research/building-effective-agents
- Andrew Ng, "Four AI agent design patterns" (2024) on DeepLearning.AI: https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/
- Karpathy's "LLM OS" vision talks on YouTube: https://www.youtube.com/@AndrejKarpathy
Related in this KB
- agent building blocks
- agent levels and deployment
- react pattern
- function calling
- agent memory
- agentic design patterns
- agent protocols
- agentic rag
- [[../06-mcp/README]]
Frameworks
- CrewAI (https://docs.crewai.com/), LangGraph (https://langchain-ai.github.io/langgraph/), LlamaIndex Agents (https://docs.llamaindex.ai/en/stable/understanding/agent/)
- AutoGen (Microsoft): https://github.com/microsoft/autogen
- PydanticAI: https://ai.pydantic.dev/
- OpenAI Agents SDK: https://openai.github.io/openai-agents-python/
- Claude Agent SDK (formerly Claude Code SDK): https://docs.anthropic.com/en/docs/claude-code/sdk