AI Agents
05·AI Agents·updated 2026-04-19

What is an AI Agent?

An AI agent is an LLM-powered system that can autonomously **perceive, reason, plan, use tools, and act** to reach a goal, adjusting its strategy based on intermediate results. LLM = brain. RAG = brain with reference library. Agent = brain + reference library + hands + decision loop.


The historical problem

A raw LLM is a text-in, text-out black box. It does not:

  • Access the live web
  • Call APIs or your databases
  • Run code
  • Remember across sessions
  • Decide "let me try another approach if this failed"

Users still wanted those capabilities. In 2022-2023 they worked around it by manually orchestrating: "generate a summary, then I'll find sources, then I'll ask you to rewrite". Every step needed a human in the loop.

Agents automate that loop. The LLM becomes the decision-maker inside a structured system that can:

  • Invoke tools (search, SQL, code, APIs)
  • Observe results
  • Decide the next action
  • Iterate until the goal is reached
  • Ask for human input if stuck

How it works

The core agent loop (ReAct-style)

Goal arrives
  |
  v
[Think]: agent reasons about what to do next
  |
  v
[Act]:   agent calls a tool
  |
  v
[Observe]: tool returns a result
  |
  v
Is goal reached?
  |           \
  no          yes
  |             \
back to Think   return answer

See react pattern for the deep dive and from-scratch implementation.
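The loop above fits in a few dozen lines. Here is a toy sketch of the control flow: the "model" is a scripted stand-in and the single tool is a stub (names like `scripted_model` and `run_agent` are illustrative, not a real API), so this shows the Think/Act/Observe cycle without any LLM calls.

```python
# Minimal ReAct-style loop with stub components to show the control flow.
# The "model" here is scripted; a real agent would send history to an LLM.

def scripted_model(goal, history):
    """Decide the next step. Returns ("act", tool, arg) or ("finish", answer)."""
    if not history:                       # Think: no observations yet -> search
        return ("act", "search", goal)
    last_obs = history[-1][1]             # Observe: read the last tool result
    return ("finish", f"Answer based on: {last_obs}")

TOOLS = {
    "search": lambda q: f"top result for '{q}'",   # stub tool
}

def run_agent(goal, max_steps=5):
    history = []                          # list of (tool, observation) pairs
    for _ in range(max_steps):            # Loop: think -> act -> observe
        decision = scripted_model(goal, history)
        if decision[0] == "finish":       # goal reached -> return answer
            return decision[1]
        _, tool, arg = decision           # Act: call the chosen tool
        observation = TOOLS[tool](arg)
        history.append((tool, observation))
    return "gave up: step budget exhausted"

print(run_agent("capital of France"))
```

Swapping `scripted_model` for a real LLM call (with the history serialized into the prompt) turns this skeleton into the from-scratch ReAct agent.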

Agent vs LLM vs RAG (Daily Dose DS analogy)

LLM is the brain. RAG is feeding that brain with fresh information. An agent is the decision-maker that plans and acts using the brain and the tools.

LLM:
  Text in --> Model --> Text out
  Static knowledge (training data), no tools, no loop.

RAG:
  Text in --> [retrieve relevant docs] --> Model --> Text out
  Same model, richer context per call. Still one shot.

Agent:
  Goal in --> Loop:
                plan --> tool call --> observe --> re-plan
                until goal reached --> Text out
  Model + RAG + tools + planning + memory.

Example: research agent

A "Research Agent" example from Daily Dose DS:

  • Research Agent searches arXiv, Semantic Scholar, Google Scholar
  • Filtering Agent picks top papers by citation, date, keywords
  • Summarization Agent condenses into key insights
  • Formatting Agent structures the final report

Each agent has a focused role. They collaborate (multi-agent). The final output is a structured research report, end-to-end, without a human in every iteration.
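The four-role pipeline can be sketched as a sequential chain of specialized agents. Each "agent" below is a stub function (hypothetical names; no real arXiv or Scholar calls); in practice each would be an LLM call with its own focused system prompt and tools.

```python
# Sketch of the research pipeline as a sequential multi-agent chain.
# Every function is a stand-in for a specialized LLM-backed agent.

def research_agent(topic):
    # Would query arXiv, Semantic Scholar, Google Scholar.
    return [f"paper-{i} on {topic}" for i in range(1, 6)]

def filtering_agent(papers, top_k=2):
    # Would rank by citations, date, keyword match.
    return papers[:top_k]

def summarization_agent(papers):
    # Would condense each paper into key insights.
    return [f"key insight from {p}" for p in papers]

def formatting_agent(insights):
    # Would structure the final report.
    return "REPORT\n" + "\n".join(f"- {s}" for s in insights)

def run_pipeline(topic):
    papers = research_agent(topic)
    shortlist = filtering_agent(papers)
    insights = summarization_agent(shortlist)
    return formatting_agent(insights)

print(run_pipeline("agent evaluation"))
```

The hand-off here is a fixed sequence; a level-4 system would put a manager agent in charge of deciding which specialist to invoke next.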

Formal definition

AI agents are autonomous systems that can reason, plan, identify relevant sources, extract information, take action, and self-correct when something goes wrong.

Four properties distinguish an agent from a plain LLM call:

  1. Autonomy: it decides the next step, within guardrails.
  2. Tools: it can act on the world (read, write, compute).
  3. Loop: it iterates until the goal is reached.
  4. Self-correction: it recognizes failure and adapts.

Levels of agent autonomy (Daily Dose DS)

Covered in full in agent levels and deployment. Summary:

| Level | Who controls flow | Who makes decisions |
| --- | --- | --- |
| 1. Basic responder | Human | Human, LLM just answers |
| 2. Router | Human (defines paths) | LLM picks which path |
| 3. Tool calling | Human (defines tools) | LLM picks when and how |
| 4. Multi-agent | Manager agent | LLM orchestrates sub-agents |
| 5. Autonomous | Nobody | LLM writes and runs code |

Most production systems in 2026 are at levels 3-4. Level 5 exists in research and advanced products (Devin, Replit Agent, Cursor Composer) but still carries risk.

Relevance today (2026)

Agents are no longer a demo

In 2022, "AI agents" meant AutoGPT running in circles and burning tokens. In 2026:

  • Cursor and Claude Code ship production-grade coding agents
  • Perplexity does agentic search at scale
  • Replit Agent builds full apps from a spec
  • Devin (Cognition), Manus, Claude Agent SDK push multi-hour autonomous tasks
  • Enterprise tools (Glean, Harvey, Hebbia) embed agents in workflows

What changed

  1. Tool calling went mainstream. OpenAI, Anthropic, Gemini all support function calling with reliable JSON. Agents finally have hands.
  2. MCP (Model Context Protocol, Anthropic 2024-11) standardized how agents connect to tools. See [[../06-mcp/README]] and agent protocols.
  3. Reasoning models (o1, R1, Opus 4.5 thinking) dramatically improved planning capability.
  4. Long context (1M tokens) lets agents hold more state.
  5. Cheaper inference. Price/1M tokens dropped 10x in 2 years. Agent loops became affordable.
  6. Prompt caching (Anthropic, OpenAI, Google) made long system prompts for agents economical.
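Point 1 is worth making concrete. A tool definition for Anthropic's Messages API is just a name, a description the model reads, and a JSON Schema for the arguments; OpenAI and Gemini use near-identical shapes under different key names. The `web_search` tool below is an illustrative example, not a built-in.

```python
import json

# A tool definition in the shape Anthropic's Messages API expects.
# The model sees the name, description, and input_schema, and emits
# JSON arguments matching the schema when it decides to call the tool.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web and return the top results for a query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "max_results": {"type": "integer", "description": "How many results to return."},
        },
        "required": ["query"],
    },
}

# The definition is plain JSON, so it serializes cleanly into the
# `tools` parameter of an API request.
print(json.dumps(web_search_tool, indent=2))
```

"Reliable JSON" means the model's tool-call arguments validate against `input_schema` on the vast majority of calls, which is what makes unattended agent loops viable.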

What did not change

  • Agents still hallucinate.
  • Agents still loop if unbounded.
  • Agents still need good evals.
  • Agents still need observability (LangSmith, Langfuse, Arize).

Where "agent" is overhyped in 2026

Many products labeled "agent" in 2026 are just LLMs with a single tool call. That is not an agent, it is a function-calling app. The Daily Dose DS 5-level scale is helpful: if your system is at level 2 or below, calling it an "agent" is marketing.

The agent vs workflow question

Anthropic's "Building effective agents" (2024) draws a sharp line:

  • Workflow: LLM calls orchestrated by predefined code.
  • Agent: LLM decides the flow itself.

Workflows are more predictable, cheaper, easier to debug. Agents are more flexible. For most business logic, workflows win. Save agent loops for genuinely open-ended tasks.
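The distinction is easiest to see in code. Below, the same two-step task is framed both ways; `llm` is a stub, and the agent's "plan" is scripted here to stand in for the model choosing its own steps (in a real agent, the LLM would produce that plan).

```python
# Workflow vs agent on the same task. The difference is who controls the flow.

def llm(prompt):
    return f"<answer to: {prompt}>"      # stand-in for a model call

# Workflow: the flow is fixed in code. Predictable, cheap, easy to debug.
def summarize_then_translate(text):
    summary = llm(f"Summarize: {text}")
    return llm(f"Translate to French: {summary}")

# Agent: the model decides the flow. Here the plan is scripted for
# illustration; a real agent would ask the LLM "what next?" each step.
def agent(goal):
    plan = ["Summarize", "Translate to French"]
    state = goal
    for step in plan:
        state = llm(f"{step}: {state}")
    return state
```

When the procedure is known in advance, as above, the workflow version is strictly better: same output, no planning overhead, deterministic behavior.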

Critical questions

  • Is ChatGPT an agent? (With web search and code interpreter, roughly level 3-4. The core model alone is not.)
  • Why is an agent better than a well-prompted LLM? (Only when the task truly needs iteration and tools. For simple Q&A, a plain LLM is faster and cheaper.)
  • Can you build an agent without a framework? (Yes, see the ReAct from-scratch in react pattern. A few hundred lines of Python.)
  • How do you stop an agent from looping? (Max iterations, token budget, confidence thresholds, human-in-the-loop gates.)
  • When does a multi-agent system beat a single-agent system? (When tasks are genuinely parallelizable or need distinct expertise. Not by default.)
  • How is agentic RAG different from "agent"? (Agentic RAG is an agent specialized on retrieval. An agent is the general case.)

Production pitfalls

  • Calling it an agent when it is not. This creates confusion and inflates expectations. Be honest about the level (1-5).
  • No loop bounds. Agents with unlimited iterations burn money and time. Always cap.
  • Zero observability. Agents fail silently or in weird ways. Without traces (LangSmith, Langfuse, Arize, Helicone) you cannot debug.
  • Single monolithic prompt. Trying to fit "you are X, you can use Y, here are Z rules" in one prompt explodes when you add tools. Modularize.
  • Latency. Agent loops typically take 3-30 seconds per task. Bad for interactive chat unless you stream intermediate thoughts.
  • Security. Agents with tool access are attack surfaces. Prompt injection through retrieved content can hijack actions. Guardrails matter. See [[../12-safety-guardrails/README]].
  • Cost spikes. One complex agent task can cost $1-$10 in tokens. Monitor per-task cost.
  • Over-automation. Full autonomy is often worse than human-in-the-loop for trust and quality.
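The "always cap" and "monitor per-task cost" pitfalls above can be enforced with hard stops around the loop. A minimal sketch, with illustrative names (`run_bounded`, `BudgetExceeded`) and a stubbed step function, assuming the step reports its own token usage:

```python
import time

# Hard stops for an agent loop: iteration cap, token budget, wall-clock limit.
# `step(i)` is a stub returning (result_or_None, tokens_used_this_step).

class BudgetExceeded(Exception):
    pass

def run_bounded(step, max_iters=10, max_tokens=50_000, max_seconds=60):
    tokens_used = 0
    start = time.monotonic()
    for i in range(max_iters):                       # cap iterations
        if time.monotonic() - start > max_seconds:   # cap wall-clock time
            raise BudgetExceeded("time limit")
        result, tokens = step(i)
        tokens_used += tokens
        if tokens_used > max_tokens:                 # cap token spend
            raise BudgetExceeded("token budget")
        if result is not None:                       # goal reached
            return result
    raise BudgetExceeded("iteration cap")

# Example: a step that finishes on the third iteration.
done = run_bounded(lambda i: ("done", 100) if i == 2 else (None, 100))
print(done)  # -> done
```

Raising rather than silently returning forces the caller to decide what an exhausted budget means: retry, escalate to a human, or fail the task.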

Alternatives / Comparisons

| Pattern | Complexity | When |
| --- | --- | --- |
| Plain LLM call | None | One-shot Q&A |
| LLM + structured output | Low | Extraction, classification |
| LLM + RAG | Low-medium | Knowledge grounding |
| LLM + tool calling (1 tool) | Medium | Simple automation |
| Workflow (code-orchestrated) | Medium | Known procedure, high reliability |
| Agent (ReAct-style) | High | Open-ended tasks |
| Multi-agent | Very high | Specialized roles, parallelism |
| Autonomous agent (level 5) | Very high + risk | Research, code generation |

Default for 2026: prefer the lowest level of autonomy that solves the problem.

Mental parallels (non-AI)

  • Employee levels:
    • Level 1 (basic responder): intern who answers exactly what you ask
    • Level 2 (router): receptionist routing calls to the right department
    • Level 3 (tool calling): analyst using Excel, SQL, web when needed
    • Level 4 (multi-agent): team lead delegating to specialists
    • Level 5 (autonomous): senior engineer who writes their own code end-to-end
  • Robot vacuum cleaners: Roomba = agent. Senses (dirt, obstacles), plans (path), acts (move, suck), corrects (bounces off walls). Simple, but structurally an agent.
  • Air traffic controller + planes: the ATC is a multi-agent orchestrator. Each pilot is an agent with their own goals and tools. Communication protocols (radio) = agent protocols.

Mini-lab

labs/first-agent/ (to create):

  1. Build a ReAct agent from scratch (no framework) with two tools: web_search (use Tavily API) and calculator.
  2. Give it a goal: "What is the current market cap of Apple divided by the population of France?"
  3. Log every thought, action, observation.
  4. Instrument with Langfuse.
  5. Compare with:
    • Plain LLM (no tools)
    • Plain LLM + RAG over Wikipedia dumps
    • Your agent

Stack: uv, anthropic, tavily-python, langfuse. ~200 lines.

Further reading

Canonical

Related in this KB

Frameworks

Tags: agent · llm · rag · autonomy · definition · levels