Agentic RAG
Agentic RAG replaces the fixed "retrieve once, generate once" pipeline with an agent that decides WHEN to retrieve, FROM WHICH source, HOW MANY TIMES, and whether the answer is good enough. It turns RAG from a static pipeline into a reasoning loop, paying compute for better accuracy on complex queries.
Watch or read first
- Daily Dose DS, "RAG vs Agentic RAG" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- LlamaIndex Agentic RAG cookbook: https://docs.llamaindex.ai/en/stable/examples/agent/
- LangChain, "Agentic RAG with LangGraph": https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/
The historical problem
Traditional RAG has three rigid weaknesses:
- Retrieve once, generate once. If the first retrieval misses, the LLM has no way to fix it. The answer is polished but wrong.
- No reasoning over retrieval strategy. "Who was the CEO of the company that acquired Instagram?" needs two sequential hops: find Instagram's acquirer (Facebook), then find Facebook's CEO at the time (Zuckerberg). Naive RAG embeds the whole question into one vector search, so it rarely retrieves both hops and the LLM gets confused.
- No adaptability. The pipeline is the same whether the query is trivial or hard. No escalation, no source selection, no self-check.
Agentic RAG fixes all three by giving an agent control over the retrieval process.
How it works
The agentic loop (Daily Dose DS workflow)
Step 1-2) User query arrives. An agent REWRITES it:
- fix typos
- clarify ambiguity
- reformulate for better embedding
Step 3) Another agent DECIDES whether retrieval is needed.
- trivial or chitchat: skip retrieval
Step 4) If not needed, send to LLM directly.
Step 5-8) If needed, an agent picks the SOURCE:
- vector DB
- SQL database
- tool or API
- web search
Retrieve context.
Step 9) LLM generates a response.
Step 10) A final CHECKER agent validates the answer against the context and query.
Step 11) If OK, return.
Step 12) If not OK, loop back to Step 1 with a refined query or different source. Stop after N iterations.
This is one blueprint. You can collapse agents into one ReAct loop, or split them further.
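The numbered steps above can be sketched as one bounded loop in plain Python. Every helper here (rewrite, needs_retrieval, pick_source, retrieve, generate, answer_ok) is a hypothetical stand-in for an LLM or tool call, not a real API:

```python
# Minimal sketch of the agentic loop. All helpers are illustrative stubs.
MAX_ITERATIONS = 3

def rewrite(query: str) -> str:
    return query.strip().rstrip("?") + "?"           # fix typos, reformulate

def needs_retrieval(query: str) -> bool:
    return not query.lower().startswith(("hi", "hello", "thanks"))

def pick_source(query: str) -> str:
    return "sql" if "how much" in query.lower() else "vector_db"

def retrieve(source: str, query: str) -> str:
    return f"[context from {source}]"

def generate(query: str, context) -> str:
    return f"answer({query}, {context})"

def answer_ok(answer: str, query: str, context) -> bool:
    return context is not None or "answer" in answer

def agentic_rag(query: str) -> str:
    answer = ""
    for _ in range(MAX_ITERATIONS):                  # Step 12: hard cap
        query = rewrite(query)                       # Steps 1-2
        context = None
        if needs_retrieval(query):                   # Step 3
            source = pick_source(query)              # Steps 5-8
            context = retrieve(source, query)
        answer = generate(query, context)            # Step 9
        if answer_ok(answer, query, context):        # Steps 10-11
            return answer
    return answer                                    # best effort after N tries
```

The only structural difference from naive RAG is the `for` loop plus the two decision points; everything else is which model you plug into each stub.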
Typical implementation with LangGraph
```python
# Pseudocode, close to real LangGraph. AgentState and the node
# functions (rewrite_query, validate_answer, ...) are defined elsewhere.
from langgraph.graph import StateGraph, START, END

graph = StateGraph(AgentState)
graph.add_node("rewrite", rewrite_query)
graph.add_node("decide_retrieval", classify_need_for_retrieval)
graph.add_node("pick_source", choose_best_source)
graph.add_node("retrieve", retrieve_from_source)
graph.add_node("generate", generate_answer)
graph.add_node("check", validate_answer)

graph.add_edge(START, "rewrite")
graph.add_edge("rewrite", "decide_retrieval")
graph.add_conditional_edges("decide_retrieval", need_retrieval_fn, {
    True: "pick_source",
    False: "generate",
})
graph.add_edge("pick_source", "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", "check")
graph.add_conditional_edges("check", answer_ok_fn, {
    True: END,
    False: "rewrite",  # answer_ok_fn must also enforce the iteration cap
})

app = graph.compile()
```
Relation to the ReAct pattern
Agentic RAG is a specialization of the [[../05-ai-agents/react-pattern|ReAct]] pattern where the main tool is "retrieve". Thought -> Action (retrieve or re-retrieve) -> Observation (chunks) -> Thought (is this enough?) -> Action or Answer.
Tools the agent can use
- Vector DB search (dense)
- BM25 search (sparse)
- Hybrid search
- SQL query
- Web search (Tavily, Exa, Perplexity API)
- Knowledge graph traversal
- Another agent (multi-agent RAG)
- Code interpreter
Each tool is exposed as a function the agent can call via function calling or MCP.
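Concretely, "exposed as a function" usually means a JSON-schema tool definition the model can call. A minimal sketch in the shape most function-calling APIs accept (the tool name and fields here are illustrative, and the exact envelope varies by provider):

```python
# Illustrative tool definition for a dense vector search tool.
vector_search_tool = {
    "name": "vector_db_search",
    "description": "Dense semantic search over the company docs index.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language search query",
            },
            "top_k": {
                "type": "integer",
                "description": "Number of chunks to return",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
```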
Relevance today (2026)
Agentic RAG is the new default for complex queries
Daily Dose DS positions this as the "evolution after RAG". By 2026 it is not just evolution, it is standard. Every serious product (Perplexity, Claude search, ChatGPT with browsing, Glean, Hebbia) runs some form of agentic RAG.
Reasoning models made it cheap
Small reasoning models like o1-mini, Claude Haiku 4.5 (thinking), and Gemini 2.5 Flash Thinking deliver cheap, fast reasoning, so the planning and checking steps cost pennies. In 2022 the same loop would have cost 10x more.
Multi-source is the real unlock
The value of Agentic RAG is not "retry if wrong". It is "pick the right source". When your knowledge spans vector DB + SQL + APIs + web, only an agent can route queries correctly per call.
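A minimal sketch of per-call routing, with keyword rules standing in for the LLM classifier a real system would use (all names and rules here are ours):

```python
# Hypothetical per-call source router. Production systems replace the
# keyword rules with an LLM classifier or function calling.
def route(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("revenue", "sales", "how many", "how much")):
        return "sql"          # structured, aggregatable facts
    if any(w in q for w in ("today", "latest", "news")):
        return "web_search"   # freshness beats the index
    return "vector_db"        # default: semantic search over docs
```

The point is that routing happens per call, at runtime, based on the query itself, instead of being hard-wired into the pipeline.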
The memory bridge
Agentic RAG is read-only. The next step is Agent Memory: read AND write to external knowledge. Agents that remember past interactions, user preferences, long-term facts. See agent memory.
Daily Dose DS makes this clear:
- RAG (2020-2023): read-only, one-shot
- Agentic RAG: read-only via tool calls
- Agent Memory: read-write via tool calls
2026 trend: Agentic RAG + Memory together are becoming the standard for production assistants.
Costs and pitfalls got sharper
Agent loops without bounds = runaway costs. Every 2026 agentic RAG deployment has:
- Max iterations (e.g., 3-5)
- Per-step token budget
- Early stopping on high confidence
- Observability (LangSmith, Langfuse, Arize) to monitor loops
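The first three guardrails above can live in one small budget object that the loop consults on every step. A sketch under assumed limits, with nothing framework-specific:

```python
# Illustrative loop-budget guard: iteration cap, token budget, early stop.
from dataclasses import dataclass

@dataclass
class LoopBudget:
    max_iterations: int = 4
    max_tokens: int = 20_000
    confidence_stop: float = 0.9
    iterations: int = 0
    tokens_spent: int = 0

    def allow_step(self) -> bool:
        return (self.iterations < self.max_iterations
                and self.tokens_spent < self.max_tokens)

    def record(self, tokens: int, confidence: float) -> bool:
        """Record one step. Returns True if the loop should stop
        (budget exhausted or answer confident enough)."""
        self.iterations += 1
        self.tokens_spent += tokens
        return confidence >= self.confidence_stop or not self.allow_step()
```

Observability is the missing fourth piece: emit `iterations` and `tokens_spent` to your tracing tool on every stop so runaway queries show up in dashboards.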
Critical questions
- When is Agentic RAG overkill? (Simple FAQ. The query is "what is X" and the corpus is homogeneous. Naive RAG + rerank is faster and cheaper.)
- How do you stop infinite loops? (Hard cap on iterations, confidence threshold, human fallback.)
- The checker agent marks answers wrong. Who checks the checker? (Evals. Build a golden set. Measure the checker's false positive/negative rate.)
- Can a smaller LLM drive the agent? (Yes. Use Haiku for planning, Sonnet or Opus for final synthesis. Cost-optimized.)
- Does Agentic RAG replace Reranking? (No. They are layered. The agent decides what to retrieve; the reranker refines what came back.)
- How do you debug when the agent picks the wrong source? (Traces. Langfuse or LangSmith show the decision tree. You see the classifier's reasoning.)
Production pitfalls
- Unbounded loops. Query "What is the weather?". Agent retrieves, checker rejects, agent retries, forever. Always cap.
- Cost blowout on hard queries. A single complex query runs 10 retrievals + 10 LLM calls. Per-query SLA blown. Monitor and alert.
- Source picker biases. Agent always picks vector DB even when SQL would be faster. Train the classifier on real traffic.
- Checker hallucinates "OK". Checker with weak reasoning approves bad answers. Eval the checker separately.
- Latency. Multi-step agents can easily cross 5-10 seconds. For chat, stream intermediate thoughts or pre-compute common paths.
- No tracing. You will not debug this without observability. Use LangSmith, Langfuse, Arize, Helicone, or home-rolled.
- Schema drift. Each source has different output shapes. Normalize before feeding to the checker.
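A sketch of such a normalization step, assuming hypothetical output shapes for each source (the `{source, text, metadata}` envelope is our convention, not a standard):

```python
# Illustrative normalizer: every source's native output becomes one shape
# before the checker agent sees it.
def normalize(source: str, raw) -> dict:
    if source == "vector_db":
        # assumed shape: list of chunk dicts with a "text" field
        return {"source": source,
                "text": "\n".join(c["text"] for c in raw),
                "metadata": {"chunks": len(raw)}}
    if source == "sql":
        # assumed shape: list of row dicts
        return {"source": source,
                "text": "\n".join(str(r) for r in raw),
                "metadata": {"rows": len(raw)}}
    # fallback: stringify whatever came back (web search, APIs, ...)
    return {"source": source, "text": str(raw), "metadata": {}}
```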
Alternatives / Comparisons
| Pattern | When | Trade-off |
|---|---|---|
| Naive RAG | Simple Q&A, homogeneous corpus | Fast, cheap, low accuracy on complex queries |
| RAG + rerank | Most production Q&A | Big accuracy gain for little cost; still single-shot |
| Adaptive RAG (classifier routes) | Mixed query types, no tools | Lighter than agent |
| Agentic RAG | Multi-source, multi-step, tool use | Expensive, most flexible |
| Agentic RAG + Memory | Long-term user context | Full power, ops-heavy |
| Just use a huge context | <100k tokens corpus, 1M model | Expensive per query, low recall |
Mental parallels (non-AI)
- Investigative journalist: doesn't query Google once. Rewrites the question, checks multiple sources, cross-references, verifies. Returns to sources if the story doesn't fit. Then publishes.
- Medical differential diagnosis: a doctor considers symptoms, runs tests (retrievals), rules out hypotheses, orders more tests if needed, converges on a diagnosis.
- Research PhD workflow: question -> lit review (retrieval) -> preliminary hypothesis -> more targeted search -> experiments -> revise -> publish.
- Customer support Tier 2: unlike Tier 1 (script-following = Naive RAG), Tier 2 investigates across systems, asks clarifying questions, escalates if stuck.
Mini-lab
labs/agentic-rag/ (to create):
- Build an Agentic RAG with LangGraph over two sources:
- A vector DB with company docs
- A SQL database with live sales numbers
- Define agents: query rewriter, source picker, retriever, answer checker.
- Test with 3 query types:
- "What is our refund policy?" (docs)
- "How much did we sell last month?" (SQL)
- "What was the revenue impact of the new refund policy?" (both)
- Instrument with Langfuse.
- Compare to a Naive RAG baseline on the same queries.
Stack: uv, langchain, langgraph, qdrant, sqlite, langfuse, anthropic.
Further reading
Canonical
- Daily Dose DS, "RAG vs Agentic RAG" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- "Self-RAG: Learning to Retrieve, Generate, and Critique" (Asai et al., 2023) - https://arxiv.org/abs/2310.11511
- "Corrective Retrieval Augmented Generation" (Yan et al., 2024) - https://arxiv.org/abs/2401.15884
- LangGraph Agentic RAG tutorials - https://langchain-ai.github.io/langgraph/tutorials/rag/
Related in this KB
Tools
- LangGraph (LangChain): https://langchain-ai.github.io/langgraph/
- LlamaIndex agents: https://docs.llamaindex.ai/en/stable/understanding/agent/
- CrewAI: https://docs.crewai.com/
- PydanticAI: https://ai.pydantic.dev/
- OpenAI Assistants API (for RAG with tools): https://platform.openai.com/docs/assistants/overview