Agentic RAG
Agentic RAG replaces the fixed "retrieve once, generate once" pipeline with an agent that decides WHEN to retrieve, FROM WHICH source, HOW MANY TIMES, and whether the answer is good enough. It turns RAG from a static pipeline into a reasoning loop, paying compute for better accuracy on complex queries.
Watch or read first
- Daily Dose DS, "RAG vs Agentic RAG" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- LlamaIndex Agentic RAG cookbook: https://docs.llamaindex.ai/en/stable/examples/agent/
- LangChain, "Agentic RAG with LangGraph": https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/
The historical problem
Traditional RAG has three rigid weaknesses:
- Retrieve once, generate once. If the first retrieval misses, the LLM has no way to fix it. The answer is polished but wrong.
- No reasoning over retrieval strategy. "Who was the CEO of the company that acquired Instagram?" needs two sequential hops: find Instagram's acquirer (Facebook), then find Facebook's CEO at the time (Zuckerberg). Naive RAG embeds the whole question into one vector search, so it rarely retrieves both hops and the LLM gets confused.
- No adaptability. The pipeline is the same whether the query is trivial or hard. No escalation, no source selection, no self-check.
Agentic RAG fixes all three by giving an agent control over the retrieval process.
How it works
The agentic loop (Daily Dose DS workflow)
Step 1-2) User query arrives. An agent REWRITES it:
- fix typos
- clarify ambiguity
- reformulate for better embedding
Step 3) Another agent DECIDES whether retrieval is needed.
- trivial or chitchat: skip retrieval
Step 4) If not needed, send to LLM directly.
Step 5-8) If needed, an agent picks the SOURCE:
- vector DB
- SQL database
- tool or API
- web search
Retrieve context.
Step 9) LLM generates a response.
Step 10) A final CHECKER agent validates the answer against the context and query.
Step 11) If OK, return.
Step 12) If not OK, loop back to Step 1 with a refined query or different source. Stop after N iterations.
This is one blueprint. You can collapse agents into one ReAct loop, or split them further.
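The numbered steps above can be sketched as one bounded loop in plain Python. Every helper here (rewrite, needs_retrieval, pick_source, retrieve, generate, answer_ok) is a hypothetical stand-in for an LLM or tool call, not a real API:

```python
# Minimal sketch of the agentic loop. All helpers are illustrative stubs.
MAX_ITERATIONS = 3

def rewrite(query: str) -> str:
    return query.strip().rstrip("?") + "?"           # fix typos, reformulate

def needs_retrieval(query: str) -> bool:
    return not query.lower().startswith(("hi", "hello", "thanks"))

def pick_source(query: str) -> str:
    return "sql" if "how much" in query.lower() else "vector_db"

def retrieve(source: str, query: str) -> str:
    return f"[context from {source}]"

def generate(query: str, context) -> str:
    return f"answer({query}, {context})"

def answer_ok(answer: str, query: str, context) -> bool:
    return context is not None or "answer" in answer

def agentic_rag(query: str) -> str:
    answer = ""
    for _ in range(MAX_ITERATIONS):                  # Step 12: hard cap
        query = rewrite(query)                       # Steps 1-2
        context = None
        if needs_retrieval(query):                   # Step 3
            source = pick_source(query)              # Steps 5-8
            context = retrieve(source, query)
        answer = generate(query, context)            # Step 9
        if answer_ok(answer, query, context):        # Steps 10-11
            return answer
    return answer                                    # best effort after N tries
```

The only structural difference from naive RAG is the `for` loop plus the two decision points; everything else is which model you plug into each stub.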
Typical implementation with LangGraph
```python
# Pseudocode, close to real LangGraph. AgentState and the node
# functions (rewrite_query, validate_answer, ...) are defined elsewhere.
from langgraph.graph import StateGraph, START, END

graph = StateGraph(AgentState)
graph.add_node("rewrite", rewrite_query)
graph.add_node("decide_retrieval", classify_need_for_retrieval)
graph.add_node("pick_source", choose_best_source)
graph.add_node("retrieve", retrieve_from_source)
graph.add_node("generate", generate_answer)
graph.add_node("check", validate_answer)

graph.add_edge(START, "rewrite")
graph.add_edge("rewrite", "decide_retrieval")
graph.add_conditional_edges("decide_retrieval", need_retrieval_fn, {
    True: "pick_source",
    False: "generate",
})
graph.add_edge("pick_source", "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", "check")
graph.add_conditional_edges("check", answer_ok_fn, {
    True: END,
    False: "rewrite",  # answer_ok_fn must also enforce the iteration cap
})

app = graph.compile()
```
Relation to the ReAct pattern
Agentic RAG is a specialization of the [[../05-ai-agents/react-pattern|ReAct]] pattern where the main tool is "retrieve". Thought -> Action (retrieve or re-retrieve) -> Observation (chunks) -> Thought (is this enough?) -> Action or Answer.
Tools the agent can use
- Vector DB search (dense)
- BM25 search (sparse)
- Hybrid search
- SQL query
- Web search (Tavily, Exa, Perplexity API)
- Knowledge graph traversal
- Another agent (multi-agent RAG)
- Code interpreter
Each tool is exposed as a function the agent can call via function calling or MCP.
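Concretely, "exposed as a function" usually means a JSON-schema tool definition the model can call. A minimal sketch in the shape most function-calling APIs accept (the tool name and fields here are illustrative, and the exact envelope varies by provider):

```python
# Illustrative tool definition for a dense vector search tool.
vector_search_tool = {
    "name": "vector_db_search",
    "description": "Dense semantic search over the company docs index.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language search query",
            },
            "top_k": {
                "type": "integer",
                "description": "Number of chunks to return",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
```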
Relevance today (2026)
Agentic RAG is the new default for complex queries
Daily Dose DS positions this as the "evolution after RAG". By 2026 it is not just evolution, it is standard. Every serious product (Perplexity, Claude search, ChatGPT with browsing, Glean, Hebbia) runs some form of agentic RAG.
Reasoning models made it cheap
Small reasoning models like o1-mini, Claude Haiku 4.5 (thinking), and Gemini 2.5 Flash Thinking deliver cheap, fast reasoning, so the planning and checking steps cost pennies. In 2022 the same loop would have cost 10x more.
Multi-source is the real unlock
The value of Agentic RAG is not "retry if wrong". It is "pick the right source". When your knowledge spans vector DB + SQL + APIs + web, only an agent can route queries correctly per call.
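A minimal sketch of per-call routing, with keyword rules standing in for the LLM classifier a real system would use (all names and rules here are ours):

```python
# Hypothetical per-call source router. Production systems replace the
# keyword rules with an LLM classifier or function calling.
def route(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("revenue", "sales", "how many", "how much")):
        return "sql"          # structured, aggregatable facts
    if any(w in q for w in ("today", "latest", "news")):
        return "web_search"   # freshness beats the index
    return "vector_db"        # default: semantic search over docs
```

The point is that routing happens per call, at runtime, based on the query itself, instead of being hard-wired into the pipeline.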
The memory bridge
Agentic RAG is read-only. The next step is Agent Memory: read AND write to external knowledge. Agents that remember past interactions, user preferences, long-term facts. See agent memory.
Daily Dose DS makes this clear:
- RAG (2020-2023): read-only, one-shot
- Agentic RAG: read-only via tool calls
- Agent Memory: read-write via tool calls
2026 trend: Agentic RAG + Memory together are becoming the standard for production assistants.
Costs and pitfalls got sharper
Agent loops without bounds = runaway costs. Every 2026 agentic RAG deployment has:
- Max iterations (e.g., 3-5)
- Per-step token budget
- Early stopping on high confidence
- Observability (LangSmith, Langfuse, Arize) to monitor loops
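The first three guardrails above can live in one small budget object that the loop consults on every step. A sketch under assumed limits, with nothing framework-specific:

```python
# Illustrative loop-budget guard: iteration cap, token budget, early stop.
from dataclasses import dataclass

@dataclass
class LoopBudget:
    max_iterations: int = 4
    max_tokens: int = 20_000
    confidence_stop: float = 0.9
    iterations: int = 0
    tokens_spent: int = 0

    def allow_step(self) -> bool:
        return (self.iterations < self.max_iterations
                and self.tokens_spent < self.max_tokens)

    def record(self, tokens: int, confidence: float) -> bool:
        """Record one step. Returns True if the loop should stop
        (budget exhausted or answer confident enough)."""
        self.iterations += 1
        self.tokens_spent += tokens
        return confidence >= self.confidence_stop or not self.allow_step()
```

Observability is the missing fourth piece: emit `iterations` and `tokens_spent` to your tracing tool on every stop so runaway queries show up in dashboards.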
Critical questions
- When is Agentic RAG overkill? (Simple FAQ. The query is "what is X" and the corpus is homogeneous. Naive RAG + rerank is faster and cheaper.)
- How do you stop infinite loops? (Hard cap on iterations, confidence threshold, human fallback.)
- The checker agent marks answers wrong. Who checks the checker? (Evals. Build a golden set. Measure the checker's false positive/negative rate.)
- Can a smaller LLM drive the agent? (Yes. Use Haiku for planning, Sonnet or Opus for final synthesis. Cost-optimized.)
- Does Agentic RAG replace Reranking? (No. They are layered. The agent decides what to retrieve; the reranker refines what came back.)
- How do you debug when the agent picks the wrong source? (Traces. Langfuse or LangSmith show the decision tree. You see the classifier's reasoning.)
Production pitfalls
- Unbounded loops. Query "What is the weather?". Agent retrieves, checker rejects, agent retries, forever. Always cap.
- Cost blowout on hard queries. A single complex query runs 10 retrievals + 10 LLM calls. Per-query SLA blown. Monitor and alert.
- Source picker biases. Agent always picks vector DB even when SQL would be faster. Train the classifier on real traffic.
- Checker hallucinates "OK". Checker with weak reasoning approves bad answers. Eval the checker separately.
- Latency. Multi-step agents can easily cross 5-10 seconds. For chat, stream intermediate thoughts or pre-compute common paths.
- No tracing. You will not debug this without observability. Use LangSmith, Langfuse, Arize, Helicone, or home-rolled.
- Schema drift. Each source has different output shapes. Normalize before feeding to the checker.
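A sketch of such a normalization step, assuming hypothetical output shapes for each source (the `{source, text, metadata}` envelope is our convention, not a standard):

```python
# Illustrative normalizer: every source's native output becomes one shape
# before the checker agent sees it.
def normalize(source: str, raw) -> dict:
    if source == "vector_db":
        # assumed shape: list of chunk dicts with a "text" field
        return {"source": source,
                "text": "\n".join(c["text"] for c in raw),
                "metadata": {"chunks": len(raw)}}
    if source == "sql":
        # assumed shape: list of row dicts
        return {"source": source,
                "text": "\n".join(str(r) for r in raw),
                "metadata": {"rows": len(raw)}}
    # fallback: stringify whatever came back (web search, APIs, ...)
    return {"source": source, "text": str(raw), "metadata": {}}
```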
Alternatives / Comparisons
| Pattern | When | Trade-off |
|---|---|---|
| Naive RAG | Simple Q&A, homogeneous corpus | Fast, cheap, low accuracy on complex queries |
| RAG + rerank | Most production Q&A | Big accuracy gain for little cost; still single-shot |
| Adaptive RAG (classifier routes) | Mixed query types, no tools | Lighter than agent |
| Agentic RAG | Multi-source, multi-step, tool use | Expensive, most flexible |
| Agentic RAG + Memory | Long-term user context | Full power, ops-heavy |
| Just use a huge context | <100k tokens corpus, 1M model | Expensive per query, low recall |
Mental parallels (non-AI)
- Investigative journalist: doesn't query Google once. Rewrites the question, checks multiple sources, cross-references, verifies. Returns to sources if the story doesn't fit. Then publishes.
- Medical differential diagnosis: a doctor considers symptoms, runs tests (retrievals), rules out hypotheses, orders more tests if needed, converges on a diagnosis.
- Research PhD workflow: question -> lit review (retrieval) -> preliminary hypothesis -> more targeted search -> experiments -> revise -> publish.
- Customer support Tier 2: unlike Tier 1 (script-following = Naive RAG), Tier 2 investigates across systems, asks clarifying questions, escalates if stuck.
Mini-lab
labs/agentic-rag/ (to create):
- Build an Agentic RAG with LangGraph over two sources:
- A vector DB with company docs
- A SQL database with live sales numbers
- Define agents: query rewriter, source picker, retriever, answer checker.
- Test with 3 query types:
- "What is our refund policy?" (docs)
- "How much did we sell last month?" (SQL)
- "What was the revenue impact of the new refund policy?" (both)
- Instrument with Langfuse.
- Compare to a Naive RAG baseline on the same queries.
Stack: uv, langchain, langgraph, qdrant, sqlite, langfuse, anthropic.
Further reading
Canonical
- Daily Dose DS, "RAG vs Agentic RAG" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- "Self-RAG: Learning to Retrieve, Generate, and Critique" (Asai et al., 2023) - https://arxiv.org/abs/2310.11511
- "Corrective Retrieval Augmented Generation" (Yan et al., 2024) - https://arxiv.org/abs/2401.15884
- LangGraph Agentic RAG tutorials - https://langchain-ai.github.io/langgraph/tutorials/rag/
Related in this KB
Tools
- LangGraph (LangChain): https://langchain-ai.github.io/langgraph/
- LlamaIndex agents: https://docs.llamaindex.ai/en/stable/understanding/agent/
- CrewAI: https://docs.crewai.com/
- PydanticAI: https://ai.pydantic.dev/
- OpenAI Assistants API (for RAG with tools): https://platform.openai.com/docs/assistants/overview