ReAct Pattern (Reason + Act)
Watch or read first
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022) - https://arxiv.org/abs/2210.03629
- Daily Dose DS, "ReAct Implementation from Scratch" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- LangChain ReAct agent docs: https://python.langchain.com/docs/how_to/migrate_agent/
TL;DR
ReAct is a prompting loop that combines reasoning with tool use. The LLM alternates **Thought** (plan), **Action** (call a tool), **Observation** (tool result), over and over, until it produces an **Answer**. It is the foundational agent pattern: CrewAI, LangGraph, and many others default to it.
The historical problem
Before ReAct (Yao et al., Oct 2022):
- Chain of Thought (CoT) improved reasoning but stayed in the LLM's head. No tools, no world interaction.
- Tool-use agents could act but often picked the wrong tool because they lacked explicit reasoning steps.
ReAct glued the two together: first think, then act, then observe, then think again. For tasks that need both reasoning AND action, this turns out to be much more reliable than CoT-only or tool-use-only approaches.
How it works
The loop
```
Thought     : describe what I'm thinking about
PAUSE       : wait to decide the action
Action      : pick a tool from the available list and call it
PAUSE       : wait for the tool result
Observation : the tool's output
(repeat)
Answer      : the final response to the user
```
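The steps above can be sketched as a tiny driver loop. This is an illustrative shape only: `llm` stands in for any callable that returns the next step as text, and the tool names are placeholders, not a real API.

```python
def react_loop(llm, tools, question, max_iters=10):
    """Illustrative shape of the ReAct control flow.

    `llm` takes the transcript so far and returns the next step as text
    ("Thought: ...", "Action: tool: arg", or "Answer: ...").
    """
    transcript = f"Question: {question}"
    for _ in range(max_iters):
        step = llm(transcript)
        transcript += "\n" + step
        if step.startswith("Answer:"):
            return step  # the loop breaks on a final Answer
        if step.startswith("Action:"):
            name, arg = step[len("Action:"):].split(":", 1)
            obs = tools[name.strip()](arg.strip())
            transcript += f"\nObservation: {obs}"  # feed the result back in
    return "Answer: iteration budget exhausted"
```

Everything later in this note (manual run, automated controller, function calling) is a variation on this shape.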
Example trace
```
User: What is double the population of Japan?

Thought: I need to find the population of Japan first.
PAUSE
Action: lookup_population: Japan
PAUSE
Observation: 125,000,000
Thought: Now I need to double it.
PAUSE
Action: math: 125000000 * 2
PAUSE
Observation: 250000000
Answer: Double the population of Japan is 250 million.
```
The LLM's "internal monologue" (Thought) and "action in the world" (Action) interleave. Each observation updates its understanding. When it has enough, it breaks the loop with Answer.
System prompt structure
A ReAct system prompt defines:
- The loop format (Thought / PAUSE / Action / PAUSE / Observation / Answer)
- The available tools with name, example usage, expected output
- A worked example showing the flow
- Stop condition ("when you have the answer, break the loop")
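As a sketch, that structure can be assembled from a tool table. Illustrative only; the exact wording is whatever works for your model, and `build_react_prompt` is a hypothetical helper, not from the Daily Dose DS walkthrough.

```python
def build_react_prompt(tools: dict[str, str]) -> str:
    """Assemble a ReAct system prompt from {tool_name: description-with-example}."""
    lines = [
        "You run in a loop and do JUST ONE thing per iteration:",
        '1) "Thought" to describe your thoughts about the input question.',
        '2) "PAUSE" to pause and think about the action to take.',
        '3) "Action" to decide what action to take from the list below.',
        '4) "PAUSE" to pause and wait for the result of the action.',
        '5) "Observation" will be the output returned by the action.',
        "",
        "Actions available:",
    ]
    for name, desc in tools.items():
        lines.append(f"{name}:\n  {desc}")
    lines.append("Whenever you have the answer, stop the loop and output it.")
    return "\n".join(lines)
```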
From-scratch implementation (Daily Dose DS walkthrough)
Two versions: manual (step the loop by hand) and automated (controller orchestrates).
Minimal Agent class
```python
from litellm import completion

class Agent:
    """Thin wrapper that keeps the conversation history across turns."""

    def __init__(self, system=""):
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message=None):
        # Calling with no message just lets the model continue the loop.
        if message:
            self.messages.append({"role": "user", "content": message})
        result = self.invoke()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def invoke(self):
        response = completion(model="openai/gpt-4o", messages=self.messages)
        return response.choices[0].message.content
```
The ReAct system prompt
```
You run in a loop and do JUST ONE thing per iteration:
1) "Thought" to describe your thoughts about the input question.
2) "PAUSE" to pause and think about the action to take.
3) "Action" to decide what action to take from the list of actions available.
4) "PAUSE" to pause and wait for the result of the action.
5) "Observation" will be the output returned by the action.
At the end of the loop, you produce an Answer.

Actions available:

math:
e.g. math: (14 * 5) / 4
Evaluates mathematical expressions using Python syntax.

lookup_population:
e.g. lookup_population: India
Returns the latest known population of the specified country.

Whenever you have the answer, stop the loop and output it to the user.

Now begin solving:
```
Three design decisions baked in:
- Single step per iteration prevents the model from jumping to the answer.
- PAUSE markers split internal reasoning from action, and action from observation.
- Tool spec with example usage reduces hallucinated tool calls.
Manual ReAct run
```python
agent = Agent(system=system_prompt)

# Turn 1: ask the question
print(agent("What is the sum of the population of India and Japan?"))
# -> "Thought: I need to find the population of India first."

# Turn 2: let it continue (no input)
print(agent())
# -> "PAUSE"

# Turn 3
print(agent())
# -> "Action: lookup_population: India"

# Turn 4
print(agent())
# -> "PAUSE"

# Turn 5: inject observation
print(agent("Observation: 1400000000"))
# -> "Thought: Now I need the population of Japan."

# ... and so on
```
You see exactly what the LLM thinks and when it wants to act.
Automated controller
```python
import re

# Sample data so the demo runs offline; swap in a real lookup as needed.
POPULATIONS = {"India": 1_400_000_000, "Japan": 125_000_000}

def agent_loop(query: str, system_prompt: str):
    agent = Agent(system=system_prompt)
    tools = {
        "math": lambda expr: eval(expr),  # demo only: never eval untrusted input
        "lookup_population": lambda country: POPULATIONS.get(country, "unknown"),
    }
    current_prompt = query
    for i in range(20):  # max 20 iterations
        response = agent(current_prompt)
        print(f"--- iter {i} ---")
        print(response)
        if "Answer:" in response:
            break
        if "Action:" in response:
            m = re.search(r"Action:\s*(\w+):\s*(.+)", response)
            if m:
                tool, arg = m.group(1), m.group(2).strip()
                if tool in tools:
                    obs = tools[tool](arg)
                    current_prompt = f"Observation: {obs}"
                else:
                    current_prompt = f"Observation: tool {tool} not found"
            else:
                current_prompt = ""
        else:
            current_prompt = ""  # PAUSE or Thought: let it continue
```
This is ~30 lines of Python. Every framework wraps something like this.
Limitations of regex-based ReAct
The manual regex version is fragile:
- Whitespace or casing changes can break parsing
- The model might call a non-existent tool
- No type safety on arguments
- No retry on tool errors
In 2026, use native function calling (see function calling) instead of regex-parsed ReAct. The loop is the same; the transport is cleaner.
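The same loop with native function calling, sketched here with litellm (OpenAI-compatible message shapes). The `lookup_population` tool, its sample data, and the model name are illustrative placeholders; the import is deliberately lazy so the offline parts of the sketch run without the dependency.

```python
import json

POPULATIONS = {"India": 1_400_000_000, "Japan": 125_000_000}  # sample data

# Tool spec in the JSON-Schema shape used by OpenAI-style function calling.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_population",
        "description": "Latest known population of a country.",
        "parameters": {
            "type": "object",
            "properties": {"country": {"type": "string"}},
            "required": ["country"],
        },
    },
}]

def dispatch(name: str, arguments_json: str):
    """Route a structured tool call to a Python function -- no regex parsing."""
    args = json.loads(arguments_json)
    if name == "lookup_population":
        return POPULATIONS.get(args["country"], "unknown")
    return f"tool {name} not found"

def react_fc(question: str, model: str = "openai/gpt-4o", max_iters: int = 10):
    from litellm import completion  # lazy import: rest of the sketch is offline
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iters):
        resp = completion(model=model, messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no tool requested -> this is the Answer
        messages.append(msg)
        for tc in msg.tool_calls:
            obs = dispatch(tc.function.name, tc.function.arguments)
            messages.append(
                {"role": "tool", "tool_call_id": tc.id, "content": str(obs)}
            )
```

The model can never invent an Observation here: observations only enter the history via `role: "tool"` messages your code appends.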
ReAct as one of five agentic patterns
Daily Dose DS lists ReAct alongside Reflection, Tool Use, Planning, and Multi-Agent as the five main agentic patterns. See agentic design patterns.
Specifically:
- Reflection: agent critiques its own output and retries
- Tool Use: agent calls external tools
- ReAct: reasoning + acting (= reflection + tool use combined)
- Planning: agent creates a roadmap before executing
- Multi-Agent: multiple specialized agents collaborate
ReAct is the first one most practitioners should master. It is the foundation.
Relevance today (2026)
Still the default
Every major framework (CrewAI, LangGraph, LlamaIndex Agents, PydanticAI, OpenAI Agents SDK) implements ReAct as a core pattern. If you build an agent in 2026, you are probably running a variant of ReAct.
Function calling replaces text parsing
Original ReAct paper used text parsing (look for "Action:" keyword). Modern ReAct uses native function calling. Same loop, cleaner transport.
Reasoning models internalize part of ReAct
Reasoning models (o1, o3, Claude Opus 4.5 thinking, R1) do Thought/Action-like reasoning internally before responding. You can skip explicit CoT in your prompt on these models. But you STILL wrap them in a ReAct-like outer loop when tools are involved.
Planning-first patterns gained ground
Plan-then-execute (make a full plan, then run it without deviation) and ReWOO (Reasoning Without Observation) are alternative patterns. They trade some flexibility for lower latency. See agentic design patterns.
Observability is mandatory
Every iteration is a fork point for things to go wrong. LangSmith, Langfuse, Arize, Helicone - pick one. Without traces you cannot debug ReAct.
Cost control
ReAct runs the LLM N times per task. Prompt caching flattens the cost of the growing conversation history. Set max iterations. Monitor per-task cost.
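One way to enforce the caps, as a sketch: `step` stands in for whatever produces the next model response, and the character count is only a rough proxy for token spend.

```python
class BudgetExceeded(Exception):
    pass

def run_with_budget(step, max_iters: int = 10, max_chars: int = 50_000):
    """Drive a ReAct-style loop under iteration and output-size budgets."""
    total_chars = 0
    for i in range(max_iters):
        response = step()
        total_chars += len(response)  # rough proxy for token spend
        if total_chars > max_chars:
            raise BudgetExceeded(f"size budget exceeded at iteration {i}")
        if "Answer:" in response:
            return response
    raise BudgetExceeded(f"no Answer after {max_iters} iterations")
```

Raising instead of silently stopping forces the caller to decide: retry, escalate to a human, or fail the task.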
Critical questions
- Why not have the LLM act in one shot with no reasoning? (Tool choice accuracy drops. Daily Dose DS: ReAct explicitly > tool-use-only.)
- Why not pure CoT with no tools? (The LLM is frozen at training. For fresh or external data, tools are required.)
- When is planning-first better than ReAct? (Predictable multi-step tasks where the plan is stable. ReAct wins when you cannot predict the path.)
- How many iterations is too many? (3-5 for simple, 10-20 for complex. Past 20 usually means poor tool design or bad decomposition.)
- Should you expose "Thought" to the user? (Sometimes. Users like seeing the agent "think". But thoughts can expose system internals. Filter or paraphrase.)
- How is ReAct different from ARQ? (ARQ forces a JSON schema for reasoning steps. ReAct uses free-form text. ARQ is more auditable, less flexible.)
Production pitfalls
- Infinite loops. Agent thinks, acts, observes, thinks, acts, observes forever. Cap iterations and token budget.
- Hallucinated observations. Without strict parsing, the model invents tool results. Use function calling, never let the model generate its own Observation lines.
- Tool output too long. Pasting a 50KB tool result blows context. Summarize, truncate, chunk.
- Mixed languages in output. Model thinks in English but replies in French. Standardize via system prompt.
- Plan drift. Over 10 iterations, agent forgets the original goal. Remind periodically.
- No human escape. When stuck, agent should ask the user. Build that affordance in.
- No tracing. Without LangSmith/Langfuse you will never debug complex failures.
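For the oversized-tool-output pitfall, a minimal clip helper (the threshold is arbitrary; real systems often summarize rather than truncate):

```python
def clip_observation(text: str, max_chars: int = 2000) -> str:
    """Truncate a tool result before it re-enters the context window."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n...[truncated {dropped} chars]"
```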
Alternatives / Comparisons
| Pattern | When | Pros | Cons |
|---|---|---|---|
| CoT only (no tools) | LLM already knows the answer | Fast, cheap | No fresh data |
| Tool use only | Single API call | Simple | No reasoning about which tool |
| ReAct | General purpose | Flexible, introspectable | Slower, more expensive |
| Plan-then-execute | Predictable sequence | Fewer LLM calls | Brittle on unexpected results |
| ReWOO | Cost-sensitive | Cheaper | Less adaptive |
| ARQ | Policy-heavy agents | Auditable, robust | Less flexible |
Mental parallels (non-AI)
- Scientific method: hypothesize (Thought), experiment (Action), observe (Observation), revise (Thought). Repeat.
- Private detective: thinks about the case, interviews someone (Action), gets testimony (Observation), thinks again. Novels work on this structure.
- Chess on a timer: think, move, see opponent response, think again. Forced alternation.
- Pair programming: one partner thinks aloud ("I think we should check X"), the other runs the code ("Result is Y"). Alternation between reasoning and action.
Mini-lab
labs/react-from-scratch/ (to create):
- Build the Daily Dose DS manual ReAct: Agent class + system prompt + 2 tools.
- Run a query manually step-by-step. Log every thought/action/observation.
- Now automate: write the controller. Cap at 10 iterations.
- Add a third tool: `web_search` via Tavily.
- Test a multi-step query: "Summarize the plot of the most recent Pixar movie."
- Re-implement with OpenAI function calling (no regex). Compare robustness.
- Port to LangGraph. Compare lines of code.
Stack: uv, litellm or anthropic, tavily-python.
Further reading
Canonical
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022) - https://arxiv.org/abs/2210.03629
- Daily Dose DS, "ReAct Implementation from Scratch" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- LangChain ReAct docs: https://python.langchain.com/docs/how_to/migrate_agent/
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
Related in this KB
- what is an agent
- agent building blocks
- function calling
- agentic design patterns
- agent memory
- reasoning prompting techniques
- agentic rag
Frameworks
- LangChain (https://python.langchain.com/), LangGraph (https://langchain-ai.github.io/langgraph/)
- CrewAI: https://docs.crewai.com/
- LlamaIndex Agents: https://docs.llamaindex.ai/en/stable/understanding/agent/
- PydanticAI: https://ai.pydantic.dev/
- OpenAI Agents SDK: https://openai.github.io/openai-agents-python/