ReAct Pattern (Reason + Act)
Watch or read first
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022) - https://arxiv.org/abs/2210.03629
- Daily Dose DS, "ReAct Implementation from Scratch" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- LangChain ReAct agent docs: https://python.langchain.com/docs/how_to/migrate_agent/
TL;DR
ReAct is a prompting loop that combines reasoning with tool use. The LLM alternates **Thought** (plan), **Action** (call a tool), **Observation** (tool result), over and over, until it produces an **Answer**. It is the foundational agent pattern: CrewAI, LangGraph, and many others default to it.
The historical problem
Before ReAct (Yao et al., Oct 2022):
- Chain of Thought (CoT) improved reasoning but stayed in the LLM's head. No tools, no world interaction.
- Tool-use agents could act but often picked the wrong tool because they lacked explicit reasoning steps.
ReAct glued the two together: first think, then act, then observe, then think again. For tasks that need both reasoning AND action, this turns out to be much more reliable than CoT-only or tool-use-only approaches.
How it works
The loop
```
Thought     : describe what I'm thinking about
PAUSE       : wait to decide the action
Action      : pick a tool from the available list and call it
PAUSE       : wait for the tool result
Observation : the tool's output
(repeat)
Answer      : the final response to the user
```
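The steps above can be sketched as a tiny driver loop. This is an illustrative shape only: `llm` stands in for any callable that returns the next step as text, and the tool names are placeholders, not a real API.

```python
def react_loop(llm, tools, question, max_iters=10):
    """Illustrative shape of the ReAct control flow.

    `llm` takes the transcript so far and returns the next step as text
    ("Thought: ...", "Action: tool: arg", or "Answer: ...").
    """
    transcript = f"Question: {question}"
    for _ in range(max_iters):
        step = llm(transcript)
        transcript += "\n" + step
        if step.startswith("Answer:"):
            return step  # the loop breaks on a final Answer
        if step.startswith("Action:"):
            name, arg = step[len("Action:"):].split(":", 1)
            obs = tools[name.strip()](arg.strip())
            transcript += f"\nObservation: {obs}"  # feed the result back in
    return "Answer: iteration budget exhausted"
```

Everything later in this note (manual run, automated controller, function calling) is a variation on this shape.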
Example trace
```
User: What is double the population of Japan?

Thought: I need to find the population of Japan first.
PAUSE
Action: lookup_population: Japan
PAUSE
Observation: 125,000,000
Thought: Now I need to double it.
PAUSE
Action: math: 125000000 * 2
PAUSE
Observation: 250000000
Answer: Double the population of Japan is 250 million.
```
The LLM's "internal monologue" (Thought) and "action in the world" (Action) interleave. Each observation updates its understanding. When it has enough, it breaks the loop with Answer.
System prompt structure
A ReAct system prompt defines:
- The loop format (Thought / PAUSE / Action / PAUSE / Observation / Answer)
- The available tools with name, example usage, expected output
- A worked example showing the flow
- Stop condition ("when you have the answer, break the loop")
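As a sketch, that structure can be assembled from a tool table. Illustrative only; the exact wording is whatever works for your model, and `build_react_prompt` is a hypothetical helper, not from the Daily Dose DS walkthrough.

```python
def build_react_prompt(tools: dict[str, str]) -> str:
    """Assemble a ReAct system prompt from {tool_name: description-with-example}."""
    lines = [
        "You run in a loop and do JUST ONE thing per iteration:",
        '1) "Thought" to describe your thoughts about the input question.',
        '2) "PAUSE" to pause and think about the action to take.',
        '3) "Action" to decide what action to take from the list below.',
        '4) "PAUSE" to pause and wait for the result of the action.',
        '5) "Observation" will be the output returned by the action.',
        "",
        "Actions available:",
    ]
    for name, desc in tools.items():
        lines.append(f"{name}:\n  {desc}")
    lines.append("Whenever you have the answer, stop the loop and output it.")
    return "\n".join(lines)
```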
From-scratch implementation (Daily Dose DS walkthrough)
Two versions: manual (step the loop by hand) and automated (controller orchestrates).
Minimal Agent class
```python
from litellm import completion

class Agent:
    """Thin wrapper that keeps the conversation history across turns."""

    def __init__(self, system=""):
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message=None):
        # Calling with no message just lets the model continue the loop.
        if message:
            self.messages.append({"role": "user", "content": message})
        result = self.invoke()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def invoke(self):
        response = completion(model="openai/gpt-4o", messages=self.messages)
        return response.choices[0].message.content
```
The ReAct system prompt
```
You run in a loop and do JUST ONE thing per iteration:
1) "Thought" to describe your thoughts about the input question.
2) "PAUSE" to pause and think about the action to take.
3) "Action" to decide what action to take from the list of actions available.
4) "PAUSE" to pause and wait for the result of the action.
5) "Observation" will be the output returned by the action.
At the end of the loop, you produce an Answer.

Actions available:

math:
e.g. math: (14 * 5) / 4
Evaluates mathematical expressions using Python syntax.

lookup_population:
e.g. lookup_population: India
Returns the latest known population of the specified country.

Whenever you have the answer, stop the loop and output it to the user.

Now begin solving:
```
Three design decisions baked in:
- Single step per iteration prevents the model from jumping to the answer.
- PAUSE markers split internal reasoning from action, and action from observation.
- Tool spec with example usage reduces hallucinated tool calls.
Manual ReAct run
```python
agent = Agent(system=system_prompt)

# Turn 1: ask the question
print(agent("What is the sum of the population of India and Japan?"))
# -> "Thought: I need to find the population of India first."

# Turn 2: let it continue (no input)
print(agent())
# -> "PAUSE"

# Turn 3
print(agent())
# -> "Action: lookup_population: India"

# Turn 4
print(agent())
# -> "PAUSE"

# Turn 5: inject observation
print(agent("Observation: 1400000000"))
# -> "Thought: Now I need the population of Japan."

# ... and so on
```
You see exactly what the LLM thinks and when it wants to act.
Automated controller
```python
import re

# Sample data so the demo runs offline; swap in a real lookup as needed.
POPULATIONS = {"India": 1_400_000_000, "Japan": 125_000_000}

def agent_loop(query: str, system_prompt: str):
    agent = Agent(system=system_prompt)
    tools = {
        "math": lambda expr: eval(expr),  # demo only: never eval untrusted input
        "lookup_population": lambda country: POPULATIONS.get(country, "unknown"),
    }
    current_prompt = query
    for i in range(20):  # max 20 iterations
        response = agent(current_prompt)
        print(f"--- iter {i} ---")
        print(response)
        if "Answer:" in response:
            break
        if "Action:" in response:
            m = re.search(r"Action:\s*(\w+):\s*(.+)", response)
            if m:
                tool, arg = m.group(1), m.group(2).strip()
                if tool in tools:
                    obs = tools[tool](arg)
                    current_prompt = f"Observation: {obs}"
                else:
                    current_prompt = f"Observation: tool {tool} not found"
            else:
                current_prompt = ""
        else:
            current_prompt = ""  # PAUSE or Thought: let it continue
```
This is ~30 lines of Python. Every framework wraps something like this.
Limitations of regex-based ReAct
The manual regex version is fragile:
- Whitespace or casing changes can break parsing
- The model might call a non-existent tool
- No type safety on arguments
- No retry on tool errors
In 2026, use native function calling (see function calling) instead of regex-parsed ReAct. The loop is the same; the transport is cleaner.
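The same loop with native function calling, sketched here with litellm (OpenAI-compatible message shapes). The `lookup_population` tool, its sample data, and the model name are illustrative placeholders; the import is deliberately lazy so the offline parts of the sketch run without the dependency.

```python
import json

POPULATIONS = {"India": 1_400_000_000, "Japan": 125_000_000}  # sample data

# Tool spec in the JSON-Schema shape used by OpenAI-style function calling.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_population",
        "description": "Latest known population of a country.",
        "parameters": {
            "type": "object",
            "properties": {"country": {"type": "string"}},
            "required": ["country"],
        },
    },
}]

def dispatch(name: str, arguments_json: str):
    """Route a structured tool call to a Python function -- no regex parsing."""
    args = json.loads(arguments_json)
    if name == "lookup_population":
        return POPULATIONS.get(args["country"], "unknown")
    return f"tool {name} not found"

def react_fc(question: str, model: str = "openai/gpt-4o", max_iters: int = 10):
    from litellm import completion  # lazy import: rest of the sketch is offline
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iters):
        resp = completion(model=model, messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no tool requested -> this is the Answer
        messages.append(msg)
        for tc in msg.tool_calls:
            obs = dispatch(tc.function.name, tc.function.arguments)
            messages.append(
                {"role": "tool", "tool_call_id": tc.id, "content": str(obs)}
            )
```

The model can never invent an Observation here: observations only enter the history via `role: "tool"` messages your code appends.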
ReAct as one of five agentic patterns
Daily Dose DS lists ReAct alongside Reflection, Tool Use, Planning, and Multi-Agent as the five main agentic patterns. See agentic design patterns.
Specifically:
- Reflection: agent critiques its own output and retries
- Tool Use: agent calls external tools
- ReAct: reasoning + acting (= reflection + tool use combined)
- Planning: agent creates a roadmap before executing
- Multi-Agent: multiple specialized agents collaborate
ReAct is the first one most practitioners should master. It is the foundation.
Relevance today (2026)
Still the default
Every major framework (CrewAI, LangGraph, LlamaIndex Agents, PydanticAI, OpenAI Agents SDK) implements ReAct as a core pattern. If you build an agent in 2026, you are probably running a variant of ReAct.
Function calling replaces text parsing
Original ReAct paper used text parsing (look for "Action:" keyword). Modern ReAct uses native function calling. Same loop, cleaner transport.
Reasoning models internalize part of ReAct
Reasoning models (o1, o3, Claude Opus 4.5 thinking, R1) do Thought/Action-like reasoning internally before responding. You can skip explicit CoT in your prompt on these models. But you STILL wrap them in a ReAct-like outer loop when tools are involved.
Planning-first patterns gained ground
Plan-then-execute (make a full plan, then run it without deviation) and ReWOO (Reasoning Without Observation) are alternative patterns. They trade some flexibility for lower latency. See agentic design patterns.
Observability is mandatory
Every iteration is a fork point for things to go wrong. LangSmith, Langfuse, Arize, Helicone - pick one. Without traces you cannot debug ReAct.
Cost control
ReAct runs the LLM N times per task. Prompt caching flattens the cost of the growing conversation history. Set max iterations. Monitor per-task cost.
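One way to enforce the caps, as a sketch: `step` stands in for whatever produces the next model response, and the character count is only a rough proxy for token spend.

```python
class BudgetExceeded(Exception):
    pass

def run_with_budget(step, max_iters: int = 10, max_chars: int = 50_000):
    """Drive a ReAct-style loop under iteration and output-size budgets."""
    total_chars = 0
    for i in range(max_iters):
        response = step()
        total_chars += len(response)  # rough proxy for token spend
        if total_chars > max_chars:
            raise BudgetExceeded(f"size budget exceeded at iteration {i}")
        if "Answer:" in response:
            return response
    raise BudgetExceeded(f"no Answer after {max_iters} iterations")
```

Raising instead of silently stopping forces the caller to decide: retry, escalate to a human, or fail the task.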
Critical questions
- Why not have the LLM act in one shot with no reasoning? (Tool choice accuracy drops. Daily Dose DS: ReAct explicitly > tool-use-only.)
- Why not pure CoT with no tools? (The LLM is frozen at training. For fresh or external data, tools are required.)
- When is planning-first better than ReAct? (Predictable multi-step tasks where the plan is stable. ReAct wins when you cannot predict the path.)
- How many iterations is too many? (3-5 for simple, 10-20 for complex. Past 20 usually means poor tool design or bad decomposition.)
- Should you expose "Thought" to the user? (Sometimes. Users like seeing the agent "think". But thoughts can expose system internals. Filter or paraphrase.)
- How is ReAct different from ARQ? (ARQ forces a JSON schema for reasoning steps. ReAct uses free-form text. ARQ is more auditable, less flexible.)
Production pitfalls
- Infinite loops. Agent thinks, acts, observes, thinks, acts, observes forever. Cap iterations and token budget.
- Hallucinated observations. Without strict parsing, the model invents tool results. Use function calling, never let the model generate its own Observation lines.
- Tool output too long. Pasting a 50KB tool result blows context. Summarize, truncate, chunk.
- Mixed languages in output. Model thinks in English but replies in French. Standardize via system prompt.
- Plan drift. Over 10 iterations, agent forgets the original goal. Remind periodically.
- No human escape. When stuck, agent should ask the user. Build that affordance in.
- No tracing. Without LangSmith/Langfuse you will never debug complex failures.
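For the oversized-tool-output pitfall, a minimal clip helper (the threshold is arbitrary; real systems often summarize rather than truncate):

```python
def clip_observation(text: str, max_chars: int = 2000) -> str:
    """Truncate a tool result before it re-enters the context window."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n...[truncated {dropped} chars]"
```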
Alternatives / Comparisons
| Pattern | When | Pros | Cons |
|---|---|---|---|
| CoT only (no tools) | LLM already knows the answer | Fast, cheap | No fresh data |
| Tool use only | Single API call | Simple | No reasoning about which tool |
| ReAct | General purpose | Flexible, introspectable | Slower, more expensive |
| Plan-then-execute | Predictable sequence | Fewer LLM calls | Brittle on unexpected results |
| ReWOO | Cost-sensitive | Cheaper | Less adaptive |
| ARQ | Policy-heavy agents | Auditable, robust | Less flexible |
Mental parallels (non-AI)
- Scientific method: hypothesize (Thought), experiment (Action), observe (Observation), revise (Thought). Repeat.
- Private detective: thinks about the case, interviews someone (Action), gets testimony (Observation), thinks again. Novels work on this structure.
- Chess on a timer: think, move, see opponent response, think again. Forced alternation.
- Pair programming: one partner thinks aloud ("I think we should check X"), the other runs the code ("Result is Y"). Alternation between reasoning and action.
Mini-lab
labs/react-from-scratch/ (to create):
- Build the Daily Dose DS manual ReAct: Agent class + system prompt + 2 tools.
- Run a query manually step-by-step. Log every thought/action/observation.
- Now automate: write the controller. Cap at 10 iterations.
- Add a third tool: `web_search` via Tavily.
- Test a multi-step query: "Summarize the plot of the most recent Pixar movie."
- Re-implement with OpenAI function calling (no regex). Compare robustness.
- Port to LangGraph. Compare lines of code.
Stack: uv, litellm or anthropic, tavily-python.
Further reading
Canonical
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022) - https://arxiv.org/abs/2210.03629
- Daily Dose DS, "ReAct Implementation from Scratch" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- LangChain ReAct docs: https://python.langchain.com/docs/how_to/migrate_agent/
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
Related in this KB
- what is an agent
- agent building blocks
- function calling
- agentic design patterns
- agent memory
- reasoning prompting techniques
- agentic rag
Frameworks
- LangChain (https://python.langchain.com/), LangGraph (https://langchain-ai.github.io/langgraph/)
- CrewAI: https://docs.crewai.com/
- LlamaIndex Agents: https://docs.llamaindex.ai/en/stable/understanding/agent/
- PydanticAI: https://ai.pydantic.dev/
- OpenAI Agents SDK: https://openai.github.io/openai-agents-python/