The ReAct Pattern: Fusing Reasoning and Acting in AI Agents
Welcome to our series on mastering AI agent programming. Today, we’re exploring one of the most influential patterns in modern agent design: ReAct, a framework that synergizes Reasoning and Acting.
1. Concept Introduction
At its core, the ReAct pattern is simple: it’s a loop where an AI agent thinks about a problem, decides on an action to take, observes the result of that action, and then uses that observation to think again. It’s a structured way for a Large Language Model (LLM) to “show its work” and interact with the outside world to gather information or perform tasks.
For the beginner: Imagine you’re a detective solving a case. You don’t just guess the culprit. You first think about the clues you have (Reasoning). Based on that, you decide to interview a witness (Action). The witness gives you new information (Observation). You then update your theory of the case based on this new information (Reasoning), and decide your next action. ReAct is a formalization of this intuitive process for an AI.
For the practitioner: ReAct is a prompting and control-flow technique that coaxes an LLM to interleave its internal reasoning process with external actions. The model is prompted to produce a structured output containing its thought process, the specific tool or action it wants to use, and the parameters for that action. An orchestrator parses this output, executes the action (e.g., an API call, a database query), and then feeds the result back into the model’s context for the next reasoning step. This transforms the LLM from a passive text generator into an active problem-solver.
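Concretely, a single turn of the loop looks something like this (the labels and the tool name are illustrative; your prompt defines the exact format):

Thought: I don't know this off-hand, so I should look it up.
Action: WikipediaSearch("Nineteen Eighty-Four author")
Observation: George Orwell was an English novelist, essayist, journalist and critic.

Note that the model generates only the Thought and Action lines; the orchestrator runs the tool and appends the Observation before the next model call.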
2. Historical & Theoretical Context
The ReAct pattern was formally introduced in the 2022 paper “ReAct: Synergizing Reasoning and Acting in Language Models” by Shunyu Yao and colleagues at Princeton University and Google Brain (now part of Google DeepMind).
The idea was born from a key limitation of early LLMs: their inability to access real-time information or interact with external systems. While models could “reason” about the data they were trained on, they couldn’t verify facts, get current data, or execute tasks.
ReAct builds on two primary lines of research:
- Chain-of-Thought (CoT) Prompting: This technique, popularized in 2022, found that prompting LLMs to “think step-by-step” significantly improved their performance on complex reasoning tasks. ReAct extends this by not just thinking, but acting on those thoughts.
- Tool-Augmented Language Models: The idea of giving models access to external tools (like calculators or search engines) has been around for a while. ReAct provides a structured framework for how the model should decide when and how to use these tools.
It connects to the core AI principle of the Sense-Plan-Act loop, a cornerstone of classical robotics and agent design. ReAct is a modern, LLM-native implementation of this very old idea.
3. Algorithms & Math
The core of ReAct is not a complex mathematical formula but an algorithmic loop.
Pseudocode for the ReAct Loop:
function react_loop(problem_description, available_tools, max_iterations):
    // Seed the scratchpad with the instructions and the problem
    context = "You are a helpful assistant. Think step-by-step and use the available tools.\nProblem: " + problem_description
    for i in 1 to max_iterations:
        // 1. Reason (Think): ask the model for its next thought and action
        prompt = context + "\nThought:"
        response = llm.generate(prompt) // model emits Thought, Action, and Action Input
        thought, action, action_input = parse(response)
        context += "\nThought: " + thought
        context += "\nAction: " + action + "(" + action_input + ")"
        // 2. Act: the special "Finish" action terminates the loop
        if action == "Finish":
            return action_input // final answer
        tool_to_use = available_tools[action]
        observation = tool_to_use.execute(action_input)
        // 3. Observe: feed the result back for the next reasoning step
        context += "\nObservation: " + observation
    return "Reached max iterations without a solution."
The process is iterative. The context window of the LLM grows with each turn, accumulating a “scratchpad” of thoughts, actions, and observations. This allows the model to build on its previous steps and correct its course if an action fails or provides unexpected information.
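The parse step above is where implementations most often break. Here is a minimal Python sketch, assuming everything before "Action:" is the thought and the action is written in the Name(input) form (a real parser should fail more gracefully):

import re

def parse(response: str):
    # Split one model turn into (thought, action, action_input).
    # Assumes the Thought/Action format shown in the pseudocode above.
    match = re.search(r"(.*?)Action:\s*(\w+)\((.*)\)", response, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse model output: {response!r}")
    thought = match.group(1).replace("Thought:", "").strip()
    return thought, match.group(2), match.group(3)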
4. Design Patterns & Architectures
ReAct is a specific implementation of the more general Planner-Executor pattern.
- Planner: The LLM, prompted to generate a Thought, acts as the planner. It analyzes the current state and decides what to do next.
- Executor: The orchestrator code that parses the LLM’s output and calls the specified tool is the executor.
It fits beautifully into event-driven architectures. Each Action can be seen as an event that the system dispatches. The result of that action (the Observation) is then passed back to the LLM to trigger the next reasoning step.
In frameworks like LangGraph, ReAct is often implemented as a graph where nodes represent the “Reason” and “Act” steps, and edges represent the flow of information between them. The loop continues until an edge leads to a “Finish” node.
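As a rough sketch using LangGraph’s StateGraph API (details vary by version; the node bodies are left as stubs):

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    scratchpad: str   # accumulated Thought/Action/Observation turns
    done: bool        # set by the reason node when Finish is produced

def reason(state: AgentState) -> AgentState:
    ...  # call the LLM, append its Thought/Action to the scratchpad

def act(state: AgentState) -> AgentState:
    ...  # execute the chosen tool, append the Observation

graph = StateGraph(AgentState)
graph.add_node("reason", reason)
graph.add_node("act", act)
graph.set_entry_point("reason")
graph.add_conditional_edges("reason", lambda s: END if s["done"] else "act")
graph.add_edge("act", "reason")
app = graph.compile()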
5. Practical Application
Let’s see a simplified Python example using a hypothetical LLM library.
import hypothetical_llm as llm  # hypothetical LLM client library

class ReActAgent:
    def __init__(self, tools):
        self.tools = {t.name: t for t in tools}
        self.history = []

    def run(self, problem: str):
        self.history.append(f"Problem: {problem}")
        for _ in range(5):  # max 5 steps
            prompt = self._create_prompt()
            # Stop before the model hallucinates its own Observation
            response_text = llm.generate(prompt, stop_sequences=["Observation:"])
            self.history.append(response_text)
            if "Action: Finish" in response_text:
                answer = response_text.split("Action: Finish(")[1].split(")")[0]
                print(f"Final Answer: {answer}")
                return
            try:
                action_name = response_text.split("Action: ")[1].split("(")[0]
                action_input = response_text.split(f"{action_name}(")[1].split(")")[0]
            except IndexError:
                print("Error: Could not parse action. Retrying.")
                self.history.append("Observation: Error parsing action.")
                continue
            if action_name in self.tools:
                tool = self.tools[action_name]
                observation = tool.run(action_input)
                self.history.append(f"Observation: {observation}")
            else:
                self.history.append(f"Observation: Unknown tool '{action_name}'.")
        print("Max steps reached.")

    def _create_prompt(self):
        # Simplified prompt construction: replay the whole scratchpad
        return "\n".join(self.history) + "\nThought:"

# --- Tool Definition ---
class WikipediaTool:
    name = "WikipediaSearch"

    def run(self, query: str):
        # In a real scenario, this would call the Wikipedia API
        return "George Orwell was an English novelist, essayist, journalist and critic."

# --- Execution ---
agent = ReActAgent(tools=[WikipediaTool()])
agent.run("Who was the author of the book Nineteen Eighty-Four?")
In Frameworks:
- LangGraph: You’d define a think node and an act node. A conditional edge routes back to think if more steps are needed, or to an end node once the Finish action is produced.
- CrewAI: The ReAct loop is abstracted away. When you define an agent and assign it tools, CrewAI’s underlying engine uses a ReAct-like process to orchestrate the agent’s execution of tasks, as in the sketch below.
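A minimal CrewAI sketch (assuming crewai’s Agent/Task/Crew interface; the wikipedia_tool object is hypothetical, and argument details vary by version):

from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Answer factual questions accurately",
    backstory="A careful assistant that verifies facts with tools.",
    tools=[wikipedia_tool],  # hypothetical tool object
)
task = Task(
    description="Who was the author of Nineteen Eighty-Four?",
    expected_output="The author's name, with a one-line justification.",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()  # the ReAct-style loop runs inside the framework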
6. Comparisons & Tradeoffs
ReAct vs. Chain-of-Thought (CoT):
- Strength: ReAct can correct its own mistakes. If its initial reasoning is based on a flawed assumption, an action (like a web search) can provide new information to fix it. CoT, being purely internal, cannot do this.
- Weakness: ReAct is slower and more expensive, as it requires multiple LLM calls and external tool executions for a single problem.
ReAct vs. Single-Shot Tool Use:
- Strength: ReAct allows for multi-step, complex problem-solving. A single-shot approach might answer “What is the capital of France?” but would fail at “What is the population of the capital of the country whose primary export is wine?”
- Weakness: It’s more complex to implement and orchestrate.
Limitations:
- Context Window: The history of thoughts, actions, and observations can quickly fill up the LLM’s context window; a simple mitigation is sketched after this list.
- Error Propagation: A mistake in an early step (e.g., a tool failing or the LLM misinterpreting an observation) can derail the entire process.
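One common mitigation for the context-window limit, shown here as a minimal sketch, is to keep the problem statement plus only the most recent scratchpad entries (production systems often summarize older turns instead):

def trim_history(history: list[str], max_entries: int = 20) -> list[str]:
    # Keep the problem statement (first entry) plus the most recent turns
    # so the rebuilt prompt stays within the model's context window.
    if len(history) <= max_entries:
        return history
    return [history[0]] + history[-(max_entries - 1):]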
7. Latest Developments & Research
The original ReAct paper was just the beginning. Recent research has focused on improving its efficiency and robustness.
- Self-Correction & Reflection: Papers like “Self-Refine” (Madaan et al., 2023) propose adding an explicit “reflection” step where the agent critiques its own work and plans how to improve it, making the loop more robust.
- Parallel Tool Use: Researchers are exploring ways for agents to execute multiple actions in parallel when possible, speeding up the process (e.g., searching for two different things at once).
- Adaptive Control: Instead of a fixed loop, some new architectures allow the agent to decide when it needs to think and when it can act more reflexively, optimizing for speed and cost. This is explored in papers on “Adaptive RAG” (Retrieval-Augmented Generation).
8. Cross-Disciplinary Insight
The ReAct pattern is a beautiful example of an idea from Cognitive Science influencing AI. It mirrors the OODA Loop (Observe, Orient, Decide, Act), a concept developed by military strategist John Boyd to describe decision-making in high-stakes environments.
- Observe: The agent gets the
Observationfrom a tool. - Orient: The agent’s
Thoughtprocess is the orientation phase, where it integrates the new information with its existing knowledge. - Decide: The agent decides on the next
Action. - Act: The orchestrator executes the
Action.
This parallel suggests that effective decision-making, whether by humans or AI, requires a tight feedback loop between internal modeling of the world and external interaction with it.
9. Daily Challenge / Thought Exercise
Problem: You want to build a simple ReAct agent that can answer questions like: “What is the current price of Bitcoin in Euros?”
- Decomposition: Break down the problem. What tools would your agent need? You’d likely need a tool to get the BTC price (probably in USD), another to get the current USD-to-EUR exchange rate, and perhaps a calculator for the conversion.
- Trace the Loop: Write down the Thought, Action, Observation steps for a single run:
  - Thought 1: “I need the price of Bitcoin and the USD/EUR exchange rate.”
  - Action 1: getCryptoPrice(Bitcoin)
  - Observation 1: “Price is $65,000 USD.”
  - Thought 2: “Now I have the price in USD. I need the exchange rate.”
  - Action 2: getExchangeRate(USD, EUR)
  - Observation 2: “1 USD = 0.93 EUR.”
  - Thought 3: “I have all the information. I need to calculate 65000 * 0.93.”
  - Action 3: Calculator(65000 * 0.93)
  - Observation 3: “60450”
  - Thought 4: “The final answer is 60,450 Euros.”
  - Action 4: Finish(60,450 Euros)
This exercise helps you internalize how complex problems are broken down into a sequence of simple, tool-based actions.
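To take it one step further, you could stub the three tools and run them through the ReActAgent from Section 5. All names and return values below are hypothetical stand-ins for real API calls:

class CryptoPriceTool:
    name = "getCryptoPrice"

    def run(self, symbol: str) -> str:
        return "Price is $65,000 USD."  # stubbed; a real tool would call a price API

class ExchangeRateTool:
    name = "getExchangeRate"

    def run(self, pair: str) -> str:
        return "1 USD = 0.93 EUR."  # stubbed exchange-rate lookup

class CalculatorTool:
    name = "Calculator"

    def run(self, expression: str) -> str:
        # eval() is acceptable for a toy example; never eval untrusted input in production
        return str(eval(expression))

agent = ReActAgent(tools=[CryptoPriceTool(), ExchangeRateTool(), CalculatorTool()])
agent.run("What is the current price of Bitcoin in Euros?")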
10. References & Further Reading
- Original Paper: Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models.
- LangChain Implementation: LangChain’s ReAct Agent Documentation
- Related Research: Madaan, A., et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback.