Introduction: Beyond Simple Chains
Welcome back, aspiring agent architects! In our previous chapters, we laid the groundwork for understanding autonomous AI agents. We explored how Large Language Models (LLMs) serve as the brain, enabling agents to plan, reason, and leverage external tools and memory systems. We even touched upon basic execution flows.
However, as you might have guessed, real-world problems are rarely simple, one-shot tasks. What happens when an agent makes a mistake? How does it learn from its failures? How can it intelligently decide which tool to use and when, in a dynamic environment? This is where advanced architectures come into play!
In this chapter, we’re going to level up our agent design skills. We’ll dive into powerful architectural patterns like ReAct, Reflection, and Iterative Planning-Execution Loops. These concepts are crucial for building agents that are not just smart, but also robust, adaptable, and capable of handling complex, multi-step problems with self-correction. Get ready to transform your agents from simple automatons into truly intelligent problem-solvers!
The Need for Advanced Architectures
Before we jump into the “how,” let’s briefly touch on the “why.” Why can’t a simple chain of LLM calls suffice for complex tasks?
Imagine you ask an agent to “find the best coffee shop near the Eiffel Tower and book a table for two.” A simple LLM might:
- Generate a plan.
- Call a “search_landmarks” tool.
- Call a “find_coffee_shops” tool.
- Call a “book_table” tool.
What if the “find_coffee_shops” tool returns no results near the Eiffel Tower? A simple chain might just fail or hallucinate a solution. It lacks the ability to:
- Self-correct: Realize its initial approach was flawed.
- Reason dynamically: Adapt its plan based on unexpected tool outputs.
- Learn from experience: Remember what didn’t work.
This is precisely where advanced architectures shine. They introduce mechanisms for dynamic reasoning, tool interaction, and self-evaluation, making agents far more capable.
Core Concepts: ReAct, Reflection, and Iterative Loops
Let’s break down these powerful architectural patterns one by one.
1. ReAct: Reasoning and Acting in Harmony
The ReAct (Reason + Act) paradigm is a groundbreaking approach that enables LLMs to perform dynamic reasoning, plan steps, and interact with external tools in a robust, iterative manner. It’s like giving your agent a continuous internal monologue and a set of actions it can take.
What is ReAct?
ReAct combines “Reasoning” (Thought) and “Acting” (Action) steps within a single, iterative loop. The LLM generates a Thought, then based on that thought, decides on an Action to take (e.g., calling a tool). The Observation from that action is then fed back into the LLM, informing its next Thought.
Why is ReAct Important?
- Dynamic Tool Use: Agents can intelligently decide which tool to use and when, rather than following a predefined script.
- Problem Decomposition: Complex tasks are broken down into smaller, manageable `Thought -> Action -> Observation` cycles.
- Improved Robustness: The agent can react to unexpected tool outputs or errors by adjusting its `Thought` process.
- Transparency: The `Thought` steps provide a trace of the agent's reasoning, making it easier to understand and debug.
How ReAct Works: The Thought -> Action -> Observation Loop
The core of ReAct is a continuous loop that mimics human problem-solving:
- Thought: The agent (LLM) generates an internal thought, explaining its current reasoning, what it’s trying to achieve next, and why.
- Action: Based on the `Thought`, the agent decides on an `Action` to take. This usually involves:
  - Calling an external tool with specific arguments.
  - Providing a final answer if the goal is met.
- Observation: The result of the `Action` is observed. If a tool was called, this is the tool's output. If a final answer was given, this might be a confirmation.
- Loop: The `Observation` is fed back into the LLM's context, becoming part of the prompt for the next `Thought`. This cycle continues until the agent determines it has completed the task or needs to stop.
Let’s visualize this with a simple diagram:
Figure 8.1: The core ReAct loop.
This loop allows the agent to continuously refine its understanding and strategy. For example, if a tool call fails, the Observation will reflect that failure, prompting the LLM to generate a new Thought about an alternative approach.
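Stripped to its skeleton, the loop can be sketched in a few lines of Python. This is an illustrative sketch, not a complete implementation: `llm` and `run_tool` are hypothetical callables standing in for a real model call and a real tool dispatcher (we build a fuller version later in this chapter).

```python
def react_loop(task: str, llm, run_tool, max_steps: int = 5) -> str:
    """Minimal Thought -> Action -> Observation loop (illustrative sketch)."""
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        # 1. Thought + Action: the LLM reads the full history and decides what to do.
        thought, action, arg = llm(history)
        history += f"Thought: {thought}\nAction: {action}({arg})\n"
        if action == "final_answer":
            return arg  # 2. Goal met: stop and return the answer.
        # 3. Observation: run the chosen tool and feed the result back into context.
        observation = run_tool(action, arg)
        history += f"Observation: {observation}\n"
    return "Stopped: max steps reached."
```

The key design point is that `history` grows each iteration, so every new `Thought` is conditioned on everything the agent has already tried and seen.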
2. Reflection: The Power of Self-Correction
While ReAct allows agents to react dynamically, Reflection takes it a step further by enabling agents to critically evaluate their own past performance, identify errors, and learn from them to improve future actions. It’s like having a built-in mentor for your agent!
What is Reflection?
Reflection is the ability of an agent to review its historical trajectory (the sequence of Thoughts, Actions, and Observations), identify shortcomings, and generate improvements or corrections. This usually involves a separate “reflection” phase or a meta-LLM that analyzes the agent’s log.
Why is Reflection Important?
- Robustness: Agents become more resilient to mistakes and edge cases.
- Continuous Improvement: Over time, agents can learn to avoid common pitfalls.
- Handling Ambiguity: Reflection helps agents re-evaluate when faced with unclear or contradictory information.
- Safety: By scrutinizing its own behavior, an agent can potentially identify and mitigate unsafe or biased outputs.
How Reflection Works: A Meta-Cognitive Loop
Reflection often sits on top of a ReAct-like loop. After an agent attempts a task (or a significant part of it), a reflection mechanism kicks in:
- Execution Trace: The agent’s entire sequence of
Thought -> Action -> Observationis recorded. - Reflection Prompt: A separate prompt is given to an LLM (often the same one, but with a different instruction set) asking it to critically analyze the execution trace. This prompt might ask:
- “What went wrong?”
- “What could have been done better?”
- “Are there any biases in the output?”
- “How should the agent approach similar problems in the future?”
- Refinement/Feedback: The LLM generates “reflection” or “feedback” based on the analysis. This feedback can then be used to:
- Modify the agent’s internal state or “memory.”
- Adjust future prompts or strategies.
- Trigger a re-attempt of the task with a refined approach.
Consider this expanded view:
Figure 8.2: Integrating Reflection into an Agent’s Workflow.
This cycle allows agents to “learn” from their mistakes in a structured way, leading to more robust and intelligent behavior over time.
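A reflection step can be sketched as a second LLM call layered on top of the task attempt. The sketch below is an assumption-laden illustration: `llm` is a hypothetical prompt-in, text-out callable, and `attempt_fn` stands in for whatever ReAct-style loop produces a success flag and a trace.

```python
def reflect_on_trace(trace: str, llm) -> str:
    """Ask an LLM to critique an execution trace and suggest improvements.

    `llm` is a hypothetical callable: prompt string in, text out.
    """
    reflection_prompt = (
        "You are a critical reviewer of an AI agent's work.\n"
        "Here is the agent's execution trace:\n"
        f"{trace}\n\n"
        "Answer briefly:\n"
        "1. What went wrong, if anything?\n"
        "2. What could have been done better?\n"
        "3. How should the agent approach similar problems in the future?"
    )
    return llm(reflection_prompt)


def run_with_reflection(task, attempt_fn, llm, max_attempts: int = 2):
    """Attempt a task; on failure, reflect and retry with the critique injected."""
    feedback = ""
    for _ in range(max_attempts):
        success, trace = attempt_fn(task, feedback)
        if success:
            return trace
        # The critique becomes extra context for the next attempt.
        feedback = reflect_on_trace(trace, llm)
    return trace
```

The critique is fed back as context rather than discarded, which is what turns a failed attempt into usable "experience."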
3. Iterative Planning-Execution Loops
ReAct and Reflection are specific patterns that contribute to a broader architectural concept: Iterative Planning-Execution Loops. This is the overarching framework for agents that tackle complex, long-horizon tasks by continuously planning, executing, observing, and refining their strategy.
What are Iterative Planning-Execution Loops?
These are architectures where an agent doesn’t just execute a static plan. Instead, it dynamically generates a plan, executes a part of it, evaluates the outcome, and then re-plans or adjusts its strategy based on new information or unexpected results.
Why are they Important?
- Complex Task Handling: Essential for problems that cannot be solved in a single pass or require dynamic adaptation.
- Adaptability: Agents can operate effectively in uncertain or changing environments.
- Goal-Oriented: The loop continually drives the agent towards its ultimate goal, even if detours are necessary.
How They Work: A General Framework
While the specifics can vary, most iterative planning-execution loops share these phases:
- Goal Setting: Clearly define the ultimate objective.
- Planning: Generate a sequence of high-level steps or sub-goals to achieve the main goal. This plan is often dynamic and can change.
- Execution: Perform the current step of the plan, often using ReAct-like sub-loops involving tool calls.
- Observation & Monitoring: Gather information about the outcome of the execution. Check progress against the goal.
- Evaluation & Reflection: Assess if the execution was successful, if the plan needs adjustment, or if any errors occurred. This is where reflection mechanisms are crucial.
- Re-planning/Adjustment: Based on the evaluation, update the plan, generate a new sub-goal, or refine the strategy.
- Loop: Continue iterating until the main goal is achieved or a termination condition is met.
Figure 8.3: General Iterative Planning-Execution Loop.
This general framework underpins many advanced agent systems, from automated coding agents to intelligent workflow automators. The key is the continuous feedback loop that allows the agent to be proactive and adaptive.
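The seven phases above can be condensed into a small control loop. This is a generic sketch under stated assumptions: `planner`, `executor`, and `evaluator` are hypothetical callables standing in for LLM- or tool-backed components, not part of any particular framework's API.

```python
def plan_execute_loop(goal, planner, executor, evaluator, max_rounds: int = 3):
    """General iterative planning-execution loop (illustrative sketch).

    planner(goal, feedback)        -> list of steps (re-planned each round)
    executor(step)                 -> observation for one step
    evaluator(goal, observations)  -> (done: bool, feedback: str)
    """
    feedback = ""
    observations = []
    for _ in range(max_rounds):
        # Planning / re-planning: feedback from the last round shapes the new plan.
        plan = planner(goal, feedback)
        # Execution + observation: run each step, collect outcomes.
        for step in plan:
            observations.append(executor(step))
        # Evaluation: check progress and decide whether to stop or adjust.
        done, feedback = evaluator(goal, observations)
        if done:
            return observations
    return observations
```

Note that the plan is regenerated every round rather than fixed up front — that re-planning step is what distinguishes this pattern from a static chain.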
Step-by-Step Implementation: A Simplified ReAct Agent
Now that we understand the concepts, let’s build a simplified ReAct agent in Python. We’ll use mock functions for our LLM and tools to focus on the architectural pattern itself.
For our example, we’ll imagine an agent whose goal is to “find information about the latest stable release of Python and summarize it.”
Prerequisites
You’ll need Python 3.9+ installed.
1. Set Up Your Environment
First, create a new directory and a Python file, say react_agent.py.
```shell
mkdir agentic_architectures
cd agentic_architectures
touch react_agent.py
```
2. Define Mock LLM and Tools
We’ll simulate an LLM’s response and a simple search tool.
Open react_agent.py and add the following:
```python
# react_agent.py
import json
import time


# --- Mock LLM ---
def mock_llm_response(prompt: str) -> str:
    """
    Simulates an LLM's response based on the prompt.
    In a real scenario, this would be an API call to OpenAI, Claude, Azure OpenAI, etc.
    """
    print(f"\n--- LLM Called with Prompt ---\n{prompt}\n--- End LLM Prompt ---")

    # Simulate reasoning and action based on prompt keywords
    if "latest stable release of Python" in prompt and "Action: search_web" not in prompt:
        return """Thought: The user wants to know the latest stable release of Python. I should use a web search tool to find this information.
Action: search_web("latest stable Python release")"""
    elif "Python 3.12.2" in prompt or "Python 3.12" in prompt:
        return """Thought: I have found the latest stable release is Python 3.12.2. I should summarize this information and provide a final answer.
Action: Final Answer: The latest stable release of Python as of early 2026 is Python 3.12.2. It was released on February 6, 2024, and includes various bug fixes and improvements over previous versions. For the most up-to-date information, always check the official Python website."""
    elif "Action: search_web" in prompt and "latest Python release" not in prompt:
        # Fallback to the original goal if the agent gets confused
        return """Thought: It seems I'm being asked to search for something else, but my current goal is about Python releases. I will try to re-evaluate.
Action: search_web("latest stable Python release")"""
    else:
        return """Thought: I'm not sure how to proceed with this prompt. It seems I've lost context or the prompt is ambiguous. I will try to provide a general answer or ask for clarification.
Action: Final Answer: I am unable to determine the precise latest stable Python release with the given information. Please provide more context or refine your request."""


# --- Mock Tools ---
def search_web(query: str) -> str:
    """
    Simulates a web search tool.
    In a real scenario, this would integrate with a search API (e.g., Google Search API).
    """
    print(f"\n--- Tool Call: search_web('{query}') ---")
    time.sleep(1)  # Simulate network delay
    if "latest stable Python release" in query:
        # As of 2026-03-20, assuming Python 3.12.2 is the latest stable.
        # This information would be dynamically retrieved in a real agent.
        return json.dumps({
            "query": query,
            "results": [
                {"title": "Python 3.12.2 released - Python.org", "snippet": "The Python core development team announces the release of Python 3.12.2. This is the second maintenance release of Python 3.12.", "url": "https://www.python.org/downloads/release/python-3122/"},
                {"title": "Download Python", "snippet": "Latest stable release: Python 3.12.2", "url": "https://www.python.org/downloads/"}
            ]
        })
    else:
        return json.dumps({"query": query, "results": []})


# --- Tool Registry ---
# A dictionary mapping tool names to their functions
available_tools = {
    "search_web": search_web,
}

print("Mock LLM and tools initialized.")
```
Explanation:
- `mock_llm_response`: This function simulates our LLM. It takes a prompt and returns a string containing a `Thought:` and an `Action:`. We've hardcoded some logic to make it respond appropriately to our specific task.
- `search_web`: This function simulates an external web search. It takes a query and returns a JSON string representing search results. We're assuming Python 3.12.2 is the latest stable version for our 2026-03-20 context.
- `available_tools`: A dictionary that lets our agent easily look up and call tools by name.
3. Implement the ReAct Agent Loop
Now, let’s put the Thought -> Action -> Observation loop into action.
Add the following code to react_agent.py, below the mock functions:
```python
# react_agent.py (continued)

def run_react_agent(task_description: str, max_iterations: int = 5):
    """
    Runs a simplified ReAct agent to complete a task.
    """
    print(f"\n--- Starting ReAct Agent for Task: '{task_description}' ---\n")

    full_prompt_history = []
    current_observation = ""
    final_answer = None

    for i in range(max_iterations):
        print(f"\n--- Iteration {i+1}/{max_iterations} ---")

        # 1. Prepare the prompt for the LLM.
        # The prompt includes the task, previous thoughts, actions, and observations.
        prompt = f"You are an AI assistant designed to complete tasks by thinking, acting, and observing.\n" \
                 f"Your goal is: {task_description}\n\n" \
                 f"Here is the history of your thoughts, actions, and observations:\n" \
                 f"{''.join(full_prompt_history)}\n" \
                 f"Current Observation: {current_observation}\n\n" \
                 f"Think about your next step, then decide on an Action.\n" \
                 f"Available Tools: {list(available_tools.keys())}\n" \
                 f"Format: Thought: [your thought]\nAction: [tool_name(\"arg\") or Final Answer: \"your answer\"]\n"

        # 2. Get the LLM's Thought and Action
        llm_output = mock_llm_response(prompt)
        full_prompt_history.append(f"LLM Output:\n{llm_output}\n")  # Store for history

        # Parse Thought and Action
        thought_match = llm_output.find("Thought:")
        action_match = llm_output.find("Action:")
        if thought_match != -1 and action_match != -1:
            thought = llm_output[thought_match + len("Thought:"):action_match].strip()
            action_line = llm_output[action_match + len("Action:"):].strip()
        else:
            print("Error: LLM output did not contain expected 'Thought:' and 'Action:' format.")
            break

        print(f"Agent Thought: {thought}")
        print(f"Agent Action: {action_line}")

        # 3. Execute the Action
        current_observation = ""  # Reset observation for the new step
        if action_line.startswith("Final Answer:"):
            final_answer = action_line[len("Final Answer:"):].strip()
            print(f"\n--- Agent Completed Task ---")
            print(f"Final Answer: {final_answer}")
            break
        elif "(" in action_line and ")" in action_line:
            # Parse tool call (e.g., 'search_web("query string")')
            tool_name_end = action_line.find("(")
            tool_name = action_line[:tool_name_end].strip()
            tool_args_start = tool_name_end + 1
            tool_args_end = action_line.rfind(")")
            tool_args_str = action_line[tool_args_start:tool_args_end].strip().strip('"')  # Remove quotes

            if tool_name in available_tools:
                tool_function = available_tools[tool_name]
                try:
                    current_observation = tool_function(tool_args_str)
                    print(f"Observation: {current_observation}")
                except Exception as e:
                    current_observation = f"Error executing tool '{tool_name}': {e}"
                    print(f"Observation (Error): {current_observation}")
            else:
                current_observation = f"Error: Unknown tool '{tool_name}'."
                print(f"Observation (Error): {current_observation}")
        else:
            current_observation = "Error: Malformed action. Expected 'Final Answer:' or 'tool_name(\"args\")'."
            print(f"Observation (Error): {current_observation}")

        full_prompt_history.append(f"Observation: {current_observation}\n")

    if not final_answer:
        print(f"\n--- Agent did not complete task within {max_iterations} iterations. ---")
    return final_answer


# --- Run the agent ---
if __name__ == "__main__":
    task = "Find the latest stable release of Python and summarize it."
    run_react_agent(task)
```
Explanation:
- `run_react_agent`: This is our main agent function. It takes a `task_description` and a `max_iterations` limit.
- Prompt Construction: Inside the loop, we build a `prompt` for the LLM. Critically, this prompt includes the `task_description`, the `full_prompt_history` (all previous `Thought`s, `Action`s, and `Observation`s), and the `current_observation` from the last step. This is how the LLM maintains context and learns.
- LLM Call: We call `mock_llm_response` with our constructed prompt.
- Parsing Output: We parse the LLM's response to extract the `Thought` and the `Action`.
- Action Execution:
  - If the `Action` is "Final Answer:", we extract the answer and terminate.
  - If it's a tool call (e.g., `search_web("...")`), we extract the tool name and arguments, then call the appropriate function from `available_tools`.
- Observation: The result of the action (either the tool's output or an error message) becomes the `current_observation` for the next iteration.
- Loop Continuation: The process repeats, with the LLM getting more context with each step, allowing it to dynamically adjust its plan.
4. Run Your ReAct Agent!
Save react_agent.py and run it from your terminal:
```shell
python react_agent.py
```
You should see output similar to this (though the exact LLM mock responses might vary slightly based on the hardcoded logic):
```text
Mock LLM and tools initialized.

--- Starting ReAct Agent for Task: 'Find the latest stable release of Python and summarize it.' ---


--- Iteration 1/5 ---

--- LLM Called with Prompt ---
You are an AI assistant designed to complete tasks by thinking, acting, and observing.
Your goal is: Find the latest stable release of Python and summarize it.

Here is the history of your thoughts, actions, and observations:

Current Observation:

Think about your next step, then decide on an Action.
Available Tools: ['search_web']
Format: Thought: [your thought]
Action: [tool_name("arg") or Final Answer: "your answer"]

--- End LLM Prompt ---
Agent Thought: The user wants to know the latest stable release of Python. I should use a web search tool to find this information.
Agent Action: search_web("latest stable Python release")

--- Tool Call: search_web('latest stable Python release') ---
Observation: {"query": "latest stable Python release", "results": [{"title": "Python 3.12.2 released - Python.org", "snippet": "The Python core development team announces the release of Python 3.12.2. This is the second maintenance release of Python 3.12.", "url": "https://www.python.org/downloads/release/python-3122/"}, {"title": "Download Python", "snippet": "Latest stable release: Python 3.12.2", "url": "https://www.python.org/downloads/"}]}

--- Iteration 2/5 ---

--- LLM Called with Prompt ---
You are an AI assistant designed to complete tasks by thinking, acting, and observing.
Your goal is: Find the latest stable release of Python and summarize it.

Here is the history of your thoughts, actions, and observations:
LLM Output:
Thought: The user wants to know the latest stable release of Python. I should use a web search tool to find this information.
Action: search_web("latest stable Python release")
Observation: {"query": "latest stable Python release", "results": [{"title": "Python 3.12.2 released - Python.org", "snippet": "The Python core development team announces the release of Python 3.12.2. This is the second maintenance release of Python 3.12.", "url": "https://www.python.org/downloads/release/python-3122/"}, {"title": "Download Python", "snippet": "Latest stable release: Python 3.12.2", "url": "https://www.python.org/downloads/"}]}

Current Observation:

Think about your next step, then decide on an Action.
Available Tools: ['search_web']
Format: Thought: [your thought]
Action: [tool_name("arg") or Final Answer: "your answer"]

--- End LLM Prompt ---
Agent Thought: I have found the latest stable release is Python 3.12.2. I should summarize this information and provide a final answer.
Agent Action: Final Answer: The latest stable release of Python as of early 2026 is Python 3.12.2. It was released on February 6, 2024, and includes various bug fixes and improvements over previous versions. For the most up-to-date information, always check the official Python website.

--- Agent Completed Task ---
Final Answer: The latest stable release of Python as of early 2026 is Python 3.12.2. It was released on February 6, 2024, and includes various bug fixes and improvements over previous versions. For the most up-to-date information, always check the official Python website.
```
Notice how the agent first thought about using search_web, acted by calling it, observed the results, and then thought again to formulate the final answer. This iterative process is the heart of ReAct!
Mini-Challenge: Extend Your ReAct Agent
You’ve built a basic ReAct agent! Now, let’s give it a slightly more complex task.
Challenge: Modify your react_agent.py to handle the following task:
“What is the current population of Tokyo, and what is a famous landmark there?”
Hints:
- You'll likely need to add a new mock tool, perhaps `search_database(query: str)`, or enhance `search_web` to handle more types of queries and return different mock data.
- You'll need to update your `mock_llm_response` function to guide the LLM's `Thought` process for this new task. It should first search for the population, then search for a landmark, and finally combine the information.
- Think about how the LLM will sequentially use the tools based on its `Thought`s.
What to Observe/Learn:
- How the agent manages multiple steps and distinct pieces of information.
- The importance of designing your `mock_llm_response` (or actual LLM prompts) to guide the agent through sequential reasoning.
- The modularity of adding new tools to your agent's capabilities.
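If you want a starting point for the challenge, here is one possible shape for a second mock tool. The tool name, keys, and mock data below are invented for illustration, mirroring the style of `search_web` from the earlier listing; your version can differ freely.

```python
import json

def search_database(query: str) -> str:
    """Mock structured-data lookup for the mini-challenge (invented data)."""
    mock_data = {
        "population of tokyo": "Approximately 14 million (city proper).",
        "famous landmark in tokyo": "Tokyo Tower is a well-known landmark.",
    }
    # Match on lowercase key phrases so casing in the query doesn't matter.
    for key, answer in mock_data.items():
        if key in query.lower():
            return json.dumps({"query": query, "result": answer})
    return json.dumps({"query": query, "result": None})

# Register it alongside the existing tool so the agent can choose between them:
available_tools = {
    # "search_web": search_web,  # from the earlier listing
    "search_database": search_database,
}
```

The rest of the work is in `mock_llm_response`: it needs branches that pick the population query first, the landmark query second, and a `Final Answer:` combining both observations.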
Common Pitfalls & Troubleshooting
Building agents with advanced architectures can be immensely rewarding, but it comes with its own set of challenges.
Prompt Engineering Complexity: Crafting effective prompts for ReAct or reflection can be tricky.
- Pitfall: Overly verbose, ambiguous, or restrictive prompts can confuse the LLM or limit its reasoning.
- Troubleshooting: Start simple. Provide clear instructions, examples of `Thought`/`Action` formatting, and a concise task description. Iterate and refine prompts based on agent behavior. Tools like LangChain's `AgentExecutor` or Microsoft's Agent Framework abstract some of this, but understanding the underlying prompt structure is key.
Infinite Loops or Stagnation: Agents can get stuck in repetitive `Thought`/`Action` cycles or fail to progress.
- Pitfall: The LLM might generate the same `Thought` and `Action` repeatedly if the `Observation` doesn't provide new information or if its reasoning gets stuck.
- Troubleshooting: Implement `max_iterations` (as we did). Introduce mechanisms for detecting repeated states. For reflection, if an agent is stuck, trigger a reflection step to force a re-evaluation of the strategy. Ensure tool outputs are always informative, even if they indicate an error or no results.
Over-Reflection vs. Under-Reflection:
- Pitfall: Reflecting too often can be computationally expensive and slow down the agent. Not reflecting enough can lead to an agent repeating mistakes.
- Troubleshooting: Design reflection triggers carefully: after a certain number of steps, upon tool failure, when the agent expresses uncertainty, or at the end of a major sub-task. Balance the cost of reflection with the benefit of improved robustness.
Tool Output Misinterpretation: LLMs might misread or misinterpret the output from tools.
- Pitfall: If a tool returns complex JSON or natural language, the LLM might struggle to extract the relevant information for its next `Thought`.
- Troubleshooting: Design tool outputs to be as clear and concise as possible. For complex outputs, consider adding a "parsing" or "summarization" step (either as another tool or directly within the LLM's prompt instructions) to distill the essential information before feeding it back to the main reasoning loop.
Summary: Building Smarter, More Robust Agents
Congratulations! You’ve successfully navigated the exciting world of advanced agent architectures. Here’s a quick recap of the key takeaways:
- ReAct (Reason + Act): This powerful paradigm allows agents to dynamically reason (`Thought`), take action (`Action`) using tools, and learn from the results (`Observation`) in an iterative loop. It's fundamental for building agents that can adapt and use tools intelligently.
- Reflection: By critically evaluating their past `Thought`s, `Action`s, and `Observation`s, agents can identify mistakes, learn from them, and refine their strategies, leading to greater robustness and self-correction.
- Iterative Planning-Execution Loops: This is the overarching framework where agents continuously plan, execute parts of the plan, monitor progress, evaluate, and re-plan. It's essential for tackling complex, long-horizon tasks in dynamic environments.
- Practical Implementation: We built a simplified ReAct agent in Python, demonstrating how the `Thought -> Action -> Observation` cycle works with mock LLMs and tools. This hands-on experience demystifies the core logic.
- Common Pitfalls: We discussed challenges like prompt complexity, infinite loops, and tool output misinterpretation, along with strategies for troubleshooting them.
These advanced architectures are crucial for moving beyond simple, reactive AI systems to truly autonomous and intelligent agents capable of sophisticated problem-solving. As you continue your journey, remember that frameworks like LangChain, AutoGen, and Microsoft Agent Framework provide robust implementations of these patterns, allowing you to build on solid foundations.
Next, we’ll explore how multiple agents can work together, communicating and collaborating to solve problems that are too complex for a single agent!
References
- Microsoft Learn - Agentic AI tools for Windows development
- Microsoft Learn - Agent Framework documentation
- ReAct: Synergizing Reasoning and Acting in Language Models (Paper)
- LangChain Documentation (Agents)
- AutoGen Documentation (Agents)