Introduction: Beyond Simple Chains
Welcome back, aspiring agent architects! In our previous chapters, we laid the groundwork for understanding autonomous AI agents. We explored how Large Language Models (LLMs) serve as the brain, enabling agents to plan, reason, and leverage external tools and memory systems. We even touched upon basic execution flows.
However, as you might have guessed, real-world problems are rarely simple, one-shot tasks. What happens when an agent makes a mistake? How does it learn from its failures? How can it intelligently decide which tool to use and when, in a dynamic environment? This is where advanced architectures come into play!
In this chapter, we’re going to level up our agent design skills. We’ll dive into powerful architectural patterns like ReAct, Reflection, and Iterative Planning-Execution Loops. These concepts are crucial for building agents that are not just smart, but also robust, adaptable, and capable of handling complex, multi-step problems with self-correction. Get ready to transform your agents from simple automatons into truly intelligent problem-solvers!
The Need for Advanced Architectures
Before we jump into the “how,” let’s briefly touch on the “why.” Why can’t a simple chain of LLM calls suffice for complex tasks?
Imagine you ask an agent to “find the best coffee shop near the Eiffel Tower and book a table for two.” A simple LLM might:
- Generate a plan.
- Call a “search_landmarks” tool.
- Call a “find_coffee_shops” tool.
- Call a “book_table” tool.
What if the “find_coffee_shops” tool returns no results near the Eiffel Tower? A simple chain might just fail or hallucinate a solution. It lacks the ability to:
- Self-correct: Realize its initial approach was flawed.
- Reason dynamically: Adapt its plan based on unexpected tool outputs.
- Learn from experience: Remember what didn’t work.
This is precisely where advanced architectures shine. They introduce mechanisms for dynamic reasoning, tool interaction, and self-evaluation, making agents far more capable.
Core Concepts: ReAct, Reflection, and Iterative Loops
Let’s break down these powerful architectural patterns one by one.
1. ReAct: Reasoning and Acting in Harmony
The ReAct (Reason + Act) paradigm is a groundbreaking approach that enables LLMs to perform dynamic reasoning, plan steps, and interact with external tools in a robust, iterative manner. It’s like giving your agent a continuous internal monologue and a set of actions it can take.
What is ReAct?
ReAct combines “Reasoning” (Thought) and “Acting” (Action) steps within a single, iterative loop. The LLM generates a Thought, then based on that thought, decides on an Action to take (e.g., calling a tool). The Observation from that action is then fed back into the LLM, informing its next Thought.
Why is ReAct Important?
- Dynamic Tool Use: Agents can intelligently decide which tool to use and when, rather than following a predefined script.
- Problem Decomposition: Complex tasks are broken down into smaller, manageable `Thought -> Action -> Observation` cycles.
- Improved Robustness: The agent can react to unexpected tool outputs or errors by adjusting its `Thought` process.
- Transparency: The `Thought` steps provide a trace of the agent's reasoning, making it easier to understand and debug.
How ReAct Works: The Thought -> Action -> Observation Loop
The core of ReAct is a continuous loop that mimics human problem-solving:
- Thought: The agent (LLM) generates an internal thought, explaining its current reasoning, what it’s trying to achieve next, and why.
- Action: Based on the `Thought`, the agent decides on an `Action` to take. This usually involves:
  - Calling an external tool with specific arguments.
  - Providing a final answer if the goal is met.
- Observation: The result of the `Action` is observed. If a tool was called, this is the tool's output. If a final answer was given, this might be a confirmation.
- Loop: The `Observation` is fed back into the LLM's context, becoming part of the prompt for the next `Thought`. This cycle continues until the agent determines it has completed the task or needs to stop.
Let’s visualize this with a simple diagram:
Figure 8.1: The core ReAct loop.
This loop allows the agent to continuously refine its understanding and strategy. For example, if a tool call fails, the Observation will reflect that failure, prompting the LLM to generate a new Thought about an alternative approach.
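Stripped to its skeleton, the loop can be sketched in a few lines of Python. This is an illustrative sketch, not a complete implementation: `llm` and `run_tool` are hypothetical callables standing in for a real model call and a real tool dispatcher (we build a fuller version later in this chapter).

```python
def react_loop(task: str, llm, run_tool, max_steps: int = 5) -> str:
    """Minimal Thought -> Action -> Observation loop (illustrative sketch)."""
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        # 1. Thought + Action: the LLM reads the full history and decides what to do.
        thought, action, arg = llm(history)
        history += f"Thought: {thought}\nAction: {action}({arg})\n"
        if action == "final_answer":
            return arg  # 2. Goal met: stop and return the answer.
        # 3. Observation: run the chosen tool and feed the result back into context.
        observation = run_tool(action, arg)
        history += f"Observation: {observation}\n"
    return "Stopped: max steps reached."
```

The key design point is that `history` grows each iteration, so every new `Thought` is conditioned on everything the agent has already tried and seen.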
2. Reflection: The Power of Self-Correction
While ReAct allows agents to react dynamically, Reflection takes it a step further by enabling agents to critically evaluate their own past performance, identify errors, and learn from them to improve future actions. It’s like having a built-in mentor for your agent!
What is Reflection?
Reflection is the ability of an agent to review its historical trajectory (the sequence of Thoughts, Actions, and Observations), identify shortcomings, and generate improvements or corrections. This usually involves a separate “reflection” phase or a meta-LLM that analyzes the agent’s log.
Why is Reflection Important?
- Robustness: Agents become more resilient to mistakes and edge cases.
- Continuous Improvement: Over time, agents can learn to avoid common pitfalls.
- Handling Ambiguity: Reflection helps agents re-evaluate when faced with unclear or contradictory information.
- Safety: By scrutinizing its own behavior, an agent can potentially identify and mitigate unsafe or biased outputs.
How Reflection Works: A Meta-Cognitive Loop
Reflection often sits on top of a ReAct-like loop. After an agent attempts a task (or a significant part of it), a reflection mechanism kicks in:
- Execution Trace: The agent’s entire sequence of
Thought -> Action -> Observationis recorded. - Reflection Prompt: A separate prompt is given to an LLM (often the same one, but with a different instruction set) asking it to critically analyze the execution trace. This prompt might ask:
- “What went wrong?”
- “What could have been done better?”
- “Are there any biases in the output?”
- “How should the agent approach similar problems in the future?”
- Refinement/Feedback: The LLM generates “reflection” or “feedback” based on the analysis. This feedback can then be used to:
- Modify the agent’s internal state or “memory.”
- Adjust future prompts or strategies.
- Trigger a re-attempt of the task with a refined approach.
Consider this expanded view:
Figure 8.2: Integrating Reflection into an Agent’s Workflow.
This cycle allows agents to “learn” from their mistakes in a structured way, leading to more robust and intelligent behavior over time.
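A reflection step can be sketched as a second LLM call layered on top of the task attempt. The sketch below is an assumption-laden illustration: `llm` is a hypothetical prompt-in, text-out callable, and `attempt_fn` stands in for whatever ReAct-style loop produces a success flag and a trace.

```python
def reflect_on_trace(trace: str, llm) -> str:
    """Ask an LLM to critique an execution trace and suggest improvements.

    `llm` is a hypothetical callable: prompt string in, text out.
    """
    reflection_prompt = (
        "You are a critical reviewer of an AI agent's work.\n"
        "Here is the agent's execution trace:\n"
        f"{trace}\n\n"
        "Answer briefly:\n"
        "1. What went wrong, if anything?\n"
        "2. What could have been done better?\n"
        "3. How should the agent approach similar problems in the future?"
    )
    return llm(reflection_prompt)


def run_with_reflection(task, attempt_fn, llm, max_attempts: int = 2):
    """Attempt a task; on failure, reflect and retry with the critique injected."""
    feedback = ""
    for _ in range(max_attempts):
        success, trace = attempt_fn(task, feedback)
        if success:
            return trace
        # The critique becomes extra context for the next attempt.
        feedback = reflect_on_trace(trace, llm)
    return trace
```

The critique is fed back as context rather than discarded, which is what turns a failed attempt into usable "experience."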
3. Iterative Planning-Execution Loops
ReAct and Reflection are specific patterns that contribute to a broader architectural concept: Iterative Planning-Execution Loops. This is the overarching framework for agents that tackle complex, long-horizon tasks by continuously planning, executing, observing, and refining their strategy.
What are Iterative Planning-Execution Loops?
These are architectures where an agent doesn’t just execute a static plan. Instead, it dynamically generates a plan, executes a part of it, evaluates the outcome, and then re-plans or adjusts its strategy based on new information or unexpected results.
Why are they Important?
- Complex Task Handling: Essential for problems that cannot be solved in a single pass or require dynamic adaptation.
- Adaptability: Agents can operate effectively in uncertain or changing environments.
- Goal-Oriented: The loop continually drives the agent towards its ultimate goal, even if detours are necessary.
How They Work: A General Framework
While the specifics can vary, most iterative planning-execution loops share these phases:
- Goal Setting: Clearly define the ultimate objective.
- Planning: Generate a sequence of high-level steps or sub-goals to achieve the main goal. This plan is often dynamic and can change.
- Execution: Perform the current step of the plan, often using ReAct-like sub-loops involving tool calls.
- Observation & Monitoring: Gather information about the outcome of the execution. Check progress against the goal.
- Evaluation & Reflection: Assess if the execution was successful, if the plan needs adjustment, or if any errors occurred. This is where reflection mechanisms are crucial.
- Re-planning/Adjustment: Based on the evaluation, update the plan, generate a new sub-goal, or refine the strategy.
- Loop: Continue iterating until the main goal is achieved or a termination condition is met.
Figure 8.3: General Iterative Planning-Execution Loop.
This general framework underpins many advanced agent systems, from automated coding agents to intelligent workflow automators. The key is the continuous feedback loop that allows the agent to be proactive and adaptive.
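The seven phases above can be condensed into a small control loop. This is a generic sketch under stated assumptions: `planner`, `executor`, and `evaluator` are hypothetical callables standing in for LLM- or tool-backed components, not part of any particular framework's API.

```python
def plan_execute_loop(goal, planner, executor, evaluator, max_rounds: int = 3):
    """General iterative planning-execution loop (illustrative sketch).

    planner(goal, feedback)        -> list of steps (re-planned each round)
    executor(step)                 -> observation for one step
    evaluator(goal, observations)  -> (done: bool, feedback: str)
    """
    feedback = ""
    observations = []
    for _ in range(max_rounds):
        # Planning / re-planning: feedback from the last round shapes the new plan.
        plan = planner(goal, feedback)
        # Execution + observation: run each step, collect outcomes.
        for step in plan:
            observations.append(executor(step))
        # Evaluation: check progress and decide whether to stop or adjust.
        done, feedback = evaluator(goal, observations)
        if done:
            return observations
    return observations
```

Note that the plan is regenerated every round rather than fixed up front — that re-planning step is what distinguishes this pattern from a static chain.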
Step-by-Step Implementation: A Simplified ReAct Agent
Now that we understand the concepts, let’s build a simplified ReAct agent in Python. We’ll use mock functions for our LLM and tools to focus on the architectural pattern itself.
For our example, we’ll imagine an agent whose goal is to “find information about the latest stable release of Python and summarize it.”
Prerequisites
You’ll need Python 3.9+ installed.
1. Set Up Your Environment
First, create a new directory and a Python file, say react_agent.py.
```shell
mkdir agentic_architectures
cd agentic_architectures
touch react_agent.py
```
2. Define Mock LLM and Tools
We’ll simulate an LLM’s response and a simple search tool.
Open react_agent.py and add the following:
```python
# react_agent.py
import json
import time


# --- Mock LLM ---
def mock_llm_response(prompt: str) -> str:
    """
    Simulates an LLM's response based on the prompt.
    In a real scenario, this would be an API call to OpenAI, Claude, Azure OpenAI, etc.
    """
    print(f"\n--- LLM Called with Prompt ---\n{prompt}\n--- End LLM Prompt ---")

    # Simulate reasoning and action based on prompt keywords
    if "latest stable release of Python" in prompt and "Action: search_web" not in prompt:
        return """Thought: The user wants to know the latest stable release of Python. I should use a web search tool to find this information.
Action: search_web("latest stable Python release")"""
    elif "Python 3.12.2" in prompt or "Python 3.12" in prompt:
        return """Thought: I have found the latest stable release is Python 3.12.2. I should summarize this information and provide a final answer.
Action: Final Answer: The latest stable release of Python as of early 2026 is Python 3.12.2. It was released on February 6, 2024, and includes various bug fixes and improvements over previous versions. For the most up-to-date information, always check the official Python website."""
    elif "Action: search_web" in prompt and "latest Python release" not in prompt:
        # Fallback to the original goal if the agent gets confused
        return """Thought: It seems I'm being asked to search for something else, but my current goal is about Python releases. I will try to re-evaluate.
Action: search_web("latest stable Python release")"""
    else:
        return """Thought: I'm not sure how to proceed with this prompt. It seems I've lost context or the prompt is ambiguous. I will try to provide a general answer or ask for clarification.
Action: Final Answer: I am unable to determine the precise latest stable Python release with the given information. Please provide more context or refine your request."""


# --- Mock Tools ---
def search_web(query: str) -> str:
    """
    Simulates a web search tool.
    In a real scenario, this would integrate with a search API (e.g., Google Search API).
    """
    print(f"\n--- Tool Call: search_web('{query}') ---")
    time.sleep(1)  # Simulate network delay
    if "latest stable Python release" in query:
        # As of 2026-03-20, assuming Python 3.12.2 is the latest stable.
        # This information would be dynamically retrieved in a real agent.
        return json.dumps({
            "query": query,
            "results": [
                {"title": "Python 3.12.2 released - Python.org", "snippet": "The Python core development team announces the release of Python 3.12.2. This is the second maintenance release of Python 3.12.", "url": "https://www.python.org/downloads/release/python-3122/"},
                {"title": "Download Python", "snippet": "Latest stable release: Python 3.12.2", "url": "https://www.python.org/downloads/"}
            ]
        })
    else:
        return json.dumps({"query": query, "results": []})


# --- Tool Registry ---
# A dictionary mapping tool names to their functions
available_tools = {
    "search_web": search_web,
}

print("Mock LLM and tools initialized.")
```
Explanation:
- `mock_llm_response`: This function simulates our LLM. It takes a prompt and returns a string containing a `Thought:` and an `Action:`. We've hardcoded some logic to make it respond appropriately to our specific task.
- `search_web`: This function simulates an external web search. It takes a query and returns a JSON string representing search results. We're assuming Python 3.12.2 is the latest stable version for our 2026-03-20 context.
- `available_tools`: A dictionary that lets our agent easily look up and call tools by name.
3. Implement the ReAct Agent Loop
Now, let’s put the Thought -> Action -> Observation loop into action.
Add the following code to react_agent.py, below the mock functions:
```python
# react_agent.py (continued)

def run_react_agent(task_description: str, max_iterations: int = 5):
    """
    Runs a simplified ReAct agent to complete a task.
    """
    print(f"\n--- Starting ReAct Agent for Task: '{task_description}' ---\n")

    full_prompt_history = []
    current_observation = ""
    final_answer = None

    for i in range(max_iterations):
        print(f"\n--- Iteration {i+1}/{max_iterations} ---")

        # 1. Prepare the prompt for the LLM.
        # The prompt includes the task, previous thoughts, actions, and observations.
        prompt = f"You are an AI assistant designed to complete tasks by thinking, acting, and observing.\n" \
                 f"Your goal is: {task_description}\n\n" \
                 f"Here is the history of your thoughts, actions, and observations:\n" \
                 f"{''.join(full_prompt_history)}\n" \
                 f"Current Observation: {current_observation}\n\n" \
                 f"Think about your next step, then decide on an Action.\n" \
                 f"Available Tools: {list(available_tools.keys())}\n" \
                 f"Format: Thought: [your thought]\nAction: [tool_name(\"arg\") or Final Answer: \"your answer\"]\n"

        # 2. Get the LLM's Thought and Action
        llm_output = mock_llm_response(prompt)
        full_prompt_history.append(f"LLM Output:\n{llm_output}\n")  # Store for history

        # Parse Thought and Action
        thought_match = llm_output.find("Thought:")
        action_match = llm_output.find("Action:")
        if thought_match != -1 and action_match != -1:
            thought = llm_output[thought_match + len("Thought:"):action_match].strip()
            action_line = llm_output[action_match + len("Action:"):].strip()
        else:
            print("Error: LLM output did not contain expected 'Thought:' and 'Action:' format.")
            break

        print(f"Agent Thought: {thought}")
        print(f"Agent Action: {action_line}")

        # 3. Execute the Action
        current_observation = ""  # Reset observation for the new step
        if action_line.startswith("Final Answer:"):
            final_answer = action_line[len("Final Answer:"):].strip()
            print(f"\n--- Agent Completed Task ---")
            print(f"Final Answer: {final_answer}")
            break
        elif "(" in action_line and ")" in action_line:
            # Parse tool call (e.g., 'search_web("query string")')
            tool_name_end = action_line.find("(")
            tool_name = action_line[:tool_name_end].strip()
            tool_args_start = tool_name_end + 1
            tool_args_end = action_line.rfind(")")
            tool_args_str = action_line[tool_args_start:tool_args_end].strip().strip('"')  # Remove quotes

            if tool_name in available_tools:
                tool_function = available_tools[tool_name]
                try:
                    current_observation = tool_function(tool_args_str)
                    print(f"Observation: {current_observation}")
                except Exception as e:
                    current_observation = f"Error executing tool '{tool_name}': {e}"
                    print(f"Observation (Error): {current_observation}")
            else:
                current_observation = f"Error: Unknown tool '{tool_name}'."
                print(f"Observation (Error): {current_observation}")
        else:
            current_observation = "Error: Malformed action. Expected 'Final Answer:' or 'tool_name(\"args\")'."
            print(f"Observation (Error): {current_observation}")

        full_prompt_history.append(f"Observation: {current_observation}\n")

    if not final_answer:
        print(f"\n--- Agent did not complete task within {max_iterations} iterations. ---")
    return final_answer


# --- Run the agent ---
if __name__ == "__main__":
    task = "Find the latest stable release of Python and summarize it."
    run_react_agent(task)
```
Explanation:
- `run_react_agent`: This is our main agent function. It takes a `task_description` and a `max_iterations` limit.
- Prompt Construction: Inside the loop, we build a `prompt` for the LLM. Critically, this prompt includes the `task_description`, the `full_prompt_history` (all previous `Thought`s, `Action`s, and `Observation`s), and the `current_observation` from the last step. This is how the LLM maintains context and learns.
- LLM Call: We call `mock_llm_response` with our constructed prompt.
- Parsing Output: We parse the LLM's response to extract the `Thought` and the `Action`.
- Action Execution:
  - If the `Action` is "Final Answer:", we extract the answer and terminate.
  - If it's a tool call (e.g., `search_web("...")`), we extract the tool name and arguments, then call the appropriate function from `available_tools`.
- Observation: The result of the action (either the tool's output or an error message) becomes the `current_observation` for the next iteration.
- Loop Continuation: The process repeats, with the LLM getting more context with each step, allowing it to dynamically adjust its plan.
4. Run Your ReAct Agent!
Save react_agent.py and run it from your terminal:
```shell
python react_agent.py
```
You should see output similar to this (though the exact LLM mock responses might vary slightly based on the hardcoded logic):
```text
Mock LLM and tools initialized.

--- Starting ReAct Agent for Task: 'Find the latest stable release of Python and summarize it.' ---


--- Iteration 1/5 ---

--- LLM Called with Prompt ---
You are an AI assistant designed to complete tasks by thinking, acting, and observing.
Your goal is: Find the latest stable release of Python and summarize it.

Here is the history of your thoughts, actions, and observations:

Current Observation:

Think about your next step, then decide on an Action.
Available Tools: ['search_web']
Format: Thought: [your thought]
Action: [tool_name("arg") or Final Answer: "your answer"]

--- End LLM Prompt ---
Agent Thought: The user wants to know the latest stable release of Python. I should use a web search tool to find this information.
Agent Action: search_web("latest stable Python release")

--- Tool Call: search_web('latest stable Python release') ---
Observation: {"query": "latest stable Python release", "results": [{"title": "Python 3.12.2 released - Python.org", "snippet": "The Python core development team announces the release of Python 3.12.2. This is the second maintenance release of Python 3.12.", "url": "https://www.python.org/downloads/release/python-3122/"}, {"title": "Download Python", "snippet": "Latest stable release: Python 3.12.2", "url": "https://www.python.org/downloads/"}]}

--- Iteration 2/5 ---

--- LLM Called with Prompt ---
You are an AI assistant designed to complete tasks by thinking, acting, and observing.
Your goal is: Find the latest stable release of Python and summarize it.

Here is the history of your thoughts, actions, and observations:
LLM Output:
Thought: The user wants to know the latest stable release of Python. I should use a web search tool to find this information.
Action: search_web("latest stable Python release")
Observation: {"query": "latest stable Python release", "results": [{"title": "Python 3.12.2 released - Python.org", "snippet": "The Python core development team announces the release of Python 3.12.2. This is the second maintenance release of Python 3.12.", "url": "https://www.python.org/downloads/release/python-3122/"}, {"title": "Download Python", "snippet": "Latest stable release: Python 3.12.2", "url": "https://www.python.org/downloads/"}]}

Current Observation:

Think about your next step, then decide on an Action.
Available Tools: ['search_web']
Format: Thought: [your thought]
Action: [tool_name("arg") or Final Answer: "your answer"]

--- End LLM Prompt ---
Agent Thought: I have found the latest stable release is Python 3.12.2. I should summarize this information and provide a final answer.
Agent Action: Final Answer: The latest stable release of Python as of early 2026 is Python 3.12.2. It was released on February 6, 2024, and includes various bug fixes and improvements over previous versions. For the most up-to-date information, always check the official Python website.

--- Agent Completed Task ---
Final Answer: The latest stable release of Python as of early 2026 is Python 3.12.2. It was released on February 6, 2024, and includes various bug fixes and improvements over previous versions. For the most up-to-date information, always check the official Python website.
```
Notice how the agent first thought about using search_web, acted by calling it, observed the results, and then thought again to formulate the final answer. This iterative process is the heart of ReAct!
Mini-Challenge: Extend Your ReAct Agent
You’ve built a basic ReAct agent! Now, let’s give it a slightly more complex task.
Challenge: Modify your react_agent.py to handle the following task:
“What is the current population of Tokyo, and what is a famous landmark there?”
Hints:
- You'll likely need to add a new mock tool, perhaps `search_database(query: str)`, or enhance `search_web` to handle more types of queries and return different mock data.
- You'll need to update your `mock_llm_response` function to guide the LLM's `Thought` process for this new task. It should first search for the population, then search for a landmark, and finally combine the information.
- Think about how the LLM will sequentially use the tools based on its `Thought`s.
What to Observe/Learn:
- How the agent manages multiple steps and distinct pieces of information.
- The importance of designing your `mock_llm_response` (or actual LLM prompts) to guide the agent through sequential reasoning.
- The modularity of adding new tools to your agent's capabilities.
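If you want a starting point for the challenge, here is one possible shape for a second mock tool. The tool name, keys, and mock data below are invented for illustration, mirroring the style of `search_web` from the earlier listing; your version can differ freely.

```python
import json

def search_database(query: str) -> str:
    """Mock structured-data lookup for the mini-challenge (invented data)."""
    mock_data = {
        "population of tokyo": "Approximately 14 million (city proper).",
        "famous landmark in tokyo": "Tokyo Tower is a well-known landmark.",
    }
    # Match on lowercase key phrases so casing in the query doesn't matter.
    for key, answer in mock_data.items():
        if key in query.lower():
            return json.dumps({"query": query, "result": answer})
    return json.dumps({"query": query, "result": None})

# Register it alongside the existing tool so the agent can choose between them:
available_tools = {
    # "search_web": search_web,  # from the earlier listing
    "search_database": search_database,
}
```

The rest of the work is in `mock_llm_response`: it needs branches that pick the population query first, the landmark query second, and a `Final Answer:` combining both observations.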
Common Pitfalls & Troubleshooting
Building agents with advanced architectures can be immensely rewarding, but it comes with its own set of challenges.
Prompt Engineering Complexity: Crafting effective prompts for ReAct or reflection can be tricky.
- Pitfall: Overly verbose, ambiguous, or restrictive prompts can confuse the LLM or limit its reasoning.
- Troubleshooting: Start simple. Provide clear instructions, examples of `Thought`/`Action` formatting, and a concise task description. Iterate and refine prompts based on agent behavior. Tools like LangChain's `AgentExecutor` or Microsoft's Agent Framework abstract some of this, but understanding the underlying prompt structure is key.
Infinite Loops or Stagnation: Agents can get stuck in repetitive `Thought`/`Action` cycles or fail to progress.
- Pitfall: The LLM might generate the same `Thought` and `Action` repeatedly if the `Observation` doesn't provide new information or if its reasoning gets stuck.
- Troubleshooting: Implement `max_iterations` (as we did). Introduce mechanisms for detecting repeated states. For reflection, if an agent is stuck, trigger a reflection step to force a re-evaluation of the strategy. Ensure tool outputs are always informative, even if they indicate an error or no results.
Over-Reflection vs. Under-Reflection:
- Pitfall: Reflecting too often can be computationally expensive and slow down the agent. Not reflecting enough can lead to an agent repeating mistakes.
- Troubleshooting: Design reflection triggers carefully: after a certain number of steps, upon tool failure, when the agent expresses uncertainty, or at the end of a major sub-task. Balance the cost of reflection with the benefit of improved robustness.
Tool Output Misinterpretation: LLMs might misread or misinterpret the output from tools.
- Pitfall: If a tool returns complex JSON or natural language, the LLM might struggle to extract the relevant information for its next `Thought`.
- Troubleshooting: Design tool outputs to be as clear and concise as possible. For complex outputs, consider adding a "parsing" or "summarization" step (either as another tool or directly within the LLM's prompt instructions) to distill the essential information before feeding it back to the main reasoning loop.
Summary: Building Smarter, More Robust Agents
Congratulations! You’ve successfully navigated the exciting world of advanced agent architectures. Here’s a quick recap of the key takeaways:
- ReAct (Reason + Act): This powerful paradigm allows agents to dynamically reason (`Thought`), take action (`Action`) using tools, and learn from the results (`Observation`) in an iterative loop. It's fundamental for building agents that can adapt and use tools intelligently.
- Reflection: By critically evaluating their past `Thought`s, `Action`s, and `Observation`s, agents can identify mistakes, learn from them, and refine their strategies, leading to greater robustness and self-correction.
- Iterative Planning-Execution Loops: This is the overarching framework where agents continuously plan, execute parts of the plan, monitor progress, evaluate, and re-plan. It's essential for tackling complex, long-horizon tasks in dynamic environments.
- Practical Implementation: We built a simplified ReAct agent in Python, demonstrating how the `Thought -> Action -> Observation` cycle works with mock LLMs and tools. This hands-on experience demystifies the core logic.
- Common Pitfalls: We discussed challenges like prompt complexity, infinite loops, and tool output misinterpretation, along with strategies for troubleshooting them.
These advanced architectures are crucial for moving beyond simple, reactive AI systems to truly autonomous and intelligent agents capable of sophisticated problem-solving. As you continue your journey, remember that frameworks like LangChain, AutoGen, and Microsoft Agent Framework provide robust implementations of these patterns, allowing you to build on solid foundations.
Next, we’ll explore how multiple agents can work together, communicating and collaborating to solve problems that are too complex for a single agent!
References
- Microsoft Learn - Agentic AI tools for Windows development
- Microsoft Learn - Agent Framework documentation
- ReAct: Synergizing Reasoning and Acting in Language Models (Paper)
- LangChain Documentation (Agents)
- AutoGen Documentation (Agents)