Introduction

Welcome back, intrepid developer! In our previous chapters, you’ve mastered the art of crafting precise and powerful prompts, turning Large Language Models (LLMs) into capable text generators. But what if we want LLMs to do more than just generate text? What if we want them to act in the world, to remember past interactions, and to strategically use external resources to solve complex problems?

This is where Agentic AI comes into play. Instead of just a single prompt-response interaction, agentic systems empower LLMs with a “body” and “mind” beyond their text generation core. They can perceive, plan, act, and reflect, much like a human. This chapter will be your deep dive into the fundamental architecture of these intelligent agents. We’ll deconstruct them into their core components: the LLM itself, memory, tools, and the planning mechanism that orchestrates everything.

By the end of this chapter, you’ll not only understand the theory behind agentic design but also begin to build your first simple agent, connecting its “brain” to external capabilities. Get ready to transform your LLM applications from static response generators to dynamic, problem-solving entities!

Core Concepts: The Anatomy of an AI Agent

An AI agent, at its heart, is an LLM enhanced with additional capabilities that allow it to interact dynamically with its environment. Think of it like giving a super-smart brain (the LLM) a memory, hands (tools), and the ability to think through problems (planning).

Let’s break down these crucial components:

1. The LLM: The Agent’s Brain

The Large Language Model is the central processing unit, the “brain” of our agent. It’s responsible for understanding natural language, reasoning, and generating responses. In an agentic setup, the LLM doesn’t just answer questions; it interprets observations, makes decisions, and generates action plans.

  • What it is: A powerful neural network trained on vast amounts of text data, capable of understanding and generating human-like text.
  • Why it’s important: It provides the core intelligence for reasoning, decision-making, and natural language interaction. Without it, the agent can’t understand tasks or formulate plans.
  • How it functions in an agent: The LLM receives observations (user input, tool outputs, memory contents), processes them, and then outputs a thought, a decision, or an action to take. This often involves specific prompt engineering to guide its reasoning process (e.g., Chain-of-Thought).

2. Memory: Remembering the Past, Informing the Future

Just like humans, agents need memory to maintain context, learn from past interactions, and avoid repeating mistakes. Without memory, an agent would be stateless, treating every interaction as entirely new, which severely limits its utility for multi-turn conversations or complex, sequential tasks.

We typically categorize agent memory into two main types:

Short-Term Memory (Context Window)

  • What it is: The immediate context passed directly into the LLM’s prompt. It’s the most recent conversation turns, observations, and tool outputs.
  • Why it’s important: It allows the LLM to maintain a coherent conversation and understand the immediate task at hand.
  • How it functions: The framework dynamically builds a prompt that includes the system message, the agent’s persona, a history of recent interactions, and the current user query. This is often limited by the LLM’s context window size.
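To make this concrete, here is a minimal, framework-free sketch of how short-term memory can be assembled into a prompt: a system message, the most recent turns that fit a budget, and the current query. The character budget and `role: text` format are simplifying assumptions; real frameworks count tokens and use structured message objects.

```python
# Minimal sketch of short-term memory: assemble a prompt from the most
# recent conversation turns that fit within a (simplified) character budget.
def build_prompt(system_msg, history, user_query, budget=200):
    """history is a list of (role, text) tuples, oldest first."""
    kept = []
    used = len(system_msg) + len(user_query)
    # Walk the history newest-first, keeping turns until the budget is spent.
    for role, text in reversed(history):
        if used + len(text) > budget:
            break
        kept.append((role, text))
        used += len(text)
    kept.reverse()  # restore chronological order
    lines = [f"system: {system_msg}"]
    lines += [f"{role}: {text}" for role, text in kept]
    lines.append(f"human: {user_query}")
    return "\n".join(lines)

history = [("human", "Hi!"), ("ai", "Hello!"),
           ("human", "What's an agent?"),
           ("ai", "An LLM with memory, tools, and planning.")]
print(build_prompt("You are a helpful assistant.", history, "Tell me more."))
```

Note how shrinking the budget silently drops the oldest turns first — exactly the "forgetting" behavior discussed in the pitfalls section later in this chapter.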

Long-Term Memory (External Storage)

  • What it is: Persistent storage outside the LLM’s context window, typically implemented using vector databases, traditional databases, or knowledge graphs. This stores information that’s too large or too old to fit in the short-term context.
  • Why it’s important: It enables agents to recall information from much earlier interactions, access a vast knowledge base (like a company’s internal documentation), and learn over time. This is crucial for personalization, knowledge retention, and avoiding context window limits.
  • How it functions: When the agent needs information that might be in long-term memory, it can use an embedding model to convert its query into a numerical vector. This vector is then used to search the vector database for semantically similar chunks of information. The retrieved information is then injected into the LLM’s short-term context. This process is a cornerstone of Retrieval-Augmented Generation (RAG), which we explored in earlier chapters.
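The retrieval flow described above can be sketched end-to-end. Note the heavy simplification: the "embedding model" below is a toy bag-of-words counter and the "vector database" is a plain list, used only to make the query-embed-rank-retrieve flow visible.

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector. Real systems use a learned
# embedding model; this stand-in only illustrates the retrieval flow.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# A "vector store": documents paired with their precomputed embeddings.
docs = [
    "Our refund policy allows returns within 30 days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday through Friday.",
]
store = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    qv = embed(query)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# The retrieved text would then be injected into the LLM's short-term context.
print(retrieve("what is the rate limit for the api"))
```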

3. Tools: Interacting with the World

Tools are the “hands” and “senses” of an agent. They allow the LLM to perform actions beyond generating text, such as searching the web, querying a database, calling an API, or even sending an email. Tools bridge the gap between the LLM’s internal reasoning and the external world.

  • What they are: Functions or APIs that the agent can call. Each tool has a specific purpose, a clear name, and a description that tells the LLM when and how to use it, along with its expected input parameters.
  • Why they’re important: They provide agents with practical capabilities, enabling them to gather real-time information, perform calculations, or trigger actions in other systems. Without tools, agents are confined to their training data.
  • How they function: The LLM, based on its planning, decides which tool to use. It then generates the necessary arguments for that tool. The agent execution environment calls the tool with these arguments, and the tool’s output is returned to the LLM as an observation, which then informs the agent’s next thought or action.
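A minimal sketch of that dispatch step, under the assumption that the LLM emits a JSON tool call (the tool names and JSON shape here are hypothetical; real frameworks add schema validation, retries, and structured error handling):

```python
import json

# Hypothetical tool registry: name -> callable.
def search_web(query: str) -> str:
    return f"results for {query}"  # stand-in for a real search API

def add_numbers(a: float, b: float) -> float:
    return a + b

TOOLS = {"search_web": search_web, "add_numbers": add_numbers}

def execute_tool_call(llm_output: str) -> str:
    """The LLM emits a JSON tool call; the runtime looks up the tool and
    runs it. The result is returned as an 'observation' for the next turn."""
    call = json.loads(llm_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        # Returned to the LLM so it can recover, rather than crashing.
        return f"Error: unknown tool {call['tool']!r}"
    return str(fn(**call["args"]))

# Simulating what the LLM might emit:
print(execute_tool_call('{"tool": "add_numbers", "args": {"a": 2, "b": 3}}'))  # 5
```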

4. Planning: The Agent’s Strategy

Planning is the agent’s ability to reason, break down complex tasks into smaller steps, decide which tools to use, and determine the sequence of actions required to achieve a goal. This is where advanced prompt engineering techniques, like Chain-of-Thought, become absolutely vital.

  • What it is: The iterative process by which the LLM analyzes a problem, formulates a strategy, executes steps, observes results, and course-corrects.
  • Why it’s important: It allows agents to tackle multi-step problems, recover from errors, and adapt to dynamic environments. Without planning, an agent would simply guess or follow a rigid, pre-defined script.
  • How it functions: The LLM receives a prompt that encourages it to “think step-by-step.” It might output a Thought, then an Action (with Action Input), receive an Observation (from a tool), and then Thought again, repeating this cycle until it reaches a Final Answer. This is often referred to as the “Reasoning-Action” loop or “ReAct” pattern.
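The Thought/Action/Observation cycle above is, at its core, just a loop. Here is a minimal sketch of that loop with a scripted stand-in for the LLM, so the control flow is visible without any API calls (the two-step script is an assumption made purely for illustration):

```python
# Minimal ReAct-style loop. fake_llm is a scripted stand-in for a real LLM:
# it requests a tool call on the first turn and answers on the second.
def fake_llm(transcript):
    if "Observation:" not in transcript:
        return {"thought": "I should count the words.",
                "action": "word_count", "input": "hello brave new world"}
    return {"thought": "I have the count.", "final_answer": "4 words"}

def word_count(text):
    return len(text.split())

TOOLS = {"word_count": word_count}

def run_agent(goal, max_steps=5):
    transcript = f"Question: {goal}\n"
    for _ in range(max_steps):                       # the ReAct loop
        step = fake_llm(transcript)                  # Perceive & Plan
        transcript += f"Thought: {step['thought']}\n"
        if "final_answer" in step:                   # LLM decided it is done
            return step["final_answer"]
        result = TOOLS[step["action"]](step["input"])           # Act
        transcript += f"Action: {step['action']}\nObservation: {result}\n"  # Observe
    return "Gave up."

print(run_agent("How many words in 'hello brave new world'?"))  # 4 words
```

Everything LangChain's agent machinery does in the next section is an industrial-strength version of this loop: prompt the LLM, parse its decision, run the tool, append the observation, repeat.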

The Agentic Loop: Putting It All Together

These components work in a continuous cycle, often called the “Perceive-Plan-Act-Reflect” loop.

Let’s visualize this core loop:

graph TD
    User_Input[User Input/Goal] --> Agent_LLM
    Agent_LLM -->|Perceive & Plan| Thought_Action[Thought & Action]
    Thought_Action --> Use_Tool{Use Tool?}
    Use_Tool -->|Yes| Tool_Execution[Execute Tool]
    Tool_Execution -->|Observation| Agent_LLM
    Use_Tool -->|No, Final Answer| Agent_Output[Final Answer]
    Agent_LLM -.->|Accesses| Short_Term_Memory[Short-Term Memory]
    Agent_LLM -.->|Queries/Stores| Long_Term_Memory[Long-Term Memory]
    Short_Term_Memory -.-> Agent_LLM
    Long_Term_Memory -.-> Agent_LLM
    subgraph Agent_Components["Agent Core Components"]
        Agent_LLM[LLM - The Brain]
        Short_Term_Memory
        Long_Term_Memory
        Tool_Execution[Tools - The Hands]
    end

Explanation of the Agentic Loop:

  1. User Input/Goal: The process starts with a user providing a task or query to the agent.
  2. LLM - The Brain (Perceive & Plan): The LLM receives the input, along with relevant context from short-term and potentially long-term memory. It then perceives the situation and plans its next step. This planning often involves generating a “thought” process, deciding if a tool is needed, and if so, which one and with what arguments.
  3. Thought & Action: The LLM articulates its reasoning (Thought) and proposes an action (e.g., calling a tool, or formulating a final answer).
  4. Use Tool?: The agent runtime checks if the LLM has decided to use a tool.
  5. Execute Tool: If a tool is chosen, the agent runtime executes the specified tool with the LLM-generated arguments.
  6. Observation: The output or result from the tool execution is returned to the LLM as an “observation.” This observation becomes new input for the LLM.
  7. Loop Back: The LLM incorporates this new observation into its context and continues the “Perceive & Plan” cycle, refining its strategy until it believes the goal is achieved.
  8. Final Answer: When the LLM determines it has completed the task, it generates a “Final Answer” to the user.
  9. Memory Access: Throughout this process, the LLM continuously accesses and updates both short-term (context window) and long-term (external storage) memory to maintain state and gather information.

This iterative loop is what gives agents their dynamic and adaptive capabilities.

Step-by-Step Implementation: Building a Simple Agent with LangChain

Now that we understand the theory, let’s get hands-on! We’ll use LangChain, a popular framework, to demonstrate how these components come together. LangChain (and similar frameworks like LlamaIndex and AutoGen) simplifies the orchestration of LLMs, tools, and memory, allowing us to focus on agent logic.

CRITICAL NOTE (2026-04-06): LangChain’s API has evolved significantly. We’ll be using the modular langchain-core, langchain-community, and langchain-openai packages, which is the modern approach. Ensure your environment is set up correctly.

Prerequisites:

Before we start, make sure you have:

  • Python 3.10+ installed.
  • An OpenAI API key (or similar LLM provider like Anthropic, Google Cloud AI). You’ll need to set it as an environment variable.
  • A basic understanding of pip for package installation.

Let’s set up our environment.

Step 1: Environment Setup

First, create a new directory for our project and install the necessary packages.

  1. Create Project Directory:

    mkdir my_first_agent
    cd my_first_agent
    
  2. Create a Virtual Environment (Best Practice):

    python -m venv .venv
    

    On Windows:

    .venv\Scripts\activate
    

    On macOS/Linux:

    source .venv/bin/activate
    

    You should see (.venv) at the start of your command prompt, indicating the virtual environment is active.

  3. Install LangChain and OpenAI: As of 2026-04-06, the recommended way to install LangChain is modularly. We’ll install langchain (which includes langchain-core), langchain-openai for OpenAI LLM integration, and langchain-community for various community-contributed components like tools.

    pip install langchain==0.1.13 langchain-openai==0.0.8 langchain-community==0.0.29
    # Note: Version numbers are illustrative for 2026-04-06. Always check PyPI for the absolute latest stable.
    # For instance, `langchain` might be `0.2.x` by this date.
    
  4. Set Your API Key: The most secure way is to set it as an environment variable. Replace YOUR_OPENAI_API_KEY_HERE with your actual key.

    On macOS/Linux:

    export OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
    

    On Windows (PowerShell):

    $env:OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
    

    On Windows (Command Prompt):

    set OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
    

    For development, you can also put it directly in your Python code for quick testing, but this is not recommended for production.

    # Not recommended for production, but useful for quick local tests
    import os
    os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
    

Step 2: Define a Simple Tool

Let’s create a tool that our agent can use. For this example, we’ll make a tool that simulates a “word count” function.

Create a file named agent_app.py.

# agent_app.py
from langchain_core.tools import tool

# 1. Define a tool
@tool
def get_word_count(text: str) -> int:
    """Calculates the number of words in a given text."""
    print(f"DEBUG: Executing get_word_count with text: '{text[:30]}...'") # Debug print
    return len(text.split())

# We'll add more code here later!

Explanation:

  • from langchain_core.tools import tool: We import the tool decorator from langchain-core, which is the base package for LangChain.
  • @tool: This decorator transforms our regular Python function get_word_count into a LangChain tool. LangChain automatically infers the tool’s name, description (from the docstring), and input parameters (from type hints).
  • text: str -> int: We use Python type hints to tell the LLM what kind of input the tool expects (str) and what kind of output it produces (int). This is crucial for the LLM to understand how to use the tool correctly.
  • """Calculates the number of words in a given text.""": The docstring becomes the tool’s description, which the LLM uses to decide when to invoke this tool. Make these descriptions very clear!
  • print(f"DEBUG: ..."): A simple debug print to show when the tool is actually being called.
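To demystify how a decorator can "infer" a tool's metadata, here is a simplified stand-in built only from the standard library. This is not LangChain's actual implementation — just a sketch of the same idea: the name comes from the function, the description from the docstring, and the argument schema from type hints.

```python
import inspect

# Simplified stand-in for a @tool decorator: derive the tool's name,
# description, and argument schema from the function itself.
def tool(fn):
    sig = inspect.signature(fn)
    fn.tool_name = fn.__name__
    fn.tool_description = inspect.getdoc(fn) or ""
    fn.tool_args = {name: p.annotation.__name__
                    for name, p in sig.parameters.items()}
    return fn

@tool
def get_word_count(text: str) -> int:
    """Calculates the number of words in a given text."""
    return len(text.split())

print(get_word_count.tool_name)         # get_word_count
print(get_word_count.tool_args)         # {'text': 'str'}
```

This metadata is what gets serialized into the `{tools}` section of the agent's prompt, which is why a vague docstring directly degrades the agent's tool selection.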

Step 3: Instantiate the LLM and Agent

Now, let’s bring in our LLM and set up a basic agent executor. We’ll use OpenAI’s ChatOpenAI model. For production, consider gpt-4-turbo-2024-04-09 or gpt-3.5-turbo-0125 for cost-effectiveness.

Add the following to agent_app.py:

# agent_app.py (continued)
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate

# ... (previous get_word_count tool definition) ...

# 2. Instantiate the LLM
# Use a specific, recent model for production. 'gpt-4o' is a strong choice as of 2026-04-06.
llm = ChatOpenAI(model="gpt-4o", temperature=0)  # temperature=0 for deterministic behavior

# 3. Define the Agent's Prompt
# create_react_agent requires a text prompt with {tools}, {tool_names},
# {input}, and {agent_scratchpad} variables -- the classic ReAct format.
prompt = PromptTemplate.from_template(
    """You are a helpful assistant that can count words in text.
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""
)

# 4. Create the ReAct Agent
# The `create_react_agent` function wires the LLM, tools, and prompt into
# the ReAct pattern (Thought, Action, Action Input, Observation).
agent = create_react_agent(llm, [get_word_count], prompt)

# 5. Create the Agent Executor
# The AgentExecutor is the runtime that takes the agent's decisions and executes them.
agent_executor = AgentExecutor(agent=agent, tools=[get_word_count], verbose=True)

# 6. Run the agent!
if __name__ == "__main__":
    print("Agent is ready! Type 'exit' to quit.")
    while True:
        user_input = input("\nHuman: ")
        if user_input.lower() == "exit":
            break
        try:
            # The agent_executor expects an 'input' key in the dictionary
            response = agent_executor.invoke({"input": user_input})
            print(f"Agent: {response['output']}")
        except Exception as e:
            print(f"An error occurred: {e}")

Explanation of new code:

  • from langchain_openai import ChatOpenAI: Imports the ChatOpenAI class for interacting with OpenAI’s chat models.
  • from langchain.agents import AgentExecutor, create_react_agent: These are core components for creating agents in LangChain. create_react_agent sets up the ReAct prompting pattern, and AgentExecutor runs the resulting agent.
  • from langchain_core.prompts import PromptTemplate: Used to construct the text prompt that guides the LLM.
  • llm = ChatOpenAI(model="gpt-4o", temperature=0): We initialize our LLM. gpt-4o is a good choice for agents due to its strong reasoning capabilities. temperature=0 makes the output more deterministic, which is often preferred for agent actions.
  • Agent’s Prompt: This is crucial! create_react_agent validates that the prompt contains four variables:
    • {tools}: Automatically filled with each tool’s name, description, and arguments, so the LLM knows what it can call.
    • {tool_names}: The list of valid tool names the LLM may choose from.
    • {input}: Where the user’s current query is inserted.
    • {agent_scratchpad}: This is the magic! LangChain dynamically populates it with the agent’s trace so far (Thought, Action, Action Input, Observation) to guide the LLM through the ReAct loop.
  • agent = create_react_agent(llm, [get_word_count], prompt): This function combines our LLM, the list of tools it can use, and the guiding prompt into an agent that follows the ReAct pattern.
  • agent_executor = AgentExecutor(agent=agent, tools=[get_word_count], verbose=True): The AgentExecutor is the engine that runs our agent.
    • agent=agent: The agent logic itself.
    • tools=[get_word_count]: The list of tools the executor can actually call. This must match the tools given to create_react_agent.
    • verbose=True: This is incredibly useful for debugging! It prints out the agent’s internal thought process (the agent_scratchpad content), showing us when it thinks, acts, and observes.
  • agent_executor.invoke({"input": user_input}): We invoke the agent executor with the user’s input. The invoke method is part of LangChain’s Runnable interface.
Step 4: Run and Observe

Save agent_app.py and run it from your terminal:

python agent_app.py

Now, try typing something like:

Human: How many words are in the sentence "The quick brown fox jumps over the lazy dog"?

Expected Output (with verbose=True):

You’ll see a detailed log from the AgentExecutor:

> Entering new AgentExecutor chain...
Thought: The user is asking to count the words in a given sentence. The `get_word_count` tool is suitable for this task. I need to extract the sentence and pass it as an argument to the tool.
Action: get_word_count
Action Input: The quick brown fox jumps over the lazy dog
DEBUG: Executing get_word_count with text: 'The quick brown fox jumps over...'
Observation: 9
Thought: I have successfully used the `get_word_count` tool and received the word count, which is 9. I can now provide the final answer to the user.
Final Answer: There are 9 words in the sentence "The quick brown fox jumps over the lazy dog".

> Finished chain.
Agent: There are 9 words in the sentence "The quick brown fox jumps over the lazy dog".

Notice how the agent:

  1. Thought: Understood the request and identified the get_word_count tool.
  2. Action: Declared its intention to use get_word_count.
  3. Action Input: Provided the sentence as input to the tool.
  4. Observation: Received the 9 from our tool.
  5. Thought: Interpreted the observation and formulated a final answer.
  6. Final Answer: Presented the result to the user.

This is the core agentic loop in action! It’s not just generating text; it’s reasoning about a problem, using a tool, and then integrating the tool’s output to provide a solution.

Mini-Challenge: Enhance Your Agent with Another Tool

Let’s make our agent a bit more versatile!

Challenge: Add a new tool to your agent_app.py that can perform a simple arithmetic operation (e.g., addition or multiplication). The agent should be able to use either the get_word_count tool or your new arithmetic tool based on the user’s query.

Hints:

  • Define a new function with the @tool decorator.
  • Give it a clear docstring (description) and appropriate type hints for its parameters (e.g., num1: float, num2: float).
  • Remember to add your new tool to both the create_react_agent call (the agent’s brain needs to know about it) and the AgentExecutor call (the executor needs to be able to run it).
  • Test with queries like: “What is 5 plus 7?” or “How many words in ‘hello world’ multiplied by 3?” (though the agent won’t combine tools yet, it should choose one).

What to observe/learn:

  • How does the LLM decide which tool to use when multiple are available?
  • How important are clear tool descriptions in guiding the LLM’s choices?

Common Pitfalls & Troubleshooting

Building agents can be incredibly powerful, but also introduces new complexities. Here are a few common issues and how to tackle them:

  1. Tool Selection Errors (Agent Hallucinating Tool Use):

    • Pitfall: The agent tries to use a non-existent tool, or uses the wrong tool, or passes incorrect arguments to a tool.
    • Reason: The LLM’s understanding of your tool’s description might be ambiguous, or its reasoning might be flawed.
    • Troubleshooting:
      • Refine Tool Descriptions: Make your tool docstrings as clear, concise, and unambiguous as possible. Explicitly state its purpose, inputs, and outputs.
      • Specific Prompts: Adjust your system message in the ChatPromptTemplate to guide the agent more explicitly on when and how to use tools.
      • Model Choice: More capable LLMs (like gpt-4o or claude-3-opus) are generally better at tool use and complex reasoning.
      • verbose=True: Always use verbose=True in your AgentExecutor to see the agent’s internal Thought process. This is your most powerful debugging tool!
  2. Context Window Limitations & “Forgetting”:

    • Pitfall: In longer conversations or multi-step tasks, the agent seems to “forget” previous parts of the interaction.
    • Reason: The LLM’s context window has a finite size. Old messages get pushed out as new ones come in. Our current example doesn’t pass any conversation history to the agent at all.
    • Troubleshooting:
      • Implement Conversational Memory: For multi-turn chats, you’ll need to integrate ConversationBufferMemory or similar memory components from LangChain (which we’ll cover in a later chapter!). This ensures relevant past messages are included.
      • Summarization: For very long histories, summarize past interactions before passing them to the LLM.
      • RAG (Long-Term Memory): For knowledge-heavy tasks, use Retrieval-Augmented Generation to fetch relevant information from a vector database and inject it into the prompt, rather than trying to stuff everything into the context.
  3. Cost Overruns:

    • Pitfall: Agentic workflows can quickly become expensive due to multiple LLM calls per interaction (thought, action, observation, thought, final answer).
    • Reason: Each Thought, Action, and Final Answer step typically involves a separate API call to the LLM.
    • Troubleshooting:
      • Monitor Usage: Use your LLM provider’s dashboard to track API usage and costs.
      • Optimize Prompts: Make prompts concise to reduce token count.
      • Model Selection: Use cheaper, faster models (e.g., gpt-3.5-turbo) for simpler agent tasks or for initial iterations, switching to more powerful models only when necessary.
      • Caching: Implement caching for repeated LLM calls, especially for tools that query static data.
      • Tool Efficiency: Ensure your tools are efficient and don’t require excessive LLM calls themselves.
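For the caching point above, the simplest version is memoizing repeated tool calls. A sketch with the standard library (real deployments would add TTLs, persistence, and cache-key normalization; the counter here exists only to demonstrate that the second call never hits the "API"):

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation to show when the real work runs

@lru_cache(maxsize=256)
def lookup_static_data(key: str) -> str:
    """Stand-in for an expensive tool call (API hit, DB query, ...)."""
    CALLS["count"] += 1
    return f"value-for-{key}"

lookup_static_data("pricing")  # executes the underlying call
lookup_static_data("pricing")  # served from cache: no second call
print(CALLS["count"])          # 1
```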

Remember, agent development is highly iterative. Start simple, observe the agent’s behavior (verbose=True is your friend!), identify shortcomings, and then refine your tools, prompts, and memory strategies.

Summary

Phew! You’ve just taken a significant leap from basic prompt engineering to understanding the fundamental architecture of intelligent AI agents. Let’s recap the key takeaways:

  • Agentic AI extends LLMs: Agents empower LLMs to go beyond text generation, enabling them to perceive, plan, act, and reflect in dynamic environments.
  • Four Core Components:
    • LLM (Brain): The reasoning engine that understands, plans, and generates.
    • Memory (State): Crucial for maintaining context. This includes short-term (context window) and long-term (external storage like vector DBs).
    • Tools (Hands): External functions or APIs that allow the agent to interact with the real world (web search, databases, custom APIs).
    • Planning (Strategy): The iterative process where the LLM breaks down tasks, decides on actions, and course-corrects, often guided by patterns like ReAct.
  • The Agentic Loop: Agents operate in a continuous cycle of perceiving input, planning an action, executing that action (often via a tool), observing the results, and then refining their plan.
  • Frameworks are Key: Tools like LangChain simplify the orchestration of these complex components, allowing developers to focus on defining tools and guiding agent behavior.
  • Debugging is Essential: Using verbose=True and carefully crafting tool descriptions are vital for understanding and improving agent performance.

You’ve successfully built and run a basic agent that can use an external tool! This is a monumental step. In the next chapters, we’ll dive deeper into more sophisticated agent frameworks, explore advanced tool design, and implement robust memory management strategies to build even more capable and production-ready AI agents.

References

  1. dair-ai/Prompt-Engineering-Guide (GitHub): A comprehensive resource covering prompt engineering techniques, many of which are foundational for agent planning.
  2. LangChain Official Documentation: The primary source for understanding LangChain’s components, agents, tools, and memory. Always refer to the latest documentation.
  3. OpenAI API Documentation: Essential for understanding LLM models, API usage, and best practices for integrating with OpenAI’s services.
  4. ReAct: Synergizing Reasoning and Acting in Language Models (Paper): The foundational paper introducing the ReAct pattern, which is widely used in agentic frameworks.
  5. LlamaIndex Official Documentation: Another powerful framework for building LLM applications, particularly strong in data ingestion and retrieval for long-term memory.

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.