Introduction to Dynamic Context
Welcome back, fellow AI engineers! In our previous chapters, we laid the groundwork for effective context engineering. We learned how to design context, reduce its size through summarization and filtering, compress it for efficiency, and chunk it into manageable pieces. These foundational techniques are crucial, but they primarily deal with static context – information that’s prepared once and then fed to the LLM.
But what about long-running conversations, persistent agents, or applications that need to maintain a “memory” over extended periods? The fixed context window of LLMs, while growing, still presents a significant challenge. This is where dynamic context management comes into play.
In this chapter, we’ll dive into advanced strategies that allow your LLM applications to intelligently adapt their context on the fly. We’ll explore two powerful techniques: context prioritization, which helps us decide what information is most important, and sliding windows, which help us manage how much information we retain over time. Mastering these concepts is essential for building robust, intelligent, and cost-effective LLM agents that can maintain coherence and relevance across many interactions.
By the end of this chapter, you’ll understand how to “own your context window” not just by filling it efficiently, but by actively managing its contents to keep your LLMs focused and performing optimally.
The Need for Dynamic Context Management
Imagine an LLM-powered assistant helping a user plan a complex trip over several days. The conversation might involve initial preferences, budget discussions, flight searches, hotel bookings, activity suggestions, and last-minute changes. If we simply feed the entire conversation history to the LLM each time, we’ll quickly hit the context window limit. Once that limit is reached, the oldest parts of the conversation are truncated, leading to what we call Context Rot.
What is Context Rot?
Context Rot occurs when critical, relevant information gradually leaves the LLM’s active context window because newer, often less important, information pushes it out. This can lead to:
- Loss of Coherence: The LLM “forgets” previous turns or key decisions.
- Reduced Accuracy: Inability to answer questions based on forgotten information.
- Suboptimal Performance: Repeating information or making suggestions that contradict earlier agreements.
- Increased Cost: Attempting to mitigate the problem with larger models or unnecessarily large context windows drives up token spend.
To combat context rot and enable long-running, intelligent agents, we need dynamic strategies. This means not just reducing context, but intelligently selecting and managing it as new information arrives.
Context Prioritization: Deciding What Matters Most
Context prioritization is the art and science of intelligently selecting the most relevant pieces of information to include in the LLM’s prompt at any given moment. Instead of simply taking the last N tokens, we actively decide which tokens carry the most weight for the current task or query.
Why Prioritize Context?
- Combat Context Rot: Ensures crucial information isn’t prematurely discarded.
- Improve Relevance: Keeps the LLM focused on the current task and user intent.
- Optimize Cost & Latency: By sending only necessary information, we reduce token usage and processing time.
- Enhance Accuracy: Fewer irrelevant distractions lead to better reasoning.
Techniques for Context Prioritization
Let’s explore common methods for prioritizing context. Often, the most effective solutions combine several of these techniques.
1. Recency-Based Prioritization
This is the simplest form of prioritization: newer information is generally more relevant. In a conversation, the most recent turns are often crucial for understanding the immediate context.
- How it works: Always keep the N most recent messages/chunks.
- Pros: Easy to implement, often effective for short-term interactions.
- Cons: Can still suffer from context rot if an important detail from much earlier is needed later.
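The idea fits in a few lines of Python. In this sketch, a whitespace word count stands in for a real tokenizer, and `keep_recent` is an illustrative helper, not a library function:

```python
def keep_recent(history: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within max_tokens.

    Walks the history newest-first so the newest messages win,
    then restores chronological order for the prompt.
    """
    kept: list[str] = []
    budget = max_tokens
    for message in reversed(history):
        cost = len(message.split())  # stand-in for a real token count
        if cost > budget:
            break  # older messages fall out of the window first
        kept.append(message)
        budget -= cost
    kept.reverse()  # back to oldest-first
    return kept
```

For example, `keep_recent(["a b", "c d e", "f g"], max_tokens=5)` keeps only the two newest messages, dropping the oldest one.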
2. Relevance Scoring (Semantic Search)
This is a powerful technique that uses embeddings to find information semantically similar to the current query or task. It’s the backbone of many Retrieval-Augmented Generation (RAG) systems.
- How it works:
- Embed the current user query or agent’s internal state into a vector.
- Embed all potential context chunks (e.g., past messages, retrieved documents) into vectors.
- Calculate the similarity (e.g., cosine similarity) between the query embedding and each context chunk embedding.
- Select the top K most similar chunks to include in the prompt.
- Pros: Highly effective at bringing truly relevant information into context, even if it’s “old.”
- Cons: Requires an embedding model and a vector store, adds computational overhead.
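To make the mechanics concrete, here is a minimal sketch in which a toy bag-of-words vector stands in for a real embedding model; in production you would call an actual embedding model and, for large pools, a vector store. `embed`, `cosine`, and `top_k_relevant` are illustrative names:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_relevant(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Score every candidate chunk against the query; keep the k most similar."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

With real embeddings the scoring loop is identical; only `embed` changes, and the sort is replaced by an (approximate) nearest-neighbor lookup when the pool is large.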
3. Rule-Based / Heuristic Prioritization
Sometimes, domain-specific knowledge or explicit rules can be used to ensure certain types of information are always present or always excluded.
- How it works: Define rules like:
- “Always include the user’s primary goal statement.”
- “Exclude generic greetings or farewells.”
- “Prioritize messages containing keywords like ‘urgent’ or ‘critical’.”
- “If a message contains a ‘decision’ keyword, keep it longer.”
- Pros: Very flexible, can incorporate deep domain knowledge, doesn’t require complex ML models.
- Cons: Can be brittle if rules aren’t comprehensive, requires manual tuning.
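Rules like these can be folded into a small scoring function. The greeting and keyword sets below are illustrative assumptions, not a recommended rule set:

```python
GREETINGS = {"hi", "hello", "thanks", "bye", "goodbye"}
PRIORITY_KEYWORDS = {"urgent", "critical", "decision"}


def rule_score(message: str) -> int:
    """Heuristic priority: -1 = exclude, 0 = normal, 1 = keep longer."""
    words = set(message.lower().replace("!", "").replace(".", "").split())
    if words <= GREETINGS:          # pure greeting/farewell -> exclude
        return -1
    if words & PRIORITY_KEYWORDS:   # contains a priority keyword -> boost
        return 1
    return 0
```

A context manager can then sort or filter candidate items by `rule_score` before applying recency or token-budget checks.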
4. Hybrid Approaches
The best systems often combine these methods. For example, you might:
- Always include a system prompt and the N most recent user/assistant turns (recency).
- Then, perform a semantic search over the rest of the conversation history or relevant documents to pull in additional, highly relevant chunks.
- Apply a final filter based on heuristics (e.g., ensure the user’s main objective is present).
This layered approach ensures both immediate conversational flow and deep contextual understanding.
Sliding Window Context Management
While prioritization helps us select relevant information, a sliding window helps us manage the volume of information over time, especially in continuous interactions like chatbots or agents with long-term memory.
What is a Sliding Window?
Think of a sliding window as a fixed-size container that moves along a continuous stream of data. As new data enters one end, old data falls out the other, ensuring that the container always holds the most recent (or most relevant) items up to its capacity.
Figure 6.1: Conceptual diagram of a sliding window moving over a data stream.
In the context of LLMs, the “data” is typically the conversation history, agent observations, or retrieved documents. The “window capacity” is often defined by the LLM’s maximum context window (minus space for system prompts, current query, etc.).
Types of Sliding Windows
1. Simple Recency Window
This is the most straightforward implementation. It simply keeps the N most recent messages or chunks.
- Mechanism: When a new message comes in, add it to the list. If the list exceeds N items (or a total token limit), remove the oldest item(s) until it fits.
- Use Case: Basic chatbots where only recent history is crucial.
- Trade-off: Easy to implement, but prone to context rot for important older information.
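A minimal sketch of the item-count variant — Python's `collections.deque` with `maxlen` handles the eviction automatically (a token-budget variant would instead pop from the left until the total fits):

```python
from collections import deque


class RecencyWindow:
    """Keeps at most max_items messages; the oldest silently fall out."""

    def __init__(self, max_items: int):
        # A deque with maxlen evicts from the opposite end on append
        self.window: deque[str] = deque(maxlen=max_items)

    def add(self, message: str) -> None:
        self.window.append(message)

    def contents(self) -> list[str]:
        return list(self.window)
```

Adding a fourth message to a three-item window drops the first message — exactly the truncation behavior that causes context rot for important older information.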
2. Summarization Window
To retain more long-term context without exceeding limits, older parts of the conversation can be periodically summarized.
- Mechanism:
- Maintain a full history.
- When the context approaches its limit, take a block of older messages.
- Send these older messages to the LLM with a prompt like “Summarize the following conversation history concisely.”
- Replace the original block of messages with the LLM’s summary.
- Add new messages to the window.
- Use Case: Agents requiring long-term memory or maintaining state over very long conversations.
- Trade-off: Retains more information, but summarization can lose fine-grained details and adds latency/cost for the summarization step.
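The mechanism above reduces to a single compaction step. In this sketch the `summarize` callable stands in for an LLM call with a prompt like the one quoted above, and `max_messages` / `block` are illustrative parameters:

```python
def compact_history(history: list[str], summarize, max_messages: int = 6, block: int = 4) -> list[str]:
    """Fold the oldest `block` messages into one summary entry once the
    history grows past `max_messages`; otherwise return it unchanged."""
    if len(history) <= max_messages:
        return history
    summary = summarize(history[:block])  # stand-in for an LLM summarization call
    return [f"[summary] {summary}"] + history[block:]
```

Calling this after every turn keeps the window bounded: old detail is traded for a one-line gist, which is exactly the trade-off noted above.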
3. Prioritized Sliding Window
This combines the best of both worlds: a sliding window mechanism that also incorporates prioritization logic.
- Mechanism:
- Maintain a pool of potential context items (e.g., all past messages).
- When constructing the prompt for the LLM, apply prioritization techniques (recency, relevance, rules) to select the most important items from the pool.
- Ensure the selected items fit within the context window. Old, irrelevant items are effectively “slid out” by not being chosen.
- Use Case: Sophisticated agents that need both long-term memory and dynamic focus on the current task.
- Trade-off: More complex to implement, requires careful tuning of prioritization logic, but offers superior performance.
Step-by-Step Implementation: A Prioritized Sliding Window in Python
Let’s build a simple Python class to demonstrate a prioritized sliding window. We’ll simulate a conversation history and implement a basic prioritization strategy.
First, let’s create a ContextItem class to represent individual pieces of context, such as user messages or agent responses.
```python
# context_manager.py

import uuid
import datetime


class ContextItem:
    """Represents a single piece of context, like a message or event."""

    def __init__(self, content: str, item_type: str, timestamp=None, metadata=None):
        self.id = str(uuid.uuid4())  # Unique ID for the item
        self.content = content
        self.item_type = item_type  # e.g., "user_message", "agent_response", "system_note"
        self.timestamp = timestamp if timestamp else datetime.datetime.now(datetime.timezone.utc)
        self.metadata = metadata if metadata else {}  # For additional info like relevance score

    def __str__(self):
        return f"[{self.timestamp.strftime('%H:%M:%S')}] {self.item_type.capitalize()}: {self.content}"

    def __repr__(self):
        return f"ContextItem(type='{self.item_type}', content='{self.content[:30]}...')"


print("ContextItem class defined.")
```
Explanation:
- We import `uuid` for unique IDs and `datetime` for timestamps.
- `ContextItem` stores `content`, `item_type` (useful for prioritization), `timestamp` (for recency), and `metadata` (for things like relevance scores).
- `__str__` and `__repr__` make it easy to print and debug these items.
Now, let’s build our PrioritizedSlidingWindow class. This class will maintain a history of ContextItem objects and select the most relevant ones for the LLM.
```python
# context_manager.py (continued)

class PrioritizedSlidingWindow:
    """
    Manages a dynamic context window, prioritizing items based on recency
    and a simple relevance score.
    """

    def __init__(self, max_tokens: int, tokenizer):
        self.max_tokens = max_tokens
        self.tokenizer = tokenizer  # A simple mock tokenizer for demonstration
        self.history: list[ContextItem] = []
        self.system_prompt: ContextItem | None = None
        self.sticky_items: list[ContextItem] = []  # Placeholder for sticky items (for mini-challenge)

    def add_system_prompt(self, prompt_content: str):
        """Sets a system prompt that is always prioritized."""
        self.system_prompt = ContextItem(prompt_content, "system_prompt")
        print(f"System prompt added: {prompt_content}")

    def add_item(self, content: str, item_type: str):
        """Adds a new context item to the history."""
        new_item = ContextItem(content, item_type)
        self.history.append(new_item)
        print(f"Added: {new_item}")

    def _calculate_token_count(self, text: str) -> int:
        """
        Mocks token calculation. In a real application, use a proper tokenizer
        like `tiktoken` (for OpenAI models) or a model-specific tokenizer
        to get accurate token counts. Simple `split()` is not sufficient
        due to subwords, punctuation, and special characters.
        """
        return len(self.tokenizer.encode(text))

    def get_prioritized_context(
        self,
        current_query: str,
        num_recent_items: int = 5,
        keyword_priority: list[str] | None = None,
    ) -> list[str]:
        """
        Retrieves a list of context strings, prioritized to fit within max_tokens.
        Prioritization order: System Prompt -> Recent Items -> Keyword-matched Items -> Current Query.
        """
        keyword_priority = keyword_priority if keyword_priority else []
        context_strings: list[str] = []
        current_token_count = 0

        # 1. Always include system prompt if available
        if self.system_prompt:
            system_prompt_str = str(self.system_prompt)
            tokens = self._calculate_token_count(system_prompt_str)
            if current_token_count + tokens <= self.max_tokens:
                context_strings.append(system_prompt_str)
                current_token_count += tokens
            else:
                print("Warning: System prompt too large for context window.")
                return []  # Cannot even fit system prompt, critical error

        # (Mini-Challenge Hint: Sticky items would go here, after system prompt)

        # 2. Add recent items (sliding window effect)
        # We iterate backwards (most recent first) so the newest items win
        # when the window fills up. Each kept item is inserted at a fixed
        # position captured *before* the loop -- right after the system prompt
        # (and any future sticky items) -- which preserves chronological order.
        insert_idx = len(context_strings)
        recent_items = self.history[-num_recent_items:]
        for item in reversed(recent_items):
            item_str = str(item)
            tokens = self._calculate_token_count(item_str)
            if current_token_count + tokens <= self.max_tokens:
                context_strings.insert(insert_idx, item_str)
                current_token_count += tokens
            else:
                print(f"Skipping recent item due to token limit: {item.content[:20]}...")
                break  # Window is full

        # 3. Add keyword-matched items (simple relevance beyond recency)
        # We search through older history that wasn't covered by recent_items.
        # To avoid adding duplicates, we filter out items already in
        # context_strings. This is a simplified check; a more robust one
        # would use item IDs.
        current_context_content = {item.content for item in self.history if str(item) in context_strings}
        older_history = [
            item for item in self.history
            if item.content not in current_context_content and item not in recent_items
        ]
        # Sort older history by recency so that among keyword matches, newer ones are preferred
        older_history.sort(key=lambda x: x.timestamp, reverse=True)
        for item in older_history:
            # Simple keyword matching for demonstration: does the item content
            # contain a priority keyword?
            if any(keyword.lower() in item.content.lower() for keyword in keyword_priority):
                item_str = str(item)
                tokens = self._calculate_token_count(item_str)
                if current_token_count + tokens <= self.max_tokens:
                    # Append these after system/recent items
                    context_strings.append(item_str)
                    current_token_count += tokens
                else:
                    print(f"Skipping keyword-matched item due to token limit: {item.content[:20]}...")
                    break  # Window is full

        # Finally, add the current query (always last for the LLM call)
        query_str = f"User Query: {current_query}"
        query_tokens = self._calculate_token_count(query_str)
        if current_token_count + query_tokens <= self.max_tokens:
            context_strings.append(query_str)
            current_token_count += query_tokens
        else:
            print("Warning: Current query cannot fit into context window!")
            # This is a critical scenario. In a real app, you might truncate
            # the query, summarize it, or return an error.

        print(f"\n--- Current Context ({current_token_count}/{self.max_tokens} tokens) ---")
        return context_strings
```
Explanation (cont.):
- `__init__`: Initializes with a `max_tokens` limit and a `tokenizer` (we'll use a mock for simplicity, but in a real app this would be `tiktoken` or similar). `history` stores all `ContextItem`s. `sticky_items` is added as a placeholder for the mini-challenge.
- `add_system_prompt`: Allows setting a "sticky" system prompt that is always attempted to be included.
- `add_item`: Appends new messages/events to the `history`. This is where the continuous stream of data comes in.
- `_calculate_token_count`: A placeholder for a real tokenizer. For `tiktoken` (recommended for OpenAI models), you'd use `tokenizer.encode(text)`. We've added a crucial note about why real tokenizers are important.
- `get_prioritized_context`: This is the core logic:
  - System Prompt First: It prioritizes the `system_prompt` to ensure core instructions are always there.
  - Recent Items: It then attempts to add the `num_recent_items` from the end of the `history`. This creates the "sliding window" effect, favoring newer interactions. The `context_strings.insert(insert_idx, item_str)` line ensures these recent items are placed immediately after the system prompt (and any future sticky items), maintaining a logical, chronological flow.
  - Keyword-Matched Items: For items older than the recent window and not already included, it checks whether their content contains any of the `keyword_priority` terms. This simulates a very basic relevance score. These are appended, appearing after the recent items.
  - Token Limit Check: At each step, it checks `current_token_count` against `max_tokens` to prevent exceeding the LLM's capacity.
  - Current Query Last: The `current_query` (the user's new input that the LLM needs to respond to) is always added at the very end of the prompt, as it's the most immediate input. Note that the `current_query` is handled as part of the current prompt construction and is typically added to the `history` only after the LLM has processed it and generated a response.
Let’s simulate a basic Tokenizer and run an example!
```python
# context_manager.py (continued - example usage)

# Mock tokenizer for demonstration purposes
class MockTokenizer:
    def encode(self, text: str) -> list[str]:
        return text.split()  # Simple split by space, not accurate for real tokens

    def decode(self, tokens: list[str]) -> str:
        return " ".join(tokens)


# Initialize our context manager
mock_tokenizer = MockTokenizer()
max_llm_tokens = 70  # A small context window for easy demonstration, but larger than before
context_manager = PrioritizedSlidingWindow(max_tokens=max_llm_tokens, tokenizer=mock_tokenizer)

# Add a system prompt
context_manager.add_system_prompt("You are a helpful travel assistant. Always prioritize user safety.")

# Simulate a conversation history
context_manager.add_item("I want to plan a trip to Paris.", "user_message")
context_manager.add_item("Great! When are you planning to travel?", "agent_response")
context_manager.add_item("Remember, safety is my top priority for this trip.", "user_message")  # Older, keyword-matched item
context_manager.add_item("Mid-October, but I'm worried about flight costs.", "user_message")
context_manager.add_item("I can help you find affordable flights. What's your budget?", "agent_response")
context_manager.add_item("Around $800 for flights. Also, I need pet-friendly accommodation.", "user_message")  # Recent item
context_manager.add_item("Okay, pet-friendly accommodation noted. Let's look at flights first.", "agent_response")  # Recent item
context_manager.add_item("Found some flights for $750. Does that work?", "agent_response")  # Recent item

# Current user query for which we need to build context
current_query = "Yes, that's perfect for flights! Now, what about hotels? Remember the pet."
keywords_to_prioritize = ["pet-friendly", "accommodation", "hotels", "safety"]

# Get prioritized context for the LLM to respond to the current_query.
# We'll use num_recent_items=3 to show the sliding window effect.
context_for_llm = context_manager.get_prioritized_context(
    current_query=current_query,
    num_recent_items=3,
    keyword_priority=keywords_to_prioritize,
)

print("\n--- Context sent to LLM ---")
for line in context_for_llm:
    print(line)

# Simulate adding the current query and a hypothetical agent response to history.
# This happens *after* the LLM has processed the context and generated its response.
print("\n--- Simulating conversation progression ---")
context_manager.add_item(current_query, "user_message")
context_manager.add_item("Excellent! I'll now search for pet-friendly hotels in Paris. What's your nightly budget?", "agent_response")

print("\n--- Full History after progression for comparison ---")
for item in context_manager.history:
    print(item)
```
Expected Output (timestamps will vary):
```
ContextItem class defined.
System prompt added: You are a helpful travel assistant. Always prioritize user safety.
Added: [HH:MM:SS] User_message: I want to plan a trip to Paris.
Added: [HH:MM:SS] Agent_response: Great! When are you planning to travel?
Added: [HH:MM:SS] User_message: Remember, safety is my top priority for this trip.
Added: [HH:MM:SS] User_message: Mid-October, but I'm worried about flight costs.
Added: [HH:MM:SS] Agent_response: I can help you find affordable flights. What's your budget?
Added: [HH:MM:SS] User_message: Around $800 for flights. Also, I need pet-friendly accommodation.
Added: [HH:MM:SS] Agent_response: Okay, pet-friendly accommodation noted. Let's look at flights first.
Added: [HH:MM:SS] Agent_response: Found some flights for $750. Does that work?

--- Current Context (69/70 tokens) ---

--- Context sent to LLM ---
[HH:MM:SS] System_prompt: You are a helpful travel assistant. Always prioritize user safety.
[HH:MM:SS] User_message: Around $800 for flights. Also, I need pet-friendly accommodation.
[HH:MM:SS] Agent_response: Okay, pet-friendly accommodation noted. Let's look at flights first.
[HH:MM:SS] Agent_response: Found some flights for $750. Does that work?
[HH:MM:SS] User_message: Remember, safety is my top priority for this trip.
User Query: Yes, that's perfect for flights! Now, what about hotels? Remember the pet.

--- Simulating conversation progression ---
Added: [HH:MM:SS] User_message: Yes, that's perfect for flights! Now, what about hotels? Remember the pet.
Added: [HH:MM:SS] Agent_response: Excellent! I'll now search for pet-friendly hotels in Paris. What's your nightly budget?

--- Full History after progression for comparison ---
[HH:MM:SS] User_message: I want to plan a trip to Paris.
[HH:MM:SS] Agent_response: Great! When are you planning to travel?
[HH:MM:SS] User_message: Remember, safety is my top priority for this trip.
[HH:MM:SS] User_message: Mid-October, but I'm worried about flight costs.
[HH:MM:SS] Agent_response: I can help you find affordable flights. What's your budget?
[HH:MM:SS] User_message: Around $800 for flights. Also, I need pet-friendly accommodation.
[HH:MM:SS] Agent_response: Okay, pet-friendly accommodation noted. Let's look at flights first.
[HH:MM:SS] Agent_response: Found some flights for $750. Does that work?
[HH:MM:SS] User_message: Yes, that's perfect for flights! Now, what about hotels? Remember the pet.
[HH:MM:SS] Agent_response: Excellent! I'll now search for pet-friendly hotels in Paris. What's your nightly budget?
```
Observation:
Notice how the --- Context sent to LLM --- section contains:
- The `system_prompt` (always first).
- The three most recent items from the conversation history (due to `num_recent_items=3`).
- An older `user_message` ("Remember, safety is my top priority…") that was pulled in because it matched a `keyword_priority` term ("safety"), even though it wasn't among the most recent. This demonstrates the keyword-based prioritization.
- The `current_query` (always last).
The total token count (69) is close to our max_llm_tokens (70), showing how the system carefully selects items to fit the window. This demonstrates a successful application of prioritized sliding window logic. The "Simulating conversation progression" part correctly shows how the current_query and the agent's response are then added to the full history for the next turn.
Mini-Challenge: “Sticky” Context Items
Your challenge is to enhance the PrioritizedSlidingWindow class.
Challenge:
Modify the PrioritizedSlidingWindow to allow for “sticky” context items. These are items that, once added, should always be included in the context if there is space, regardless of recency or keyword relevance, after the system prompt but before other dynamic items (like recent or keyword-matched history). Think of them as crucial facts or decisions that the agent should never forget, like “The user’s preferred language is Spanish” or “The user’s budget for hotels is $150/night.”
You’ll need to:
- Implement an `add_sticky_item(self, content: str, item_type: str)` method that adds `ContextItem`s to the `self.sticky_items` list you saw in `__init__`.
- Modify the `get_prioritized_context` method to iterate through `self.sticky_items` and include them in `context_strings` after the `system_prompt` but before the `recent_items` and keyword-matched items. Remember to check token limits for these too!
Hint:
When adding sticky items, you'll need to use `context_strings.append(sticky_item_str)` after the system prompt is added, but before the loop for `recent_items`. Ensure you handle the token count correctly.
What to Observe/Learn: How to combine truly fixed context (system prompt), semi-static/sticky context, and dynamic context (recent/prioritized) within a single context management strategy. This is crucial for maintaining core objectives or user preferences over long interactions.
Common Pitfalls & Troubleshooting
Even with dynamic context management, challenges can arise.
- Over-Prioritization Leading to Loss of Nuance: If your relevance scoring or keyword prioritization is too aggressive, it might exclude seemingly less “relevant” but contextually crucial information. For example, a subtle emotional cue or a background detail might be missed.
- Troubleshooting: Use a tiered approach (e.g., always include recent items, then semantic-search a broader pool). Experiment with different `num_recent_items` values and `keyword_priority` lists. Consider summarization for older, less critical chunks to retain the gist without the detail.
- Incorrect Window Sizing: A window that’s too small will lead to constant context rot. A window that’s too large might be inefficient and costly, especially if filled with irrelevant content.
- Troubleshooting: Profile your application. Monitor token usage. Start with a window size that comfortably fits a few turns of conversation plus your system prompt, then incrementally increase/decrease based on observed performance and LLM output quality. Remember to account for the LLM’s own response tokens!
- Performance Overhead of Complex Prioritization: Calculating embeddings and performing similarity searches for every turn can add noticeable latency, especially with a large history.
- Troubleshooting: Cache embeddings. Use approximate nearest neighbor (ANN) search for large vector stores. Only re-prioritize when necessary (e.g., every few turns, or if a specific keyword is detected). Consider using a simpler prioritization strategy for less critical paths.
- Contextual Drift: Even with prioritization, if the conversation topic shifts significantly, the LLM might struggle to adapt if the “old” context still heavily influences its behavior, even if it’s technically “relevant” to an earlier part of the conversation.
- Troubleshooting: Implement explicit topic detection or conversation phase management. When a major topic shift occurs, consider resetting or significantly re-prioritizing the context to focus on the new direction. Summarize previous topics more aggressively.
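One crude way to detect such a shift without an extra model call is lexical overlap between the new message and the recent window. This is a sketch with an illustrative `threshold`; a real system would more likely compare embeddings:

```python
def topic_shifted(recent: list[str], new_message: str, threshold: float = 0.1) -> bool:
    """Crude topic-shift check: fraction of the new message's words that
    also appear in the recent window. Near-zero overlap suggests a shift."""
    recent_words = set(" ".join(recent).lower().split())
    new_words = set(new_message.lower().split())
    if not new_words:
        return False
    overlap = len(new_words & recent_words) / len(new_words)
    return overlap < threshold
```

When this fires, the application can summarize the old topic aggressively and re-prioritize the context toward the new direction.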
Summary
This chapter has equipped you with powerful techniques for dynamic context management, moving beyond static context preparation to truly “own your context window” in real-time.
Here are the key takeaways:
- Context Rot is a major challenge for long-running LLM applications, leading to loss of coherence and accuracy.
- Context Prioritization is about intelligently selecting the most relevant information for the LLM’s prompt. Techniques include recency, relevance scoring (semantic search), and rule-based/heuristic methods.
- Sliding Window Context Management provides a mechanism to maintain a fixed-size context over a continuous stream of information, ensuring newer (or more relevant) data is always present.
- Types of sliding windows include simple recency, summarization windows, and prioritized sliding windows, which combine dynamic selection with windowing.
- Implementing a prioritized sliding window involves carefully adding system prompts, recent items, and contextually relevant items while staying within token limits. The `current_query` is then appended to this constructed context.
- Common pitfalls include over-prioritization, incorrect window sizing, performance overhead, and contextual drift.
By mastering prioritization and sliding windows, you can build more robust, intelligent, and efficient LLM agents that maintain a coherent “memory” and stay focused on the user’s intent, even across extended interactions.
In the next chapter, we’ll expand our horizons even further by exploring Multi-Source Context Pipelines, specifically focusing on Retrieval-Augmented Generation (RAG) to bring external knowledge into our LLM’s context. Get ready to connect your LLMs to the vast world of data!
References
- 12-Factor Agents - Factor 3: Own Your Context Window (Accessed 2026-03-20)
- yzfly/awesome-context-engineering - GitHub (Accessed 2026-03-20)
- context-engineering/README.md at main - GitHub (Accessed 2026-03-20)
- OpenAI `tiktoken` library for token counting (Accessed 2026-03-20)
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.