# Context Management
Auto-summarization, token budgets, and structured state snapshots for long conversations.
Supyagent's `ContextManager` ensures that conversations can run indefinitely without exceeding the model's context window. It keeps the full message history on disk while sending only a carefully trimmed subset to the LLM.
## Strategy
The context management strategy has three layers:
- **Full Persistence** -- All messages are saved to disk as JSONL. Nothing is ever lost.
- **Trimmed LLM Input** -- When building messages for the LLM, the system includes the system prompt, an optional context summary, and as many recent messages as fit in the token budget.
- **Auto-Summarization** -- When the message count or token count exceeds configurable thresholds, older messages are compressed into a structured state snapshot.
## Configuration

Configure context management in your agent YAML under the `context` key:

```yaml
name: myagent
model:
  provider: anthropic/claude-sonnet-4-5-20250929
context:
  auto_summarize: true              # Enable auto-summarization
  max_messages_before_summary: 30   # Trigger after N messages since last summary
  max_tokens_before_summary: 128000 # Trigger when total tokens exceed K
  min_recent_messages: 6            # Always include at least this many recent messages
  response_reserve: 4096            # Reserve tokens for the model's response
```

### Configuration Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `auto_summarize` | bool | `true` | Enable automatic summarization when thresholds are reached |
| `max_messages_before_summary` | int | `30` | Trigger summarization after N messages since last summary |
| `max_tokens_before_summary` | int | `128000` | Trigger summarization when total message tokens exceed this |
| `min_recent_messages` | int | `6` | Minimum recent messages to always include (never folded into a summary) |
| `response_reserve` | int | `4096` | Tokens reserved for the model's response |
## Summarization Triggers

Summarization is triggered when either threshold is reached, whichever comes first:

- **Message count trigger (N)** -- the number of messages since the last summary reaches `max_messages_before_summary`
- **Token count trigger (K)** -- the total token count of all messages exceeds `max_tokens_before_summary`

Both triggers require a minimum number of messages (`min_recent_messages + 4`) before summarization can occur.
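The dual-trigger rule can be sketched as follows; the function and config-dict names mirror the table above but are illustrative, not the real Supyagent internals:

```python
# Illustrative sketch of the dual-trigger rule: summarize when either
# the message-count (N) or token-count (K) threshold is crossed, but
# never before min_recent_messages + 4 messages have accumulated.
def should_summarize(messages_since_summary, total_tokens, cfg):
    if messages_since_summary < cfg["min_recent_messages"] + 4:
        return False  # too few messages to summarize safely
    return (
        messages_since_summary >= cfg["max_messages_before_summary"]
        or total_tokens > cfg["max_tokens_before_summary"]
    )
```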
## How the Summary Works

When summarization triggers, the system sends older messages to the LLM with a structured extraction prompt. The result is a structured state snapshot with these sections:

```markdown
## Files Modified
- List each file path and what was changed

## Key Decisions
- List decisions made and their rationale

## Important Values
- Exact file paths, URLs, IDs, error messages, configuration values

## Current State
- What task is in progress
- What was the last action taken
- What is the expected next step

## Pending Tasks
- List remaining work items
```

This structured format preserves actionable context far better than a prose summary. The snapshot:

- Preserves exact file paths, URLs, IDs, error messages, and configuration values verbatim
- Retains tool results that contain data (not just `ok: true` confirmations)
- Keeps the snapshot under 600 words
- Focuses on state that the agent needs to continue working
## Message Building for LLM

When the agent prepares messages for an LLM call, the `ContextManager` builds the list as follows:

1. Start with the system prompt
2. Subtract tool definition tokens from the available budget
3. If a summary exists and fits within 30% of the available tokens, include it
4. Add recent messages from newest to oldest until the token budget is exhausted (always including at least `min_recent_messages`)
5. Run a safety check -- if the total still exceeds the context limit, trigger emergency truncation
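Steps 3 and 4 can be sketched as a newest-to-oldest packing loop. Here `count_tokens` is any per-message token counter, and all names are illustrative rather than Supyagent's actual API:

```python
# Sketch of the message-building loop: optionally include the summary
# (only if it fits within 30% of the budget), then pack recent
# messages newest-to-oldest until the budget runs out, always keeping
# at least min_recent messages.
def build_llm_messages(system, summary, history, available,
                       count_tokens, min_recent=6):
    budget = available
    msgs = [system]
    if summary is not None and count_tokens(summary) <= 0.3 * budget:
        msgs.append(summary)
        budget -= count_tokens(summary)
    kept, used = [], 0
    for i, msg in enumerate(reversed(history)):
        cost = count_tokens(msg)
        if i >= min_recent and used + cost > budget:
            break  # budget exhausted and minimum already satisfied
        kept.append(msg)
        used += cost
    msgs.extend(reversed(kept))  # restore chronological order
    return msgs
```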
## Emergency Truncation

If the message list still exceeds the context limit after normal building, the system performs emergency truncation:

1. Keep the system prompt (always)
2. Keep the summary message (if present)
3. Keep the last `min_recent_messages`
4. Strip images from multimodal content in middle messages
5. Truncate large content blocks (keeping the first 1000 and last 500 characters)
6. Drop the oldest non-protected messages one by one until within budget
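The final drop-oldest step can be sketched as below (image stripping and content truncation are omitted). Representing each message as a dict with a `tokens` field is purely illustrative:

```python
# Sketch of the drop-oldest loop: protect the head (system prompt,
# optional summary) and the recent tail, and drop the oldest
# unprotected message until the list fits within the limit.
def emergency_truncate(msgs, limit, protected_head=1, min_recent=6):
    out = list(msgs)
    while sum(m["tokens"] for m in out) > limit:
        droppable = range(protected_head, len(out) - min_recent)
        if not droppable:
            break  # nothing left to drop; best effort
        out.pop(droppable[0])
    return out
```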
## Token Counting

Token counts are calculated using tiktoken (for OpenAI-compatible models) and model-specific context limits. The `get_context_limit()` function returns the correct context window size for each model.
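A hedged sketch of such a counter: use tiktoken when it is installed, and fall back to a rough 4-characters-per-token heuristic otherwise (the fallback ratio is an assumption, not part of Supyagent):

```python
def count_tokens(text, model="gpt-4o"):
    """Count tokens with tiktoken if available, else approximate."""
    try:
        import tiktoken
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:  # tiktoken missing or unknown model name
        # Rough heuristic: ~4 characters per token (assumption).
        return len(text) // 4 + (1 if len(text) % 4 else 0)
```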
The token budget is calculated as:

```
available = context_limit - response_reserve - system_prompt_tokens - tool_definition_tokens
```

## Monitoring Context Status
### In Chat: /context Command

During an interactive chat session, use the `/context` command to see current context window usage:

```
Context Status

Context limit: 200,000 tokens
Tool definitions: 12,450 tokens (35 tools)
Last summary: 42 messages -> 580 tokens
Created: 2025-01-15 14:30

Summarization Triggers (N messages OR K tokens)
Messages: 18 / 30 (60%)
[████████████░░░░░░░░]
Tokens: 45,000 / 128,000 (35%)
[███████░░░░░░░░░░░░░]
```

### In Chat: /tokens Command
Toggle token usage display after each turn with `/tokens`. When enabled, you see:

```
tokens: 45,000 msgs + 12,450 tools | context: 57,450 / 200,000 (28%)
```

### In Chat: /summarize Command
Force summarization at any time with `/summarize`, regardless of whether the thresholds have been met.
## Summary Persistence
Summaries are saved to disk as JSON alongside session data. When a session is resumed, the existing summary is loaded automatically. This means you can close a chat, come back later, and the agent will still have the compressed context from the earlier conversation.
## Tuning Guidelines

| Scenario | Recommended Settings |
|---|---|
| Short conversations (Q&A) | `max_messages_before_summary: 50`, `max_tokens_before_summary: 200000` |
| Long coding sessions | `max_messages_before_summary: 20`, `min_recent_messages: 8` |
| Large tool outputs | `response_reserve: 8192`, `max_tokens_before_summary: 80000` |
| Small context models (8K-32K) | `max_messages_before_summary: 10`, `max_tokens_before_summary: 20000`, `min_recent_messages: 4` |
| Large context models (200K+) | `max_messages_before_summary: 50`, `max_tokens_before_summary: 150000` |
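As one worked example, the small-context row above corresponds to a `context` block like this (the surrounding agent fields are placeholders):

```yaml
name: myagent
model:
  provider: some/small-context-model   # placeholder model id
context:
  auto_summarize: true
  max_messages_before_summary: 10
  max_tokens_before_summary: 20000
  min_recent_messages: 4
```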
## Related
- Memory -- Entity-graph memory for cross-session knowledge
- Prompt Caching -- Reduce costs when summaries are stable
- Configuration -- All configuration layers and options