
Context Management

Auto-summarization, token budgets, and structured state snapshots for long conversations.

Supyagent's ContextManager ensures that conversations can run indefinitely without exceeding the model's context window. It keeps the full message history on disk while sending only a carefully trimmed subset to the LLM.

Strategy

The context management strategy has three layers:

  1. Full Persistence -- All messages are saved to disk as JSONL. Nothing is ever lost.
  2. Trimmed LLM Input -- When building messages for the LLM, the system includes: system prompt, optional context summary, and as many recent messages as fit in the token budget.
  3. Auto-Summarization -- When message count or token count exceeds configurable thresholds, older messages are compressed into a structured state snapshot.
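Layer 1 is simple enough to sketch directly. The snippet below shows append-only JSONL persistence, assuming a message is a plain dict; the function names are illustrative, not Supyagent's actual API:

```python
import json

def persist_message(path, message):
    # Append one message to the session's JSONL file. Because this is
    # append-only, the full history survives regardless of what trimmed
    # subset is later sent to the LLM.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")

def load_history(path):
    # Reload the complete history, one JSON object per line.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```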

Configuration

Configure context management in your agent YAML under the context key:

agents/myagent.yaml
name: myagent
model:
  provider: anthropic/claude-sonnet-4-5-20250929

context:
  auto_summarize: true                  # Enable auto-summarization
  max_messages_before_summary: 30       # Trigger after N messages since last summary
  max_tokens_before_summary: 128000     # Trigger when total tokens exceed K
  min_recent_messages: 6                # Always include at least this many recent messages
  response_reserve: 4096                # Reserve tokens for the model's response

Configuration Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `auto_summarize` | bool | `true` | Enable automatic summarization when thresholds are reached |
| `max_messages_before_summary` | int | `30` | Trigger summarization after N messages since last summary |
| `max_tokens_before_summary` | int | `128000` | Trigger summarization when total message tokens exceed this |
| `min_recent_messages` | int | `6` | Minimum recent messages to always include (never summarized over) |
| `response_reserve` | int | `4096` | Tokens reserved for the model response |
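If you load the YAML yourself, the documented fields and defaults map naturally onto a dataclass. `ContextConfig` is a hypothetical name for illustration, not Supyagent's actual class:

```python
from dataclasses import dataclass

@dataclass
class ContextConfig:
    # Defaults mirror the table above.
    auto_summarize: bool = True
    max_messages_before_summary: int = 30
    max_tokens_before_summary: int = 128000
    min_recent_messages: int = 6
    response_reserve: int = 4096
```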

Summarization Triggers

Summarization is triggered when either threshold is reached (whichever comes first):

  • Message count trigger (N): When the number of messages since the last summary reaches max_messages_before_summary
  • Token count trigger (K): When the total token count of all messages exceeds max_tokens_before_summary

Both triggers require a minimum number of messages (min_recent_messages + 4) before summarization can occur.
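The trigger logic can be sketched as a single predicate. This is a hedged reconstruction of the documented rules, not Supyagent's actual implementation; `cfg` here is a plain dict of the configuration fields:

```python
def should_summarize(messages_since_summary, total_tokens, cfg):
    if not cfg["auto_summarize"]:
        return False
    # Floor: both triggers require at least min_recent_messages + 4
    # messages before summarization can occur.
    if messages_since_summary < cfg["min_recent_messages"] + 4:
        return False
    # Either threshold fires summarization, whichever comes first.
    return (messages_since_summary >= cfg["max_messages_before_summary"]
            or total_tokens > cfg["max_tokens_before_summary"])
```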

How the Summary Works

When summarization triggers, the system sends older messages to the LLM with a structured extraction prompt. The result is a structured state snapshot with these sections:

## Files Modified
- List each file path and what was changed

## Key Decisions
- List decisions made and their rationale

## Important Values
- Exact file paths, URLs, IDs, error messages, configuration values

## Current State
- What task is in progress
- What was the last action taken
- What is the expected next step

## Pending Tasks
- List remaining work items

This structured format preserves actionable context far better than a prose summary. The snapshot:

  • Preserves exact file paths, URLs, IDs, error messages, and configuration values verbatim
  • Retains tool results that contain data (not just ok: true confirmations)
  • Keeps the snapshot under 600 words
  • Focuses on state that the agent needs to continue working
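A minimal sketch of how the extraction prompt could be assembled from those sections; the wording and function name are illustrative assumptions, not Supyagent's actual prompt:

```python
SNAPSHOT_SECTIONS = (
    "Files Modified",
    "Key Decisions",
    "Important Values",
    "Current State",
    "Pending Tasks",
)

def build_extraction_prompt(transcript: str) -> str:
    # Instruct the LLM to produce the structured state snapshot,
    # enforcing the documented constraints (verbatim values, 600 words).
    sections = "\n".join(f"## {name}" for name in SNAPSHOT_SECTIONS)
    return (
        "Compress the conversation below into a state snapshot under "
        "600 words. Preserve exact file paths, URLs, IDs, error messages, "
        "and configuration values verbatim. Use exactly these sections:\n"
        f"{sections}\n\nConversation:\n{transcript}"
    )
```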

Message Building for LLM

When the agent prepares messages for an LLM call, the ContextManager builds the list as follows:

  1. Start with the system prompt
  2. Subtract tool definition tokens from the available budget
  3. If a summary exists and fits within 30% of available tokens, include it
  4. Add recent messages from newest to oldest until the token budget is exhausted (always including at least min_recent_messages)
  5. Run a safety check -- if the total still exceeds the context limit, trigger emergency truncation
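Steps 1 through 4 can be sketched as follows. This is a hedged reconstruction, not Supyagent's code: `count_tokens` stands in for the real tokenizer, and `budget` for the token-budget formula described under Token Counting:

```python
def build_llm_messages(system_prompt, summary, history,
                       count_tokens, budget, min_recent=6):
    msgs = [system_prompt]
    remaining = budget
    # Step 3: include the summary only if it fits within 30% of the budget.
    if summary is not None and count_tokens(summary) <= 0.30 * budget:
        msgs.append(summary)
        remaining -= count_tokens(summary)
    # Step 4: walk newest-to-oldest until the budget is exhausted,
    # but always keep at least min_recent recent messages.
    recent = []
    for m in reversed(history):
        cost = count_tokens(m)
        if remaining - cost < 0 and len(recent) >= min_recent:
            break
        recent.append(m)
        remaining -= cost
    msgs.extend(reversed(recent))
    return msgs
```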

Emergency Truncation

If the message list still exceeds the context limit after normal building, the system performs emergency truncation:

  1. Keep the system prompt (always)
  2. Keep the summary message (if present)
  3. Keep the last min_recent_messages
  4. Strip images from multimodal content in middle messages
  5. Truncate large content blocks (keeping first 1000 and last 500 characters)
  6. Drop oldest non-protected messages one by one until within budget
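Step 6, the final drop loop, can be sketched as below. This is an illustrative simplification: the system prompt (index 0) and the last `min_recent` messages are protected, and the image-stripping and content-truncation passes (steps 4 and 5) are omitted for brevity:

```python
def emergency_truncate(messages, count_tokens, budget, min_recent=6):
    msgs = list(messages)

    def total():
        return sum(count_tokens(m) for m in msgs)

    # Drop the oldest unprotected message (index 1, just after the
    # system prompt) until the total fits or only protected messages remain.
    while total() > budget and len(msgs) > 1 + min_recent:
        del msgs[1]
    return msgs
```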

Token Counting

Token counts are calculated with tiktoken (for OpenAI-compatible models). Context limits are model-specific: the get_context_limit() function returns the correct context window size for each model.

The token budget is calculated as:

available = context_limit - response_reserve - system_prompt_tokens - tool_definition_tokens
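As a direct transcription of that formula (the numbers in the test below are illustrative, not measurements):

```python
def available_budget(context_limit, response_reserve,
                     system_prompt_tokens, tool_definition_tokens):
    # Tokens left over for conversation messages after reserving space
    # for the response, the system prompt, and tool definitions.
    return (context_limit - response_reserve
            - system_prompt_tokens - tool_definition_tokens)
```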

Monitoring Context Status

In Chat: /context Command

During an interactive chat session, use the /context command to see current context window usage:

Context Status
  Context limit: 200,000 tokens
  Tool definitions: 12,450 tokens (35 tools)
  Last summary: 42 messages -> 580 tokens
  Created: 2025-01-15 14:30

Summarization Triggers (N messages OR K tokens)
  Messages: 18 / 30 (60%)
           [████████████░░░░░░░░]
  Tokens:   45,000 / 128,000 (35%)
           [███████░░░░░░░░░░░░░]

In Chat: /tokens Command

Toggle token usage display after each turn with /tokens. When enabled, you see:

tokens: 45,000 msgs + 12,450 tools | context: 57,450 / 200,000 (28%)

In Chat: /summarize Command

Force summarization at any time with /summarize, regardless of whether thresholds have been met.

Summary Persistence

Summaries are saved to disk as JSON alongside session data. When a session is resumed, the existing summary is loaded automatically. This means you can close a chat, come back later, and the agent will still have the compressed context from the earlier conversation.
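A minimal sketch of that save/load cycle, assuming a per-session directory; the filename and JSON keys are illustrative, not Supyagent's actual on-disk format:

```python
import json
import pathlib

def save_summary(session_dir, summary_text, messages_summarized):
    # Persist the snapshot as JSON alongside the session data.
    path = pathlib.Path(session_dir) / "summary.json"
    path.write_text(json.dumps({
        "summary": summary_text,
        "messages_summarized": messages_summarized,
    }), encoding="utf-8")

def load_summary(session_dir):
    # On resume, reload the existing summary if one was saved.
    path = pathlib.Path(session_dir) / "summary.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```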

Tuning Guidelines

| Scenario | Recommended settings |
|---|---|
| Short conversations (Q&A) | `max_messages_before_summary: 50`, `max_tokens_before_summary: 200000` |
| Long coding sessions | `max_messages_before_summary: 20`, `min_recent_messages: 8` |
| Large tool outputs | `response_reserve: 8192`, `max_tokens_before_summary: 80000` |
| Small context models (8K-32K) | `max_messages_before_summary: 10`, `max_tokens_before_summary: 20000`, `min_recent_messages: 4` |
| Large context models (200K+) | `max_messages_before_summary: 50`, `max_tokens_before_summary: 150000` |
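For example, the small-context row expands into a full `context` block like this (only the values the table recommends are changed; other fields keep their defaults):

```yaml
context:
  auto_summarize: true
  max_messages_before_summary: 10
  max_tokens_before_summary: 20000
  min_recent_messages: 4
```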

See Also

  • Memory -- Entity-graph memory for cross-session knowledge
  • Prompt Caching -- Reduce costs when summaries are stable
  • Configuration -- All configuration layers and options