# Context Management

Token tracking, automatic summarization, and context compactification with `@supyagent/sdk/context`.

The `@supyagent/sdk/context` sub-package provides a `ContextManager` that tracks token usage across LLM calls and automatically summarizes older messages when the context window fills up.
## Setup

```ts
import { createContextManager } from '@supyagent/sdk/context';
import { anthropic } from '@ai-sdk/anthropic';

const contextManager = createContextManager({
  maxTokens: 128_000,
  summaryModel: anthropic('claude-haiku-4-5-20251001'),
});
```

## Options
| Option | Type | Default | Description |
|---|---|---|---|
| `maxTokens` | `number` | `128000` | Model's context window size |
| `softThreshold` | `number` | `0.75` | Fraction of `maxTokens` at which background summarization is triggered |
| `hardThreshold` | `number` | `0.90` | Fraction of `maxTokens` at which blocking compactification is triggered |
| `responseReserve` | `number` | `4096` | Tokens reserved for the model's response |
| `minRecentMessages` | `number` | `4` | Messages to always keep (never summarized) |
| `summaryModel` | AI SDK model | — | Model used for summarization (required unless `compactify` is provided) |
| `summaryPrompt` | `string` | — | Custom prompt for the summarizer |
| `compactify` | function | — | Custom compactification function (replaces the default summarizer) |
| `estimateTokens` | function | ~4 chars/token | Custom token estimation function |
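To make the defaults concrete, here is a self-contained sketch of how the documented numbers interact. The `usableBudget` helper and the exact reserve arithmetic are illustrative assumptions, not the SDK's actual implementation; only the default values come from the table above.

```typescript
// Sketch of the documented defaults (illustrative; not the SDK source).
type ContextOptions = {
  maxTokens: number;
  softThreshold: number;
  hardThreshold: number;
  responseReserve: number;
};

const defaults: ContextOptions = {
  maxTokens: 128_000,
  softThreshold: 0.75,
  hardThreshold: 0.9,
  responseReserve: 4096,
};

// Default-style estimator: roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical helper: the budget usable for input, leaving room
// for the model's response.
const usableBudget = (o: ContextOptions): number =>
  o.maxTokens - o.responseReserve;

console.log(estimateTokens('hello world')); // 3 (11 chars / 4, rounded up)
console.log(usableBudget(defaults)); // 123904
```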
## How It Works

The context manager uses a two-threshold system:

```
0%                    75%              90%           100%
├─────── Normal ──────┼── Summarize ───┼── Compact ───┤
                     soft             hard
```

- Below the soft threshold (75%): normal operation, no action needed
- Between the soft and hard thresholds (75–90%): `shouldSummarize()` returns `true`; trigger background summarization
- Above the hard threshold (90%): `shouldCompactify()` returns `true`; block and compactify before the next LLM call
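The decision logic above reduces to a pure function of the usage ratio. This `contextAction` helper is a hypothetical name for illustration; the SDK itself exposes `shouldSummarize()` and `shouldCompactify()` instead.

```typescript
// Map a usage ratio (0 to 1) to the action described above.
// Hypothetical helper; mirrors the documented thresholds.
function contextAction(
  usageRatio: number,
): 'normal' | 'summarize' | 'compactify' {
  if (usageRatio >= 0.9) return 'compactify'; // blocking, before the next LLM call
  if (usageRatio >= 0.75) return 'summarize'; // background
  return 'normal';
}

console.log(contextAction(0.5)); // 'normal'
console.log(contextAction(0.8)); // 'summarize'
console.log(contextAction(0.95)); // 'compactify'
```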
When compactifying, the manager:

- Keeps the last `minRecentMessages` messages intact
- Summarizes all older messages into a single summary message
- Inserts the summary as an assistant message with `type: "context-summary"` metadata
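The shape of the resulting message list can be pictured with a minimal sketch. `compactSketch` and the `Msg` type are hypothetical; the real manager produces the summary text with `summaryModel` rather than taking it as an argument.

```typescript
type Msg = {
  role: 'user' | 'assistant';
  content: string;
  metadata?: { type: string };
};

// Minimal sketch of the compaction shape: one summary message
// followed by the untouched recent tail.
function compactSketch(
  messages: Msg[],
  summary: string,
  minRecentMessages = 4,
): Msg[] {
  if (messages.length <= minRecentMessages) return messages;
  const recent = messages.slice(-minRecentMessages);
  const summaryMsg: Msg = {
    role: 'assistant',
    content: summary,
    metadata: { type: 'context-summary' },
  };
  return [summaryMsg, ...recent];
}

const history: Msg[] = Array.from({ length: 10 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i}`,
}));
const compacted = compactSketch(history, 'Earlier: the user asked about X.');
console.log(compacted.length); // 5 (1 summary + 4 recent)
```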
## ContextManager Interface

### getState()

Returns the current context state:

```ts
const state = contextManager.getState();
// {
//   totalInputTokens: number,
//   totalOutputTokens: number,
//   estimatedContextSize: number,
//   maxTokens: number,
//   usageRatio: number, // 0–1
//   softThresholdExceeded: boolean,
//   hardThresholdExceeded: boolean,
//   summaryCount: number,
// }
```

### recordUsage(usage)
Record token usage from an LLM step. Call it from `onStepFinish`:
```ts
contextManager.recordUsage({
  inputTokens: step.usage.inputTokens,
  outputTokens: step.usage.outputTokens,
});
```

### shouldSummarize(messages) / shouldCompactify(messages)
Check whether the conversation needs summarization or compactification:
```ts
if (contextManager.shouldCompactify(messages)) {
  messages = await contextManager.compactify(messages);
}
```

### compactify(messages)
Summarize older messages and return a compacted message list:
```ts
const compacted = await contextManager.compactify(messages);
```

### prepareMessages(messages, systemPrompt)
Prepares messages for the LLM call — finds the last summary message, drops everything before it, and injects the summary into the system prompt:
```ts
const { messages: prepared, systemPrompt: updatedPrompt } =
  await contextManager.prepareMessages(messages, systemPrompt);
```
### getMessageMetadata()

Returns metadata to attach to streamed responses (cumulative token counts and usage ratio):
```ts
return result.toUIMessageStreamResponse({
  messageMetadata: contextManager.getMessageMetadata(),
});
```

### reset()
Reset internal state (e.g., when starting a new chat):
```ts
contextManager.reset();
```

## Full Integration Example
```ts
import { streamText, type UIMessage } from 'ai';
import { createContextManager } from '@supyagent/sdk/context';
import { anthropic } from '@ai-sdk/anthropic';

const contextManager = createContextManager({
  maxTokens: 200_000,
  summaryModel: anthropic('claude-haiku-4-5-20251001'),
});

export async function POST(req: Request) {
  let { messages }: { messages: UIMessage[] } = await req.json();

  // Compactify if the context is too large
  if (contextManager.shouldCompactify(messages)) {
    messages = await contextManager.compactify(messages);
  }

  const result = streamText({
    model: anthropic('claude-sonnet-4-6-20250620'),
    system: 'You are a helpful assistant.',
    messages,
    // tools,  // your tool definitions, if any
    onStepFinish: ({ usage }) => {
      contextManager.recordUsage({
        inputTokens: usage.inputTokens,
        outputTokens: usage.outputTokens,
      });
    },
  });

  return result.toUIMessageStreamResponse({
    messageMetadata: contextManager.getMessageMetadata(),
  });
}
```

## React UI Components
The `@supyagent/sdk/react` package includes components for displaying context state in the UI:

- `ContextIndicator` — Shows a visual indicator of context usage (percentage bar)
- `SummaryMessage` — Renders summary messages with metadata (messages summarized, token savings)
- `isContextSummary(message)` — Check if a message is a context summary
See React Components for details.
## What's Next
- React Components — UI components for context indicators and tool rendering
- Full API Route Example — See context management in a complete endpoint