
Models

Configure LLM providers with LiteLLM — Anthropic, OpenAI, Google, Ollama, and more.


Supyagent uses LiteLLM as a unified interface to 100+ LLM providers. You specify models using the provider/model format in your agent's model.provider field.

Supported Providers

| Provider | Format | Required Env Var | Example |
| --- | --- | --- | --- |
| Anthropic | anthropic/{model} | ANTHROPIC_API_KEY | anthropic/claude-sonnet-4-5-20250929 |
| OpenAI | openai/{model} | OPENAI_API_KEY | openai/gpt-4o |
| Google (Gemini) | google/{model} | GOOGLE_API_KEY | google/gemini-2.5-flash |
| OpenRouter | openrouter/{provider}/{model} | OPENROUTER_API_KEY | openrouter/google/gemini-2.5-flash |
| Ollama | ollama/{model} | none (local) | ollama/llama3.2 |
| Azure OpenAI | azure/{deployment} | AZURE_API_KEY | azure/gpt-4o-deployment |
| AWS Bedrock | bedrock/{model} | AWS credentials | bedrock/anthropic.claude-3-sonnet |
| Mistral | mistral/{model} | MISTRAL_API_KEY | mistral/mistral-large-latest |
| Groq | groq/{model} | GROQ_API_KEY | groq/llama-3.1-70b-versatile |
| DeepSeek | deepseek/{model} | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| Together AI | together/{model} | TOGETHER_API_KEY | together/meta-llama/Llama-3-70b-chat-hf |
| Cohere | cohere/{model} | COHERE_API_KEY | cohere/command-r-plus |

For the full list of supported providers and models, see the LiteLLM provider docs.
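Note that only the first slash separates the provider prefix from the model name, which is why OpenRouter specs like openrouter/google/gemini-2.5-flash still parse correctly. A minimal sketch of that rule (split_provider is a hypothetical helper, not supyagent's actual parser):

```python
def split_provider(spec: str) -> tuple[str, str]:
    """Split a 'provider/model' spec into (provider, model).

    Splits only on the first slash, so nested specs such as
    'openrouter/google/gemini-2.5-flash' keep the inner provider
    as part of the model name.
    """
    provider, _, model = spec.partition("/")
    return provider, model
```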

Setting API Keys

API keys are encrypted with Fernet and stored in ~/.supyagent/config/:

# Set an API key
supyagent config set ANTHROPIC_API_KEY

# List configured keys
supyagent config list

# Delete a key
supyagent config delete OPENAI_API_KEY

Keys are loaded into environment variables at startup, so they are available to LiteLLM without manual export commands.
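That startup step amounts to exporting each decrypted key into the process environment so LiteLLM can find it. A rough sketch, assuming the keys have already been decrypted (load_keys_into_env is a hypothetical name; supyagent's actual loader may differ):

```python
import os

def load_keys_into_env(decrypted_keys: dict[str, str]) -> None:
    """Export stored API keys so LiteLLM can read them via os.environ."""
    for name, value in decrypted_keys.items():
        # setdefault avoids clobbering a key the user exported manually
        # (an assumption about precedence, not documented behavior).
        os.environ.setdefault(name, value)
```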

Basic Configuration

agents/myagent.yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.7
  max_tokens: 8192

Temperature

Controls randomness in the output. Lower values produce more deterministic responses.

| Value | Use Case |
| --- | --- |
| 0.0 - 0.3 | Code generation, structured output, precise tasks |
| 0.4 - 0.7 | General conversation, balanced creativity |
| 0.8 - 1.5 | Creative writing, brainstorming |

Max Tokens

Set max_tokens to control the maximum length of responses. When set to null (the default), the provider's default is used, which is typically the model's maximum output length.

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  max_tokens: null      # Use provider default (model max)
  # max_tokens: 4096    # Cap at 4096 tokens
  # max_tokens: 16384   # Allow longer responses

Retry and Fallback

Supyagent has built-in retry logic with exponential backoff for transient errors (rate limits, service unavailable). When retries are exhausted, it falls back to alternative models.

agents/resilient.yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  max_retries: 3          # Retry up to 3 times on transient errors
  retry_delay: 1.0        # Start with 1 second delay
  retry_backoff: 2.0      # Double delay each retry (1s, 2s, 4s)
  fallback:
    - openai/gpt-4o
    - google/gemini-2.5-flash

How Failover Works

  1. The primary model is tried with max_retries attempts.
  2. Transient errors (rate limits, service unavailable, connection errors) trigger retries with exponential backoff.
  3. If all retries fail, each fallback model is tried in order with the same retry logic.
  4. Non-transient errors (authentication failure, model not found, budget exceeded, bad request) raise immediately without retries or fallback.

anthropic/claude-sonnet-4-5-20250929  [attempt 1] -> rate limited
                                      [attempt 2] -> rate limited
                                      [attempt 3] -> rate limited
                                      [attempt 4] -> rate limited (exhausted)
openai/gpt-4o                         [attempt 1] -> success
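The failover loop above can be sketched in a few lines. This is an illustrative reimplementation, not supyagent's code: call_with_failover is a hypothetical function, and TimeoutError/ConnectionError stand in for LiteLLM's transient error types.

```python
import time

# Stand-ins for transient errors (rate limit, service unavailable, connection).
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)

def call_with_failover(models, call, max_retries=3,
                       retry_delay=1.0, retry_backoff=2.0):
    """Try each model in order with exponential-backoff retries.

    Transient errors are retried, then trigger fallback to the next model;
    any other exception propagates immediately.
    """
    last_err = None
    for model in models:  # primary first, then fallbacks in order
        delay = retry_delay
        for attempt in range(max_retries + 1):  # initial attempt + retries
            try:
                return call(model)
            except TRANSIENT_ERRORS as err:
                last_err = err
                if attempt < max_retries:
                    time.sleep(delay)
                    delay *= retry_backoff  # 1s, 2s, 4s, ...
    raise last_err
```

With max_retries: 3 this makes four attempts per model, matching the trace above.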

Error Types

| Error | Behavior | User Message |
| --- | --- | --- |
| AuthenticationError | Immediate raise, no fallback | "Check that the API key is set correctly" |
| NotFoundError | Immediate raise, no fallback | "Model not found. Check the model name" |
| RateLimitError | Retry with backoff, then fallback | "Rate limit exceeded. Wait and retry" |
| BudgetExceededError | Immediate raise, no fallback | "API credits exhausted" |
| ContextWindowExceededError | Immediate raise, no fallback | "Context too large for model" |
| ServiceUnavailableError | Retry with backoff, then fallback | "API temporarily unavailable" |
| APIConnectionError | Retry with backoff, then fallback | "Cannot connect to API" |
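The table reduces to one classification rule: only the three transient errors are eligible for retry and fallback. A hypothetical sketch (handling and FALLBACK_ELIGIBLE are illustrative names, not supyagent APIs):

```python
# Error class names that are retried and allowed to fall back;
# everything else raises immediately.
FALLBACK_ELIGIBLE = {
    "RateLimitError",
    "ServiceUnavailableError",
    "APIConnectionError",
}

def handling(error_name: str) -> str:
    """Return the documented behavior for a given LiteLLM error class name."""
    if error_name in FALLBACK_ELIGIBLE:
        return "retry with backoff, then fallback"
    return "raise immediately, no fallback"
```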

Prompt Caching

When cache: true (the default), supyagent enables provider-specific prompt caching to reduce latency and cost for repeated system prompts and tool definitions.

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  cache: true   # Enable prompt caching (default)

Currently, prompt caching is activated for Anthropic models by sending the anthropic-beta: prompt-caching-2024-07-31 header. Other providers use their native caching mechanisms when available.
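Selecting that header per provider might look like the following sketch (caching_headers is a hypothetical helper; it models only the Anthropic beta header described above):

```python
def caching_headers(provider_spec: str, cache: bool = True) -> dict[str, str]:
    """Return extra request headers enabling prompt caching, if any.

    Only the Anthropic beta header is modeled here; other providers
    are assumed to use their native caching without extra headers.
    """
    if cache and provider_spec.startswith("anthropic/"):
        return {"anthropic-beta": "prompt-caching-2024-07-31"}
    return {}
```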

Switching Models at Runtime

During an interactive chat session, you can switch models without restarting:

You: /model openai/gpt-4o

This changes the model for the remainder of the session. See Chat Commands for all runtime commands.

Provider-Specific Notes

Anthropic

model:
  provider: anthropic/claude-sonnet-4-5-20250929

Anthropic models support prompt caching, extended thinking (reasoning), and tool use. The cache: true setting is recommended for long conversations with many tools.

OpenAI

model:
  provider: openai/gpt-4o

OpenAI models support function calling and structured outputs natively.

Google Gemini

model:
  provider: google/gemini-2.5-flash

Gemini models have large context windows (up to 1M tokens) and support function calling. You can also access Gemini through OpenRouter for unified billing:

model:
  provider: openrouter/google/gemini-2.5-flash

Ollama (Local)

model:
  provider: ollama/llama3.2

Ollama runs models locally. No API key needed, but the model must be pulled first:

ollama pull llama3.2

Tool use support varies by model. Larger models (70B+) generally handle tool calling better.

OpenRouter

model:
  provider: openrouter/anthropic/claude-sonnet-4-5-20250929

OpenRouter provides access to multiple providers through a single API key. Useful for unified billing or accessing models not directly available.

Choosing a Model

| Task | Recommended | Why |
| --- | --- | --- |
| General coding assistant | anthropic/claude-sonnet-4-5-20250929 | Strong tool use, code quality |
| Fast iteration / drafting | google/gemini-2.5-flash | Fast, large context, cost effective |
| Complex planning | anthropic/claude-sonnet-4-5-20250929 | Deep reasoning, structured output |
| Local / private data | ollama/llama3.2 | No data leaves your machine |
| Budget-conscious | openrouter/google/gemini-2.5-flash | Pay-per-token, good quality |