# Models
Configure LLM providers with LiteLLM — Anthropic, OpenAI, Google, Ollama, and more.
Supyagent uses LiteLLM as a unified interface to 100+ LLM providers. You specify models using the `provider/model` format in your agent's `model.provider` field.
## Supported Providers
| Provider | Format | Required Env Var | Example |
|---|---|---|---|
| Anthropic | anthropic/{model} | ANTHROPIC_API_KEY | anthropic/claude-sonnet-4-5-20250929 |
| OpenAI | openai/{model} | OPENAI_API_KEY | openai/gpt-4o |
| Google (Gemini) | google/{model} | GOOGLE_API_KEY | google/gemini-2.5-flash |
| OpenRouter | openrouter/{provider}/{model} | OPENROUTER_API_KEY | openrouter/google/gemini-2.5-flash |
| Ollama | ollama/{model} | none (local) | ollama/llama3.2 |
| Azure OpenAI | azure/{deployment} | AZURE_API_KEY | azure/gpt-4o-deployment |
| AWS Bedrock | bedrock/{model} | AWS credentials | bedrock/anthropic.claude-3-sonnet |
| Mistral | mistral/{model} | MISTRAL_API_KEY | mistral/mistral-large-latest |
| Groq | groq/{model} | GROQ_API_KEY | groq/llama-3.1-70b-versatile |
| DeepSeek | deepseek/{model} | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| Together AI | together/{model} | TOGETHER_API_KEY | together/meta-llama/Llama-3-70b-chat-hf |
| Cohere | cohere/{model} | COHERE_API_KEY | cohere/command-r-plus |
For the full list of supported providers and models, see the LiteLLM provider docs.
## Setting API Keys
API keys are stored encrypted using Fernet encryption in `~/.supyagent/config/`:

```bash
# Set an API key
supyagent config set ANTHROPIC_API_KEY

# List configured keys
supyagent config list

# Delete a key
supyagent config delete OPENAI_API_KEY
```

Keys are loaded into environment variables at startup, so they are available to LiteLLM without manual export commands.
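The Fernet scheme used for key storage can be illustrated with a short sketch built on the `cryptography` package. This is an illustration of the encryption round-trip only, not supyagent's actual storage code; the helper names and the example key value are hypothetical:

```python
# Minimal sketch of Fernet symmetric encryption, as used for stored API keys.
# Hypothetical helpers for illustration -- not supyagent's real internals.
from cryptography.fernet import Fernet

def encrypt_value(master_key: bytes, value: str) -> bytes:
    """Encrypt a secret (e.g. an API key) into an opaque Fernet token."""
    return Fernet(master_key).encrypt(value.encode())

def decrypt_value(master_key: bytes, token: bytes) -> str:
    """Recover the plaintext secret from a Fernet token."""
    return Fernet(master_key).decrypt(token).decode()

master = Fernet.generate_key()  # supyagent manages its own key material
token = encrypt_value(master, "sk-ant-example")
```

The token is authenticated as well as encrypted, so a tampered config file fails to decrypt rather than yielding a corrupted key.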
## Basic Configuration
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.7
  max_tokens: 8192
```

### Temperature
Controls randomness in the output. Lower values produce more deterministic responses.
| Value | Use Case |
|---|---|
| 0.0 - 0.3 | Code generation, structured output, precise tasks |
| 0.4 - 0.7 | General conversation, balanced creativity |
| 0.8 - 1.5 | Creative writing, brainstorming |
### Max Tokens
Set `max_tokens` to control the maximum length of responses. When set to `null` (the default), the provider's default is used, which is typically the model's maximum output length.
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  max_tokens: null    # Use provider default (model max)
  # max_tokens: 4096  # Cap at 4096 tokens
  # max_tokens: 16384 # Allow longer responses
```

## Retry and Fallback
Supyagent has built-in retry logic with exponential backoff for transient errors (rate limits, service unavailable). When retries are exhausted, it falls back to alternative models.
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  max_retries: 3      # Retry up to 3 times on transient errors
  retry_delay: 1.0    # Start with a 1-second delay
  retry_backoff: 2.0  # Double the delay each retry (1s, 2s, 4s)
  fallback:
    - openai/gpt-4o
    - google/gemini-2.5-flash
```

### How Failover Works
1. The primary model is tried first, then retried up to `max_retries` times.
2. Transient errors (rate limits, service unavailable, connection errors) trigger retries with exponential backoff.
3. If all retries fail, each fallback model is tried in order with the same retry logic.
4. Non-transient errors (authentication failure, model not found, budget exceeded, bad request) raise immediately without retries or fallback.
```
anthropic/claude-sonnet-4-5-20250929  [attempt 1] -> rate limited
                                      [attempt 2] -> rate limited
                                      [attempt 3] -> rate limited
                                      [attempt 4] -> rate limited (exhausted)
openai/gpt-4o                         [attempt 1] -> success
```

### Error Types
| Error | Behavior | User Message |
|---|---|---|
| AuthenticationError | Immediate raise, no fallback | "Check that the API key is set correctly" |
| NotFoundError | Immediate raise, no fallback | "Model not found. Check the model name" |
| RateLimitError | Retry with backoff, then fallback | "Rate limit exceeded. Wait and retry" |
| BudgetExceededError | Immediate raise, no fallback | "API credits exhausted" |
| ContextWindowExceededError | Immediate raise, no fallback | "Context too large for model" |
| ServiceUnavailableError | Retry with backoff, then fallback | "API temporarily unavailable" |
| APIConnectionError | Retry with backoff, then fallback | "Cannot connect to API" |
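The retry-and-fallback behavior above reduces to a small loop. This is a simplified sketch, not supyagent's actual implementation; `TransientError` stands in for LiteLLM's retryable error types, and non-transient errors simply propagate because they are never caught:

```python
import time

class TransientError(Exception):
    """Stand-in for rate-limit / service-unavailable / connection errors."""

def call_with_failover(models, call, max_retries=3,
                       retry_delay=1.0, retry_backoff=2.0):
    """Try each model with exponential backoff; fall back on exhaustion.

    Non-transient errors (auth failure, model not found, ...) are not
    caught here, so they raise immediately: no retry, no fallback.
    """
    for model in models:                         # primary first, then fallbacks
        delay = retry_delay
        for attempt in range(1 + max_retries):   # 1 initial try + max_retries
            try:
                return call(model)
            except TransientError:
                if attempt == max_retries:
                    break                        # exhausted: try next model
                time.sleep(delay)
                delay *= retry_backoff           # 1s, 2s, 4s, ...
    raise RuntimeError("all models exhausted")

# With max_retries=3 the primary model gets 4 attempts before fallback.
attempts = []
def flaky(model):
    attempts.append(model)
    if model.startswith("anthropic/"):
        raise TransientError("rate limited")
    return f"ok via {model}"

result = call_with_failover(
    ["anthropic/claude-sonnet-4-5-20250929", "openai/gpt-4o"],
    flaky, retry_delay=0.0)
```

The key design point is the asymmetry: retrying an `AuthenticationError` would never succeed, so only errors that can plausibly resolve on their own are worth the backoff delay.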
## Prompt Caching
When `cache: true` (the default), supyagent enables provider-specific prompt caching to reduce latency and cost for repeated system prompts and tool definitions.
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  cache: true # Enable prompt caching (default)
```

Currently, prompt caching is activated for Anthropic models by sending the `anthropic-beta: prompt-caching-2024-07-31` header. Other providers use their native caching mechanisms when available.
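The header decision described above can be sketched as a small helper (a hypothetical function for illustration, not supyagent's actual code):

```python
def caching_headers(provider: str, cache: bool) -> dict:
    """Hypothetical helper: extra HTTP headers to request prompt caching.

    Mirrors the documented behavior: the prompt-caching beta header is
    attached only when caching is enabled and the model is an Anthropic one.
    """
    if cache and provider.startswith("anthropic/"):
        return {"anthropic-beta": "prompt-caching-2024-07-31"}
    return {}

headers = caching_headers("anthropic/claude-sonnet-4-5-20250929", cache=True)
```

For non-Anthropic providers the helper returns no extra headers, leaving any native caching to the provider.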
## Switching Models at Runtime
During an interactive chat session, you can switch models without restarting:
```
You: /model openai/gpt-4o
```

This changes the model for the remainder of the session. See Chat Commands for all runtime commands.
## Provider-Specific Notes
### Anthropic
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
```

Anthropic models support prompt caching, extended thinking (reasoning), and tool use. The `cache: true` setting is recommended for long conversations with many tools.
### OpenAI
```yaml
model:
  provider: openai/gpt-4o
```

OpenAI models support function calling and structured outputs natively.
### Google Gemini
```yaml
model:
  provider: google/gemini-2.5-flash
```

Gemini models have large context windows (up to 1M tokens) and support function calling. You can also access Gemini through OpenRouter for unified billing:
```yaml
model:
  provider: openrouter/google/gemini-2.5-flash
```

### Ollama (Local)
```yaml
model:
  provider: ollama/llama3.2
```

Ollama runs models locally. No API key is needed, but the model must be pulled first:

```bash
ollama pull llama3.2
```

Tool use support varies by model. Larger models (70B+) generally handle tool calling better.
### OpenRouter
```yaml
model:
  provider: openrouter/anthropic/claude-sonnet-4-5-20250929
```

OpenRouter provides access to multiple providers through a single API key. Useful for unified billing or for accessing models that are not directly available.
## Choosing a Model
| Task | Recommended | Why |
|---|---|---|
| General coding assistant | anthropic/claude-sonnet-4-5-20250929 | Strong tool use, code quality |
| Fast iteration / drafting | google/gemini-2.5-flash | Fast, large context, cost effective |
| Complex planning | anthropic/claude-sonnet-4-5-20250929 | Deep reasoning, structured output |
| Local / private data | ollama/llama3.2 | No data leaves your machine |
| Budget-conscious | openrouter/google/gemini-2.5-flash | Pay-per-token, good quality |
## Related
- Configuration -- Full YAML schema reference
- System Prompts -- Writing effective prompts for different models
- Cloud Integrations -- Connecting to third-party services