# Models
Configure LLM providers with LiteLLM — Anthropic, OpenAI, Google, Ollama, and more.
Supyagent uses LiteLLM as a unified interface to 100+ LLM providers. You specify models using the `provider/model` format in your agent's `model.provider` field.
## Supported Providers
| Provider | Format | Required Env Var | Example |
|---|---|---|---|
| Anthropic | anthropic/{model} | ANTHROPIC_API_KEY | anthropic/claude-sonnet-4-5-20250929 |
| OpenAI | openai/{model} | OPENAI_API_KEY | openai/gpt-4o |
| Google (Gemini) | google/{model} | GOOGLE_API_KEY | google/gemini-2.5-flash |
| OpenRouter | openrouter/{provider}/{model} | OPENROUTER_API_KEY | openrouter/google/gemini-2.5-flash |
| Ollama | ollama/{model} | none (local) | ollama/llama3.2 |
| Azure OpenAI | azure/{deployment} | AZURE_API_KEY | azure/gpt-4o-deployment |
| AWS Bedrock | bedrock/{model} | AWS credentials | bedrock/anthropic.claude-3-sonnet |
| Mistral | mistral/{model} | MISTRAL_API_KEY | mistral/mistral-large-latest |
| Groq | groq/{model} | GROQ_API_KEY | groq/llama-3.1-70b-versatile |
| DeepSeek | deepseek/{model} | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| Together AI | together/{model} | TOGETHER_API_KEY | together/meta-llama/Llama-3-70b-chat-hf |
| Cohere | cohere/{model} | COHERE_API_KEY | cohere/command-r-plus |
For the full list of supported providers and models, see the LiteLLM provider docs.
## Setting API Keys
API keys are stored encrypted using Fernet encryption in `~/.supyagent/config/`:

```bash
# Set an API key
supyagent config set ANTHROPIC_API_KEY

# List configured keys
supyagent config list

# Delete a key
supyagent config delete OPENAI_API_KEY
```

Keys are loaded into environment variables at startup, so they are available to LiteLLM without manual export commands.
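The Fernet scheme used for key storage can be illustrated with a short sketch built on the `cryptography` package. This is an illustration of the encryption round-trip only, not supyagent's actual storage code; the helper names and the example key value are hypothetical:

```python
# Minimal sketch of Fernet symmetric encryption, as used for stored API keys.
# Hypothetical helpers for illustration -- not supyagent's real internals.
from cryptography.fernet import Fernet

def encrypt_value(master_key: bytes, value: str) -> bytes:
    """Encrypt a secret (e.g. an API key) into an opaque Fernet token."""
    return Fernet(master_key).encrypt(value.encode())

def decrypt_value(master_key: bytes, token: bytes) -> str:
    """Recover the plaintext secret from a Fernet token."""
    return Fernet(master_key).decrypt(token).decode()

master = Fernet.generate_key()  # supyagent manages its own key material
token = encrypt_value(master, "sk-ant-example")
```

The token is authenticated as well as encrypted, so a tampered config file fails to decrypt rather than yielding a corrupted key.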
## Basic Configuration
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.7
  max_tokens: 8192
```

### Temperature
Controls randomness in the output. Lower values produce more deterministic responses.
| Value | Use Case |
|---|---|
| 0.0 - 0.3 | Code generation, structured output, precise tasks |
| 0.4 - 0.7 | General conversation, balanced creativity |
| 0.8 - 1.5 | Creative writing, brainstorming |
### Max Tokens
Set `max_tokens` to control the maximum length of responses. When set to `null` (the default), the provider's default is used, which is typically the model's maximum output length.
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  max_tokens: null    # Use provider default (model max)
  # max_tokens: 4096  # Cap at 4096 tokens
  # max_tokens: 16384 # Allow longer responses
```

## Retry and Fallback
Supyagent has built-in retry logic with exponential backoff for transient errors (rate limits, service unavailable). When retries are exhausted, it falls back to alternative models.
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  max_retries: 3      # Retry up to 3 times on transient errors
  retry_delay: 1.0    # Start with a 1-second delay
  retry_backoff: 2.0  # Double the delay each retry (1s, 2s, 4s)
  fallback:
    - openai/gpt-4o
    - google/gemini-2.5-flash
```

### How Failover Works
1. The primary model is tried first, then retried up to `max_retries` times.
2. Transient errors (rate limits, service unavailable, connection errors) trigger retries with exponential backoff.
3. If all retries fail, each fallback model is tried in order with the same retry logic.
4. Non-transient errors (authentication failure, model not found, budget exceeded, bad request) raise immediately without retries or fallback.
```
anthropic/claude-sonnet-4-5-20250929  [attempt 1] -> rate limited
                                      [attempt 2] -> rate limited
                                      [attempt 3] -> rate limited
                                      [attempt 4] -> rate limited (exhausted)
openai/gpt-4o                         [attempt 1] -> success
```

### Error Types
| Error | Behavior | User Message |
|---|---|---|
| AuthenticationError | Immediate raise, no fallback | "Check that the API key is set correctly" |
| NotFoundError | Immediate raise, no fallback | "Model not found. Check the model name" |
| RateLimitError | Retry with backoff, then fallback | "Rate limit exceeded. Wait and retry" |
| BudgetExceededError | Immediate raise, no fallback | "API credits exhausted" |
| ContextWindowExceededError | Immediate raise, no fallback | "Context too large for model" |
| ServiceUnavailableError | Retry with backoff, then fallback | "API temporarily unavailable" |
| APIConnectionError | Retry with backoff, then fallback | "Cannot connect to API" |
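The retry-and-fallback behavior above reduces to a small loop. This is a simplified sketch, not supyagent's actual implementation; `TransientError` stands in for LiteLLM's retryable error types, and non-transient errors simply propagate because they are never caught:

```python
import time

class TransientError(Exception):
    """Stand-in for rate-limit / service-unavailable / connection errors."""

def call_with_failover(models, call, max_retries=3,
                       retry_delay=1.0, retry_backoff=2.0):
    """Try each model with exponential backoff; fall back on exhaustion.

    Non-transient errors (auth failure, model not found, ...) are not
    caught here, so they raise immediately: no retry, no fallback.
    """
    for model in models:                         # primary first, then fallbacks
        delay = retry_delay
        for attempt in range(1 + max_retries):   # 1 initial try + max_retries
            try:
                return call(model)
            except TransientError:
                if attempt == max_retries:
                    break                        # exhausted: try next model
                time.sleep(delay)
                delay *= retry_backoff           # 1s, 2s, 4s, ...
    raise RuntimeError("all models exhausted")

# With max_retries=3 the primary model gets 4 attempts before fallback.
attempts = []
def flaky(model):
    attempts.append(model)
    if model.startswith("anthropic/"):
        raise TransientError("rate limited")
    return f"ok via {model}"

result = call_with_failover(
    ["anthropic/claude-sonnet-4-5-20250929", "openai/gpt-4o"],
    flaky, retry_delay=0.0)
```

The key design point is the asymmetry: retrying an `AuthenticationError` would never succeed, so only errors that can plausibly resolve on their own are worth the backoff delay.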
## Prompt Caching
When `cache: true` (the default), supyagent enables provider-specific prompt caching to reduce latency and cost for repeated system prompts and tool definitions.
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
  cache: true # Enable prompt caching (default)
```

Currently, prompt caching is activated for Anthropic models by sending the `anthropic-beta: prompt-caching-2024-07-31` header. Other providers use their native caching mechanisms when available.
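The header decision described above can be sketched as a small helper (a hypothetical function for illustration, not supyagent's actual code):

```python
def caching_headers(provider: str, cache: bool) -> dict:
    """Hypothetical helper: extra HTTP headers to request prompt caching.

    Mirrors the documented behavior: the prompt-caching beta header is
    attached only when caching is enabled and the model is an Anthropic one.
    """
    if cache and provider.startswith("anthropic/"):
        return {"anthropic-beta": "prompt-caching-2024-07-31"}
    return {}

headers = caching_headers("anthropic/claude-sonnet-4-5-20250929", cache=True)
```

For non-Anthropic providers the helper returns no extra headers, leaving any native caching to the provider.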
## Switching Models at Runtime
During an interactive chat session, you can switch models without restarting:
```
You: /model openai/gpt-4o
```

This changes the model for the remainder of the session. See Chat Commands for all runtime commands.
## Provider-Specific Notes
### Anthropic
```yaml
model:
  provider: anthropic/claude-sonnet-4-5-20250929
```

Anthropic models support prompt caching, extended thinking (reasoning), and tool use. The `cache: true` setting is recommended for long conversations with many tools.
### OpenAI
```yaml
model:
  provider: openai/gpt-4o
```

OpenAI models support function calling and structured outputs natively.
### Google Gemini
```yaml
model:
  provider: google/gemini-2.5-flash
```

Gemini models have large context windows (up to 1M tokens) and support function calling. You can also access Gemini through OpenRouter for unified billing:
```yaml
model:
  provider: openrouter/google/gemini-2.5-flash
```

### Ollama (Local)
```yaml
model:
  provider: ollama/llama3.2
```

Ollama runs models locally. No API key is needed, but the model must be pulled first:

```bash
ollama pull llama3.2
```

Tool use support varies by model. Larger models (70B+) generally handle tool calling better.
### OpenRouter
```yaml
model:
  provider: openrouter/anthropic/claude-sonnet-4-5-20250929
```

OpenRouter provides access to multiple providers through a single API key. Useful for unified billing or for accessing models that are not directly available.
## Choosing a Model
| Task | Recommended | Why |
|---|---|---|
| General coding assistant | anthropic/claude-sonnet-4-5-20250929 | Strong tool use, code quality |
| Fast iteration / drafting | google/gemini-2.5-flash | Fast, large context, cost effective |
| Complex planning | anthropic/claude-sonnet-4-5-20250929 | Deep reasoning, structured output |
| Local / private data | ollama/llama3.2 | No data leaves your machine |
| Budget-conscious | openrouter/google/gemini-2.5-flash | Pay-per-token, good quality |
## Related
- Configuration -- Full YAML schema reference
- System Prompts -- Writing effective prompts for different models
- Cloud Integrations -- Connecting to third-party services