# Data Pipeline

This example builds an execution mode agent for batch data processing. Execution agents take a single input, process it, and return a structured output -- no interactive conversation. Combined with `supyagent batch`, they can process hundreds of items from JSONL or CSV files.
## The Agent YAML
```yaml
name: data-processor
description: Processes structured data inputs and returns formatted outputs
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.1 # Very low temperature for consistent output
  max_tokens: 2048

system_prompt: |
  You are a data processing agent. You receive structured input,
  process it according to the instructions, and return structured output.

  Rules:
  - Process the input exactly as specified
  - Return only the result, no conversation
  - If the input is malformed, return an error description
  - Be consistent -- same input should produce same output

tools:
  allow: [] # No tools needed for pure data transformation
  will_create_tools: false

limits:
  max_tool_calls_per_turn: 0
```

## Input/Output Formats
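The invocations in this section take either a plain string or a JSON object as input. When inputs are built programmatically, serializing them with a JSON library is safer than hand-quoting shell strings. A minimal sketch -- the `build_task` helper and its field names are illustrative, mirroring the examples below:

```python
import json

def build_task(text: str, categories: list[str]) -> str:
    """Serialize a task as the JSON string passed to `supyagent run`.
    Field names mirror the examples in this section; adjust them to
    whatever input format your agent's system prompt specifies."""
    return json.dumps({"text": text, "categories": categories})

arg = build_task("The product arrived damaged",
                 ["positive", "negative", "neutral"])
# `arg` is safe to pass as a single argv element, e.g. via
# subprocess.run(["supyagent", "run", "data-processor", arg], ...)
```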
### Single Task
```shell
# String input
supyagent run data-processor "Classify this text: The product arrived damaged"

# JSON input
supyagent run data-processor '{"text": "The product arrived damaged", "categories": ["positive", "negative", "neutral"]}'

# JSON output
supyagent run data-processor '{"text": "Great service!"}' --output json
```

Output with `--output json`:
```json
{
  "ok": true,
  "data": "Category: positive\nConfidence: 0.95\nReason: Expression of satisfaction with service"
}
```

### Batch Processing from JSONL
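The JSONL input format is simply one JSON object per line, so input files like the one shown next are easy to generate from existing data. A sketch -- the `write_jsonl` helper is illustrative, not part of supyagent:

```python
import json

def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line -- the input format
    `supyagent batch` reads."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

records = [
    {"text": "The product arrived damaged", "task": "classify_sentiment"},
    {"text": "Excellent quality, fast shipping!", "task": "classify_sentiment"},
]
write_jsonl(records, "inputs.jsonl")
```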
Create an input file with one JSON object per line:
{"text": "The product arrived damaged and customer service was unhelpful", "task": "classify_sentiment"}
{"text": "Excellent quality, fast shipping, would buy again!", "task": "classify_sentiment"}
{"text": "Average product, nothing special but works as described", "task": "classify_sentiment"}
{"text": "Terrible experience. Requesting a full refund.", "task": "classify_sentiment"}Run the batch:
```shell
supyagent batch data-processor inputs.jsonl --output results.jsonl
```

Output:
```
-- Task 1/4: {"text": "The product arrived damaged and customer se... --
done
-- Task 2/4: {"text": "Excellent quality, fast shipping, would bu... --
done
-- Task 3/4: {"text": "Average product, nothing special but works... --
done
-- Task 4/4: {"text": "Terrible experience. Requesting a full ref... --
done
Processed 4 items (4 succeeded, 0 failed)
Results written to results.jsonl
```

### Batch Processing from CSV
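`supyagent batch` reads CSV directly with `--format csv`, as shown below. If you would rather normalize everything to JSONL yourself first, the conversion is only a few lines -- a sketch, where the `csv_to_jsonl` helper is illustrative and not part of supyagent:

```python
import csv
import json

def csv_to_jsonl(csv_path: str, jsonl_path: str) -> None:
    """Convert a CSV with a header row into JSONL, one object per row."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
```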
```csv
text,product_id
"Great product, love it!",SKU-001
"Broken on arrival",SKU-002
"Works fine but overpriced",SKU-003
```

```shell
supyagent batch data-processor reviews.csv --format csv --output results.jsonl
```

## Specialized Data Processors
### Text Classifier
```yaml
name: classifier
description: Classifies text into predefined categories
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.0

system_prompt: |
  You are a text classifier. Given a text input, classify it into
  exactly one of the provided categories.

  Input format: {"text": "...", "categories": ["cat1", "cat2", ...]}

  Output format (return exactly this JSON):
  {"category": "chosen_category", "confidence": 0.0-1.0, "reasoning": "brief explanation"}

  Rules:
  - Choose exactly one category
  - Confidence should reflect how clearly the text fits the category
  - If none fit well, choose the closest match with low confidence
  - Return ONLY the JSON, no other text

tools:
  allow: []
```

Usage:
```shell
supyagent run classifier '{"text": "I need to cancel my subscription", "categories": ["billing", "technical", "product", "general"]}' --output json --quiet
```

### Data Extractor
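The extractor below promises a JSON object with exactly the requested fields, using null when a field cannot be found, which makes a cheap downstream contract check possible. A sketch -- the `check_extraction` helper is illustrative, not part of supyagent:

```python
import json

def check_extraction(raw: str, fields: list[str]) -> dict:
    """Parse extractor output and verify it contains exactly the
    requested fields (None/null values allowed). Raises ValueError
    when the contract is violated."""
    data = json.loads(raw)
    if set(data) != set(fields):
        raise ValueError(f"expected fields {fields}, got {sorted(data)}")
    return data

check_extraction('{"name": "Widget", "price": null}', ["name", "price"])
```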
```yaml
name: extractor
description: Extracts structured data from unstructured text
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.0

system_prompt: |
  You are a data extraction agent. Given unstructured text, extract
  the requested fields into structured JSON.

  Input format: {"text": "...", "fields": ["field1", "field2", ...]}

  Return a JSON object with the requested fields. Use null for fields
  that cannot be extracted. Return ONLY the JSON.

tools:
  allow: []
```

### Summarizer
```yaml
name: summarizer
description: Produces concise summaries of input text
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.3
  max_tokens: 1024

system_prompt: |
  Produce a concise summary of the input text. The summary should:
  - Be 2-3 sentences for short inputs, up to a paragraph for long inputs
  - Capture the key points and main conclusion
  - Preserve important numbers, names, and dates
  - Be written in the same language as the input

tools:
  allow: []
```

Usage:
```shell
# Single file
supyagent run summarizer --input article.txt

# Pipe from another command
curl -s https://example.com/article | supyagent run summarizer

# Batch processing
supyagent batch summarizer documents.jsonl --output summaries.jsonl
```

## Pipeline Patterns
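The patterns in this section all hand JSONL files from one stage to the next. The per-stage plumbing, with the agent call stubbed out by a placeholder transform, might look like this sketch (the `run_stage` helper and both stand-in transforms are illustrative):

```python
import json

def run_stage(transform, in_path: str, out_path: str) -> None:
    """File plumbing for one pipeline stage: read JSONL, apply a
    per-record transform, write JSONL. In the real pipeline each stage
    is a `supyagent batch` run; `transform` stands in for the agent."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(transform(json.loads(line))) + "\n")

# Stand-ins for the extractor and classifier agents:
extract = lambda rec: {**rec, "length": len(rec["text"])}
classify = lambda rec: {**rec, "label": "long" if rec["length"] > 40 else "short"}
```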
### Chained Processing
Process data through multiple agents in sequence:
```shell
# Step 1: Extract data
supyagent batch extractor raw_data.jsonl --output extracted.jsonl

# Step 2: Classify extracted data
supyagent batch classifier extracted.jsonl --output classified.jsonl

# Step 3: Summarize each category
supyagent batch summarizer classified.jsonl --output summaries.jsonl
```

### Shell Pipeline
```shell
cat raw_text.txt | supyagent run extractor --quiet | supyagent run classifier --quiet
```

### With Secrets
Pass API keys for tools that need external access:
```shell
supyagent batch api-caller inputs.jsonl \
  --secrets API_KEY=sk-xxx \
  --secrets .env \
  --output results.jsonl
```

## Orchestrated Workflows
For complex multi-step pipelines, use `supyagent orchestrate`:
```yaml
name: process-reviews
steps:
  - agent: extractor
    task: "Extract product name, rating, and key issues from: {{input}}"
    output: extracted_data
  - agent: classifier
    task: "Classify the sentiment: {{extracted_data}}"
    depends_on: [extracted_data]
    output: classification
  - agent: summarizer
    task: "Summarize findings: {{extracted_data}} with classification {{classification}}"
    depends_on: [extracted_data, classification]
```

```shell
supyagent orchestrate workflows/process-reviews.yaml
```

## Performance Tips
- Use `temperature: 0.0` for maximum consistency in batch processing
- Set `max_tokens` to the minimum needed for your output format
- Use the `--quiet` flag to suppress status messages when piping output
- For large batches, monitor progress on stderr while results go to stdout
- The `--output json` flag ensures machine-parseable output
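The stderr/stdout split in the tips above means a wrapper script can capture results and progress independently. A sketch that uses a stand-in child process in place of an actual `supyagent batch` invocation:

```python
import subprocess
import sys

# Stand-in for a `supyagent batch ...` invocation: a child process that
# writes progress to stderr and results to stdout, so the wrapper can
# separate the two streams.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('task 1/1: done\\n'); "
         "sys.stdout.write('{\"ok\": true}\\n')"]
proc = subprocess.run(child, capture_output=True, text=True)

results = proc.stdout    # machine-parseable results
progress = proc.stderr   # human-readable status messages
```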
## Related
- CLI Reference -- Batch command options
- CLI Reference -- Run command options
- Building Agents -- Agent types and configuration
- Custom Tool -- Adding tools for API access in pipelines