# Data Pipeline

This example builds an execution mode agent for batch data processing. Execution agents take a single input, process it, and return a structured output -- no interactive conversation. Combined with `supyagent batch`, they can process hundreds of items from JSONL or CSV files.
## The Agent YAML
```yaml
name: data-processor
description: Processes structured data inputs and returns formatted outputs
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.1 # Very low temperature for consistent output
  max_tokens: 2048

system_prompt: |
  You are a data processing agent. You receive structured input,
  process it according to the instructions, and return structured output.

  Rules:
  - Process the input exactly as specified
  - Return only the result, no conversation
  - If the input is malformed, return an error description
  - Be consistent -- same input should produce same output

tools:
  allow: [] # No tools needed for pure data transformation
  will_create_tools: false

limits:
  max_tool_calls_per_turn: 0
```

## Input/Output Formats
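The invocations in this section take either a plain string or a JSON object as input. When inputs are built programmatically, serializing them with a JSON library is safer than hand-quoting shell strings. A minimal sketch -- the `build_task` helper and its field names are illustrative, mirroring the examples below:

```python
import json

def build_task(text: str, categories: list[str]) -> str:
    """Serialize a task as the JSON string passed to `supyagent run`.
    Field names mirror the examples in this section; adjust them to
    whatever input format your agent's system prompt specifies."""
    return json.dumps({"text": text, "categories": categories})

arg = build_task("The product arrived damaged",
                 ["positive", "negative", "neutral"])
# `arg` is safe to pass as a single argv element, e.g. via
# subprocess.run(["supyagent", "run", "data-processor", arg], ...)
```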
### Single Task
```shell
# String input
supyagent run data-processor "Classify this text: The product arrived damaged"

# JSON input
supyagent run data-processor '{"text": "The product arrived damaged", "categories": ["positive", "negative", "neutral"]}'

# JSON output
supyagent run data-processor '{"text": "Great service!"}' --output json
```

Output with `--output json`:
```json
{
  "ok": true,
  "data": "Category: positive\nConfidence: 0.95\nReason: Expression of satisfaction with service"
}
```

### Batch Processing from JSONL
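The JSONL input format is simply one JSON object per line, so input files like the one shown next are easy to generate from existing data. A sketch -- the `write_jsonl` helper is illustrative, not part of supyagent:

```python
import json

def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line -- the input format
    `supyagent batch` reads."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

records = [
    {"text": "The product arrived damaged", "task": "classify_sentiment"},
    {"text": "Excellent quality, fast shipping!", "task": "classify_sentiment"},
]
write_jsonl(records, "inputs.jsonl")
```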
Create an input file with one JSON object per line:
{"text": "The product arrived damaged and customer service was unhelpful", "task": "classify_sentiment"}
{"text": "Excellent quality, fast shipping, would buy again!", "task": "classify_sentiment"}
{"text": "Average product, nothing special but works as described", "task": "classify_sentiment"}
{"text": "Terrible experience. Requesting a full refund.", "task": "classify_sentiment"}Run the batch:
```shell
supyagent batch data-processor inputs.jsonl --output results.jsonl
```

Output:
```
-- Task 1/4: {"text": "The product arrived damaged and customer se... --
done
-- Task 2/4: {"text": "Excellent quality, fast shipping, would bu... --
done
-- Task 3/4: {"text": "Average product, nothing special but works... --
done
-- Task 4/4: {"text": "Terrible experience. Requesting a full ref... --
done
Processed 4 items (4 succeeded, 0 failed)
Results written to results.jsonl
```

### Batch Processing from CSV
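`supyagent batch` reads CSV directly with `--format csv`, as shown below. If you would rather normalize everything to JSONL yourself first, the conversion is only a few lines -- a sketch, where the `csv_to_jsonl` helper is illustrative and not part of supyagent:

```python
import csv
import json

def csv_to_jsonl(csv_path: str, jsonl_path: str) -> None:
    """Convert a CSV with a header row into JSONL, one object per row."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
```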
```csv
text,product_id
"Great product, love it!",SKU-001
"Broken on arrival",SKU-002
"Works fine but overpriced",SKU-003
```

```shell
supyagent batch data-processor reviews.csv --format csv --output results.jsonl
```

## Specialized Data Processors
### Text Classifier
```yaml
name: classifier
description: Classifies text into predefined categories
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.0

system_prompt: |
  You are a text classifier. Given a text input, classify it into
  exactly one of the provided categories.

  Input format: {"text": "...", "categories": ["cat1", "cat2", ...]}

  Output format (return exactly this JSON):
  {"category": "chosen_category", "confidence": 0.0-1.0, "reasoning": "brief explanation"}

  Rules:
  - Choose exactly one category
  - Confidence should reflect how clearly the text fits the category
  - If none fit well, choose the closest match with low confidence
  - Return ONLY the JSON, no other text

tools:
  allow: []
```

Usage:
```shell
supyagent run classifier '{"text": "I need to cancel my subscription", "categories": ["billing", "technical", "product", "general"]}' --output json --quiet
```

### Data Extractor
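The extractor below promises a JSON object with exactly the requested fields, using null when a field cannot be found, which makes a cheap downstream contract check possible. A sketch -- the `check_extraction` helper is illustrative, not part of supyagent:

```python
import json

def check_extraction(raw: str, fields: list[str]) -> dict:
    """Parse extractor output and verify it contains exactly the
    requested fields (None/null values allowed). Raises ValueError
    when the contract is violated."""
    data = json.loads(raw)
    if set(data) != set(fields):
        raise ValueError(f"expected fields {fields}, got {sorted(data)}")
    return data

check_extraction('{"name": "Widget", "price": null}', ["name", "price"])
```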
```yaml
name: extractor
description: Extracts structured data from unstructured text
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.0

system_prompt: |
  You are a data extraction agent. Given unstructured text, extract
  the requested fields into structured JSON.

  Input format: {"text": "...", "fields": ["field1", "field2", ...]}

  Return a JSON object with the requested fields. Use null for fields
  that cannot be extracted. Return ONLY the JSON.

tools:
  allow: []
```

### Summarizer
```yaml
name: summarizer
description: Produces concise summaries of input text
version: "1.0"
type: execution

model:
  provider: anthropic/claude-sonnet-4-5-20250929
  temperature: 0.3
  max_tokens: 1024

system_prompt: |
  Produce a concise summary of the input text. The summary should:
  - Be 2-3 sentences for short inputs, up to a paragraph for long inputs
  - Capture the key points and main conclusion
  - Preserve important numbers, names, and dates
  - Be written in the same language as the input

tools:
  allow: []
```

Usage:
```shell
# Single file
supyagent run summarizer --input article.txt

# Pipe from another command
curl -s https://example.com/article | supyagent run summarizer

# Batch processing
supyagent batch summarizer documents.jsonl --output summaries.jsonl
```

## Pipeline Patterns
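The patterns in this section all hand JSONL files from one stage to the next. The per-stage plumbing, with the agent call stubbed out by a placeholder transform, might look like this sketch (the `run_stage` helper and both stand-in transforms are illustrative):

```python
import json

def run_stage(transform, in_path: str, out_path: str) -> None:
    """File plumbing for one pipeline stage: read JSONL, apply a
    per-record transform, write JSONL. In the real pipeline each stage
    is a `supyagent batch` run; `transform` stands in for the agent."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(transform(json.loads(line))) + "\n")

# Stand-ins for the extractor and classifier agents:
extract = lambda rec: {**rec, "length": len(rec["text"])}
classify = lambda rec: {**rec, "label": "long" if rec["length"] > 40 else "short"}
```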
### Chained Processing
Process data through multiple agents in sequence:
```shell
# Step 1: Extract data
supyagent batch extractor raw_data.jsonl --output extracted.jsonl

# Step 2: Classify extracted data
supyagent batch classifier extracted.jsonl --output classified.jsonl

# Step 3: Summarize each category
supyagent batch summarizer classified.jsonl --output summaries.jsonl
```

### Shell Pipeline
```shell
cat raw_text.txt | supyagent run extractor --quiet | supyagent run classifier --quiet
```

### With Secrets
Pass API keys for tools that need external access:
```shell
supyagent batch api-caller inputs.jsonl \
  --secrets API_KEY=sk-xxx \
  --secrets .env \
  --output results.jsonl
```

## Orchestrated Workflows
For complex multi-step pipelines, use `supyagent orchestrate`:
```yaml
name: process-reviews
steps:
  - agent: extractor
    task: "Extract product name, rating, and key issues from: {{input}}"
    output: extracted_data
  - agent: classifier
    task: "Classify the sentiment: {{extracted_data}}"
    depends_on: [extracted_data]
    output: classification
  - agent: summarizer
    task: "Summarize findings: {{extracted_data}} with classification {{classification}}"
    depends_on: [extracted_data, classification]
```

```shell
supyagent orchestrate workflows/process-reviews.yaml
```

## Performance Tips
- Use `temperature: 0.0` for maximum consistency in batch processing
- Set `max_tokens` to the minimum needed for your output format
- Use the `--quiet` flag to suppress status messages when piping output
- For large batches, monitor progress on stderr while results go to stdout
- The `--output json` flag ensures machine-parseable output
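The stderr/stdout split in the tips above means a wrapper script can capture results and progress independently. A sketch that uses a stand-in child process in place of an actual `supyagent batch` invocation:

```python
import subprocess
import sys

# Stand-in for a `supyagent batch ...` invocation: a child process that
# writes progress to stderr and results to stdout, so the wrapper can
# separate the two streams.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('task 1/1: done\\n'); "
         "sys.stdout.write('{\"ok\": true}\\n')"]
proc = subprocess.run(child, capture_output=True, text=True)

results = proc.stdout    # machine-parseable results
progress = proc.stderr   # human-readable status messages
```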
## Related
- CLI Reference -- Batch command options
- CLI Reference -- Run command options
- Building Agents -- Agent types and configuration
- Custom Tool -- Adding tools for API access in pipelines