Overview: The Agent Creation Process
Creating a new agent involves six key steps:
- Identify your input source - What webhook/alert will trigger the agent?
- Write the agent description - How will the router know when to use this agent?
- Design the runbook instructions - What should the agent do step-by-step?
- Define required tools - What built-in and custom tools does the agent need?
- Create custom shell tools - Extend Unpage with your specific commands/scripts
- Test and deploy - Validate locally and set up production webhook handling
Example Scenario: Redis Memory Usage Alerts
For this tutorial, we’ll create an agent that handles Redis memory usage alerts from DataDog. When Redis memory usage exceeds 85%, our agent will:
- Check current Redis memory statistics
- Identify the largest keys consuming memory
- Analyze recent memory growth patterns
- Check for memory-intensive operations in Redis logs
- Post actionable recommendations to the incident
Step 1: Identify Your Input Source
The first step is understanding what triggers your agent. This could be:
- PagerDuty incidents from various monitoring systems
- Direct webhooks from DataDog, New Relic, CloudWatch, etc.
- GitHub Actions failures or other CI/CD events
- Custom application alerts from your own services
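For the Redis scenario, the trigger is a DataDog monitor webhook. The exact payload shape depends on how the webhook is configured in DataDog, so the field names below are only illustrative; saving a representative sample now makes the local testing in Step 6 easier.

```bash
# Save a representative alert payload for local testing.
# Field names and values are illustrative -- match them to the payload
# template configured in your DataDog webhook integration.
cat > sample-redis-alert.json <<'EOF'
{
  "alert_title": "Redis memory usage above 85%",
  "alert_status": "Triggered",
  "host": "redis-prod-01",
  "metric": "redis.mem.used_pct",
  "value": 87.4,
  "timestamp": "2024-06-01T12:00:00Z"
}
EOF
```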
Step 2: Write the Agent Description
The agent description is used by Unpage’s Router to automatically select which agent should handle each incoming alert. Write descriptions that are:
- Specific about the alert types this agent handles
- Distinguishing to differentiate from other agents
- Comprehensive to cover edge cases and variations
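For the Redis scenario, a description might look like the sketch below, assuming the agent file has a top-level description field alongside the prompt and tools sections covered in the next steps:

```yaml
# Illustrative description for the Redis memory agent
description: >
  Handles DataDog alerts about Redis memory usage, including "memory usage
  above threshold", "maxmemory reached", and eviction-rate warnings.
  Does NOT handle Redis connectivity, replication lag, or latency alerts;
  those are covered by other agents.
```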
Step 3: Design the Runbook Instructions
The `prompt` section contains step-by-step instructions for what the agent should do.
Think of this as a detailed runbook that a human SRE would follow, but written for an LLM.
Structure your instructions clearly:
- Use numbered or bulleted steps
- Be specific about what information to gather
- Include error handling and edge cases
- Specify what actions to take based on findings
- Include formatting requirements for status updates
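A sketch of a prompt for the Redis scenario, following the guidelines above. The wording is illustrative, and the tools it refers to are defined in Steps 4 and 5:

```yaml
# Illustrative runbook prompt for the Redis memory agent
prompt: |
  When a Redis memory alert arrives:
  1. Fetch current memory statistics for the affected instance.
  2. List the largest keys and estimate how much memory each consumes.
  3. Compare current usage with recent memory metrics from DataDog to
     determine whether growth is gradual or sudden.
  4. Check recent Redis logs for memory-intensive operations or evictions.
  5. If a single key accounts for more than 20% of used memory, call it
     out explicitly.
  6. If any command fails, note the failure and continue with the
     remaining steps rather than stopping.
  7. Post a status update to the incident containing: current usage, the
     top 3 keys by size, the growth trend, and 2-3 recommended actions.
```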
Step 4: Define Required Tools
List all the tools your agent needs in the `tools` section. These include:
- Built-in tools from Unpage plugins (DataDog, PagerDuty, AWS, etc.)
- Custom shell commands you’ll create for specific operations
- Wildcards for groups of related tools
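A sketch of the tools list for this agent. The names and wildcard patterns are placeholders; the real names come from whichever plugins you have enabled and from the custom shell tools defined in Step 5:

```yaml
# Illustrative tools list -- replace these with the tool names actually
# exposed by your enabled plugins and custom shell commands
tools:
  - "datadog_*"                     # wildcard for the built-in DataDog tools
  - "pagerduty_post_status_update"  # hypothetical name for posting to the incident
  - "redis_memory_stats"            # custom shell tool from Step 5
  - "redis_biggest_keys"            # custom shell tool from Step 5
```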
Step 5: Create Custom Shell Tools
You can always extend Unpage with custom shell commands to interact with your specific infrastructure. These commands can:
- Execute Redis CLI commands against your instances
- Run custom scripts or database queries
- Call internal APIs or tools
- Parse and format data for the agent
Update your Unpage configuration (`~/.unpage/profiles/default/config.yaml`) to add the custom commands.
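The exact schema for shell tools depends on your Unpage version and the shell plugin, so treat the keys below as an illustrative sketch rather than the definitive format:

```yaml
# Illustrative shell tool definitions -- check the Unpage shell plugin
# documentation for the exact schema expected by your version
plugins:
  shell:
    enabled: true
    settings:
      commands:
        redis_memory_stats:
          description: "Show current Redis memory statistics"
          command: "redis-cli -h $REDIS_HOST -a $REDIS_PASSWORD INFO memory 2>/dev/null || echo 'Command failed'"
        redis_biggest_keys:
          description: "List the largest Redis keys by memory usage"
          command: "redis-cli -h $REDIS_HOST -a $REDIS_PASSWORD --bigkeys 2>/dev/null || echo 'Command failed'"
```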
Shell Command Best Practices
When creating shell commands:
- Include error handling with `2>/dev/null || echo "Command failed"`
- Use environment variables for API keys and credentials
- Chain commands with `&&` for sequential execution
- Parse output to provide clean, structured data
- Add timeouts for potentially long-running operations
- Document required permissions and dependencies
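For example, a command that follows these practices might look like this (the host and credential variable names are placeholders):

```bash
# Timeout guards against a hung connection; credentials come from the
# environment; grep keeps the output small and structured; the fallback
# message keeps output parseable when the command fails
timeout 10 redis-cli -h "$REDIS_HOST" -a "$REDIS_PASSWORD" INFO memory 2>/dev/null \
  | grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio' \
  || echo "Command failed"
```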
Step 6: Test and Deploy
Local Testing
Test your agent with sample data before deploying:
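The exact CLI invocation depends on your Unpage version, so treat the command below as a sketch and confirm the syntax with the CLI help. The idea is to run the agent directly against the sample payload saved in Step 1:

```bash
# Sketch: run the agent against the saved sample alert.
# The subcommand, flags, and agent name ("redis-memory-alert") are
# assumptions -- verify the real syntax with `unpage --help`.
cat sample-redis-alert.json | unpage agent run redis-memory-alert
```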
Test Routing
Verify the router selects your agent correctly:
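Again, the exact command is version-dependent; the sketch below assumes a routing subcommand that takes a payload and reports which agent would be selected:

```bash
# Sketch: ask the router which agent it would choose for this payload.
# The subcommand name is an assumption -- verify with `unpage --help`.
cat sample-redis-alert.json | unpage agent route
```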
Production Deployment
Set up webhook handling for production alerts:
- Local testing: http://localhost:8000/webhook
- Ngrok tunnel: https://your-tunnel.ngrok.io/webhook
- Production: https://your-domain.com/webhook
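To exercise the webhook path end to end, you can expose the local listener with ngrok and post the sample payload to it. The port and path below follow the local-testing URL above; starting the Unpage webhook listener itself is covered in the Unpage docs:

```bash
# Expose the local webhook endpoint (assumes the Unpage webhook listener
# is already running on port 8000, matching the local-testing URL above)
ngrok http 8000

# Simulate a DataDog alert by posting the sample payload locally
curl -X POST http://localhost:8000/webhook \
  -H "Content-Type: application/json" \
  -d @sample-redis-alert.json
```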
Advanced Agent Patterns
Multi-Step Analysis Agents
For complex scenarios, break analysis into phases:
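One way to express this is to structure the prompt itself into explicit phases, for example (wording illustrative):

```yaml
# Illustrative phased prompt structure
prompt: |
  Phase 1 - Triage: confirm the alert is still firing and identify the
  affected Redis instance.
  Phase 2 - Data gathering: collect memory statistics, the biggest keys,
  and the last hour of memory metrics.
  Phase 3 - Analysis: determine whether growth is gradual or sudden and
  which keys or operations are responsible.
  Phase 4 - Reporting: post a summary of findings and recommended next
  actions to the incident.
```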
Conditional Logic Agents
Use conditional prompts for different scenarios:
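Conditions can live directly in the prompt so the agent takes a different path depending on what it finds, for example (wording illustrative):

```yaml
# Illustrative conditional instructions
prompt: |
  If memory usage is above 95%, treat the situation as critical: gather
  data quickly and recommend immediate mitigation such as evicting large
  keys or scaling the instance.
  If memory usage is between 85% and 95%, perform the full analysis and
  recommend preventive actions.
  If the alert has already recovered by the time you run, note that in
  the incident and skip the remaining steps.
```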
Integration with External Systems
Agents can interact with any system your shell commands can reach:
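For example, a custom shell tool could call an internal capacity-planning API. The endpoint, token variable, and tool name below are hypothetical; the definition follows the same shell tool pattern as Step 5:

```yaml
# Hypothetical shell tool calling an internal API -- the endpoint and
# token variable are placeholders for your own internal systems
check_redis_capacity_plan:
  description: "Fetch the capacity plan for the Redis cluster from the internal API"
  command: "curl -sf -H \"Authorization: Bearer $INTERNAL_API_TOKEN\" https://internal-api.example.com/capacity/redis 2>/dev/null || echo 'Command failed'"
```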
Debugging and Iteration
Monitoring Agent Performance
Use Unpage’s built-in tracing to monitor agent execution. Open the local tracing UI (http://127.0.0.1:5566/#/experiments/1) to see:
- Tool usage patterns
- Execution timing
- Error rates and types
- Agent decision flows
Best Practices Summary
- Start simple - Begin with basic analysis, then add complexity
- Test thoroughly - Use various input scenarios and edge cases
- Handle errors gracefully - Include fallbacks for failed commands
- Be specific in descriptions - Help the router make correct decisions
- Document dependencies - Note required tools, permissions, and environment setup
- Iterate based on results - Refine prompts based on real incident responses
- Monitor and improve - Use tracing data to optimize agent performance