Agents

Unpage agents are the core building blocks for automating infrastructure operations with LLMs. Unlike traditional automation, which typically requires writing custom code, an Unpage agent is defined in a simple YAML configuration that specifies what it analyzes and how it responds.

What are Agents?

In Unpage, an agent is a specialized LLM that is:

Purpose-built for a specific type of infrastructure task or alert
Context-aware with access to your knowledge graph
Tool-enabled with specific permissions to interact with your infrastructure
Configuration-driven rather than requiring custom code development

Agents act as first responders to infrastructure events, analyzing situations and taking appropriate actions based on your predefined instructions.

Agent Configuration

Agents are defined using YAML configuration files with three key components:

1. Description

The description specifies when the agent should be used. The routing system relies on it to automatically select the most appropriate agent for each incoming alert, so it should state the matching criteria as concretely as possible.

description: >
  Use this agent to analyze alerts that meet the following criteria:
    - CPU usage exceeds 90% on production EC2 instances
    - The alert has been active for more than 15 minutes
    - The instance is part of the main application cluster

2. Prompt

The prompt contains detailed instructions for the agent, defining:

What information to gather
How to analyze the situation
What actions to take
Any constraints or guidelines to follow

prompt: >
  You are responsible for analyzing high CPU usage alerts on production EC2 instances.

  Follow these steps:
  1. Verify that CPU usage is indeed exceeding 90% using current metrics
  2. Check if the instance is experiencing unusual load by examining:
     - Recent log entries for errors or warnings
     - Unusual network traffic patterns
     - Scheduled jobs that might be running
  3. Determine if this is an isolated incident or affecting multiple instances
  4. If the issue appears to be temporary and non-critical:
     - Post a status update indicating it's likely transient
  5. If the issue appears serious:
     - Post a detailed analysis with recommended next steps
     - Highlight any potential service impacts

  Always include specific metrics and timestamps in your analysis.
  Never make assumptions without data to support them.

3. Tools

The tools section explicitly grants the agent permission to use specific infrastructure tools. This follows the principle of least privilege, ensuring agents only have access to the tools they need.

tools:
  - "core_current_datetime"
  - "metrics_get_metrics_for_node"
  - "metrics_list_available_metrics_for_node"
  - "graph_get_resource_details"
  - "graph_get_neighboring_resources"
  - "solarwinds_search_logs"
  - "pagerduty_post_status_update"

You can use wildcards and regular expressions to specify tools:

tools:
  # Allow all metrics tools
  - "metrics_*"
  # Allow all tools except those whose names contain _modify_, _delete_, or _create_
  - "/^(?!.*_modify_|.*_delete_|.*_create_).*$/"
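
To make the two pattern styles concrete, here is a minimal Python sketch of how such patterns could be evaluated against tool names. It is an illustration only, not Unpage's actual matching code: the `tool_allowed` helper is hypothetical, and it assumes that a pattern wrapped in `/.../` is a regular expression and anything else is a shell-style wildcard. The tool name `aws_delete_instance` is also made up for the example.

```python
import fnmatch
import re


def tool_allowed(tool_name: str, patterns: list[str]) -> bool:
    """Return True if tool_name matches at least one pattern."""
    for pattern in patterns:
        if len(pattern) > 2 and pattern.startswith("/") and pattern.endswith("/"):
            # Assumed convention: /.../ wraps a regular expression.
            if re.search(pattern[1:-1], tool_name):
                return True
        elif fnmatch.fnmatchcase(tool_name, pattern):
            # Anything else is treated as a shell-style wildcard.
            return True
    return False


READ_ONLY = "/^(?!.*_modify_|.*_delete_|.*_create_).*$/"

print(tool_allowed("metrics_get_metrics_for_node", ["metrics_*"]))
print(tool_allowed("solarwinds_search_logs", ["metrics_*"]))
print(tool_allowed("graph_get_resource_details", [READ_ONLY]))
print(tool_allowed("aws_delete_instance", [READ_ONLY]))  # hypothetical tool name
```

Note that the regular expression only rejects names in which the verb is surrounded by underscores. A tool whose name begins with the verb, such as a hypothetical `delete_volume`, would still pass, so it is worth checking the pattern against the tool names you actually have.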

Agent File Location

Agent configuration files are stored in your Unpage profile directory:

~/.unpage/profiles/<profile_name>/agents/

Each agent has its own YAML file, named after the agent (e.g., cpu-alert-agent.yaml). You can have multiple agents, each specialized for different tasks.
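
As a small illustration of this layout, the following Python sketch lists the agent files for one profile. The `list_agent_files` helper and the profile name `default` are assumptions made for the example; only the directory structure comes from the path shown above.

```python
from pathlib import Path
from typing import Optional


def list_agent_files(profile_name: str, home: Optional[Path] = None) -> list[Path]:
    """Return the agent YAML files for a profile, sorted by file name."""
    base = home if home is not None else Path.home()
    agents_dir = base / ".unpage" / "profiles" / profile_name / "agents"
    if not agents_dir.is_dir():
        # No such profile, or no agents have been created yet.
        return []
    return sorted(agents_dir.glob("*.yaml"))


if __name__ == "__main__":
    # "default" is an assumed profile name; substitute your own.
    for path in list_agent_files("default"):
        print(path.stem)  # the agent name is the file name without .yaml
```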

Example Agent Configuration

Here’s an example of a complete agent configuration:

# cpu-alert-agent.yaml

# Used by the router to determine which agent to use for an alert
description: >
  Use this agent to analyze alerts that meet the following criteria:
    - The alert is related to CPU usage exceeding thresholds
    - The alert comes from AWS CloudWatch or Datadog
    - The affected resource is a compute instance (EC2, container, etc.)

# Instructions for the agent
prompt: >
  You are an agent specialized in analyzing high CPU usage alerts.

  When investigating a CPU alert, follow these steps:

  1. Check the current CPU metrics to verify the alert is still active
  2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage
  3. Check logs from around the time the alert started for any errors or unusual activity
  4. Look for any recent deployments or changes that might explain the high usage
  5. Check if similar resources are experiencing the same issue

  Based on your findings, update the incident with:
  - Current status of the issue
  - Likely cause based on available evidence
  - Recommended next steps
  - Whether this appears to be a critical issue requiring immediate human attention

  Be concise but thorough. Include specific metrics, timestamps, and log entries
  that support your analysis.

  NEVER make up information or assume values you haven't verified.

# Tools the agent can use
tools:
  - "core_current_datetime"
  - "core_convert_to_timezone"
  - "metrics_get_metrics_for_node"
  - "metrics_list_available_metrics_for_node"
  - "graph_get_resource_details"
  - "graph_get_neighboring_resources"
  - "graph_get_resource_topology"
  - "solarwinds_search_logs"
  - "pagerduty_post_status_update"
  - "pagerduty_get_incident_details"
  - "aws_describe_ec2_instance"
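
Before handing a configuration like this to Unpage, it can be useful to check that it has the three components described above. The sketch below does this for a configuration that has already been loaded into a Python dict (for example with a YAML parser of your choice). The `validate_agent_config` helper is hypothetical, and the rules it enforces, such as requiring a non-empty `tools` list, are assumptions rather than Unpage's own validation.

```python
def validate_agent_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is fine."""
    problems = []
    for key in ("description", "prompt"):
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' must be a non-empty string")
    tools = config.get("tools")
    if not isinstance(tools, list) or not tools:
        problems.append("'tools' must be a non-empty list")
    elif not all(isinstance(tool, str) and tool for tool in tools):
        problems.append("every entry in 'tools' must be a non-empty string")
    return problems


config = {
    "description": "Use this agent to analyze CPU usage alerts.",
    "prompt": "You are an agent specialized in analyzing high CPU usage alerts.",
    "tools": ["core_current_datetime", "metrics_get_metrics_for_node"],
}
print(validate_agent_config(config))
print(validate_agent_config({"description": "Only a description"}))
```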

Creating and Managing Agents

Unpage provides several commands to work with agents:

Creating a New Agent

unpage agent create my-new-agent

This creates a new agent configuration file from a template and opens it in your default editor.

Editing an Existing Agent

unpage agent edit cpu-alert-agent

Listing Available Agents

unpage agent list

Running an Agent Manually

# Run with a JSON payload from a file
unpage agent run cpu-alert-agent @path/to/alert.json

# Run with a direct JSON payload
unpage agent run cpu-alert-agent '{"alert": "CPU usage exceeds 90%"}'
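
When testing an agent against several sample alerts, it may be convenient to generate the payload files from a script. The Python sketch below writes a payload to disk and builds the argument list for the file-based form of the command shown above. The `build_run_command` helper and the `alert.json` file name are illustrative, and the payload shape is just the one from the example, not a required schema. The sketch prints the command rather than executing it.

```python
import json
import tempfile
from pathlib import Path


def build_run_command(agent_name: str, payload: dict, directory: Path) -> list[str]:
    """Write payload to a JSON file and return the matching CLI arguments."""
    payload_path = directory / "alert.json"
    payload_path.write_text(json.dumps(payload, indent=2))
    # The @ prefix is the file-payload form of `unpage agent run`.
    return ["unpage", "agent", "run", agent_name, f"@{payload_path}"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        command = build_run_command(
            "cpu-alert-agent",
            {"alert": "CPU usage exceeds 90%"},
            Path(tmp),
        )
        print(" ".join(command))
        # To execute it for real, pass `command` to subprocess.run(command, check=True).
```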

Common Agent Use Cases

Agents can be configured for various infrastructure tasks:

Incident Response

Automatically analyze and respond to alerts from monitoring systems:

Triage alerts based on severity and impact
Gather relevant logs and metrics
Post status updates with analysis
Suggest remediation steps

Troubleshooting

Assist with diagnosing complex infrastructure issues:

Analyze performance bottlenecks
Correlate events across multiple systems
Identify potential root causes
Suggest debugging approaches

Automation

Handle routine operational tasks:

Respond to common alerts with well-defined playbooks
Gather context for human responders
Document incident timelines
Check system health after changes

Best Practices for Agent Design

When designing agents, follow these best practices:

Specialize your agents - Create purpose-specific agents rather than one general-purpose agent
Write clear descriptions - Give the router enough detail to select the right agent
Structure your prompts - Organize prompts with clear steps and expectations
Apply least privilege - Only grant access to tools the agent actually needs
Include guardrails - Add explicit constraints in prompts about what not to do
Test thoroughly - Run your agents against sample payloads before deploying
Refine iteratively - Review agent responses and refine prompts based on performance

Conclusion

Unpage agents provide a configuration-driven approach to infrastructure automation with LLMs. By defining specialized agents with clear instructions and appropriate tool access, you can create a responsive system that handles routine operations and provides valuable insight for complex incidents. To learn more about how alerts are routed to the appropriate agent, see the Router documentation.
