Anomaly Alerts vs. Threshold Alerts
While threshold-based alerts trigger when a value exceeds a fixed number (e.g., CPU > 90%), anomaly-based alerts trigger when a value deviates from its expected pattern by more than a set margin, measured in standard deviations (σ) from a baseline. This lets you catch issues that never cross a static threshold.
| Feature | Threshold Alert | Anomaly Alert |
|---|---|---|
| Trigger | Fixed value (e.g., CPU > 90%) | Pattern deviation (e.g., 2σ above baseline) |
| Adapts to patterns | No | Yes (time-of-day, weekday/weekend) |
| Best for | Known limits, compliance | Catching unusual behavior |
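The difference between the two trigger types can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's detection logic: the function names, the sample readings, and the choice to build the baseline from a simple mean and standard deviation of past readings are all assumptions made for the example.

```python
from statistics import mean, stdev


def threshold_alert(value, limit=90.0):
    """Fire when the value exceeds a fixed limit (e.g., CPU > 90%)."""
    return value > limit


def anomaly_alert(value, history, sigmas=2.0):
    """Fire when the value lies at least `sigmas` standard deviations
    from the mean of past readings (assumed baseline model)."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change counts as a deviation.
        return value != baseline
    return abs(value - baseline) / spread >= sigmas


# Hypothetical CPU readings (%) for the same hour on previous days.
history = [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 22.0]
current = 45.0

print(threshold_alert(current))         # False: 45% is below the 90% limit
print(anomaly_alert(current, history))  # True: far outside the usual ~21%
```

A reading of 45% never reaches the fixed limit, yet it is far from what this server normally does at that hour, which is the kind of case an anomaly rule is meant to surface.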
Creating an Anomaly Alert Rule
Navigate to Alerts → Alert Rules → Create and select "Anomaly-Based" as the rule type.
New Anomaly Alert Rule
The rule form includes the following options:
- Servers: All Servers, bbb-server-01, bbb-server-02, bbb-server-03
- Notification channels: Email, Slack, SMS, Webhook
- Cooldown: Minimum time between repeated notifications for the same anomaly type
Rule Preview
When CPU Usage deviates by 1.5σ or more from its baseline on any server, send notifications via Email and Slack. Wait at least 15 minutes before re-alerting for the same condition.
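The re-alerting wait described in the preview can be pictured with a small sketch. The class below is hypothetical and is not the product's implementation; in particular, tracking the cooldown per server and metric pair is an assumption, since the page only says the wait applies to the same anomaly type.

```python
from datetime import datetime, timedelta


class CooldownNotifier:
    """Suppress repeat notifications inside a cooldown window (hypothetical)."""

    def __init__(self, cooldown_minutes=15):
        self.cooldown = timedelta(minutes=cooldown_minutes)
        self.last_sent = {}  # (server, metric) -> time of last notification

    def should_notify(self, server, metric, now):
        key = (server, metric)  # assumption: cooldown is tracked per pair
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown window
        self.last_sent[key] = now
        return True


notifier = CooldownNotifier(cooldown_minutes=15)
t0 = datetime(2024, 1, 1, 12, 0)

# First anomaly notifies; a repeat 5 minutes later is suppressed;
# a repeat 15 minutes after the first notification goes out again.
print(notifier.should_notify("bbb-server-01", "cpu_usage", t0))
print(notifier.should_notify("bbb-server-01", "cpu_usage", t0 + timedelta(minutes=5)))
print(notifier.should_notify("bbb-server-01", "cpu_usage", t0 + timedelta(minutes=15)))
```

Note that a suppressed anomaly does not restart the window in this sketch; the cooldown is measured from the last notification actually sent.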
Combining with Playbooks
For automated remediation, you can link anomaly alerts to operational playbooks:
- Create the anomaly alert rule as shown above
- In the "Actions" section, select "Run Playbook"
- Choose a playbook (e.g., "Auto-scale on CPU Anomaly")
- The playbook will execute automatically when the anomaly is detected
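Conceptually, the steps above attach an action to the rule that is looked up and run when the anomaly fires. The sketch below illustrates that flow under stated assumptions: the `PLAYBOOKS` registry, the rule dictionary layout, and the `run_playbook` action type are invented for the example and do not reflect the product's actual data model or API.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping playbook names to callables.
PLAYBOOKS: Dict[str, Callable[[str], str]] = {}


def playbook(name: str):
    """Register a function under a playbook name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        PLAYBOOKS[name] = func
        return func
    return register


@playbook("Auto-scale on CPU Anomaly")
def auto_scale(server: str) -> str:
    # Placeholder: a real playbook would call the scaling tooling here.
    return f"scale-out requested for {server}"


def on_anomaly(rule: dict, server: str) -> List[str]:
    """Run every playbook action attached to the rule for this server."""
    results = []
    for action in rule.get("actions", []):
        if action["type"] == "run_playbook":
            results.append(PLAYBOOKS[action["playbook"]](server))
    return results


# Hypothetical rule with one linked playbook action.
rule = {
    "metric": "cpu_usage",
    "actions": [
        {"type": "run_playbook", "playbook": "Auto-scale on CPU Anomaly"},
    ],
}

print(on_anomaly(rule, "bbb-server-01"))
```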
Best Practices
- Start conservative: Begin with the "High" severity threshold and lower it as you gain confidence
- Use cooldowns: Prevent notification fatigue with appropriate cooldown periods
- Combine channels: Use email for records and Slack/SMS for immediate attention
- Test first: Use the "Test Rule" feature before enabling in production
- Review weekly: Check which rules triggered and fine-tune as needed