Overview
Active alerts require attention. This guide covers the workflow for triaging, acknowledging, and resolving alerts to maintain cluster health.
Alert Workflow
New Alert → Acknowledge → Investigate → Resolve
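As a rough illustration, the workflow above can be modeled as a small state machine. This is a minimal sketch that assumes the stages are strictly linear; the names `AlertState`, `NEXT_STATE`, and `advance` are hypothetical and not part of any product API.

```python
from enum import Enum


class AlertState(Enum):
    NEW = "New"
    ACKNOWLEDGED = "Acknowledged"
    INVESTIGATING = "Investigating"
    RESOLVED = "Resolved"


# Assumed forward-only transitions, mirroring
# New Alert → Acknowledge → Investigate → Resolve.
NEXT_STATE = {
    AlertState.NEW: AlertState.ACKNOWLEDGED,
    AlertState.ACKNOWLEDGED: AlertState.INVESTIGATING,
    AlertState.INVESTIGATING: AlertState.RESOLVED,
}


def advance(state):
    """Return the next workflow state, or raise if the alert is already resolved."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.value} is a terminal state")
    return NEXT_STATE[state]
```

Calling `advance(AlertState.NEW)` returns `AlertState.ACKNOWLEDGED`. A real system may allow shortcuts, such as resolving directly from Acknowledged, which this sketch does not cover.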
Active Alerts List
- High CPU Usage (New): CPU usage exceeded 95% on bbb-prod-01
- Low Disk Space (Acknowledged): Disk space at 82% on bbb-prod-02. Acknowledged by John S.
Alert Actions
Acknowledge
Acknowledging an alert indicates someone is working on it:
- Prevents duplicate work by other team members
- Stops escalation timers (if configured)
- Records who acknowledged and when
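The effects listed above could be sketched roughly as follows. The `Alert` class, its fields, and the `acknowledge` function are hypothetical names chosen for illustration. The `escalation_active` flag is an assumed stand-in for a configured escalation timer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Alert:
    title: str
    status: str = "New"
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None
    escalation_active: bool = True  # assumed flag for a running escalation timer


def acknowledge(alert, user, now=None):
    """Mark a new alert as acknowledged, recording who did it and when."""
    if alert.status != "New":
        # Someone already owns this alert; refusing avoids duplicate work.
        raise ValueError(f"alert is already {alert.status}")
    alert.status = "Acknowledged"
    alert.acknowledged_by = user
    alert.acknowledged_at = now or datetime.now(timezone.utc)
    alert.escalation_active = False  # stop the (assumed) escalation timer
    return alert
```

A second call to `acknowledge` on the same alert raises `ValueError`, which is one simple way to surface that a teammate is already working on it.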
Resolve
Resolving marks the alert as addressed:
- Add resolution notes (recommended)
- Alert moves to history
- Metrics update to reflect resolution
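A minimal sketch of resolution, assuming active alerts live in a dict keyed by ID and history is a plain list. The `resolve` function and the `resolution_notes` field are hypothetical, not the product's actual data model.

```python
def resolve(alert_id, active, history, notes=""):
    """Move an alert from the active set to history, attaching resolution notes."""
    alert = active.pop(alert_id)  # raises KeyError if the alert is not active
    resolved = {**alert, "status": "Resolved", "resolution_notes": notes.strip()}
    history.append(resolved)
    return resolved
```

Because the alert is removed from `active` in the same step, any metric computed from the active set (for example `len(active)`) reflects the resolution immediately.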
Ignore/Snooze
For non-critical alerts that can wait:
- Ignore: Dismiss without action (use sparingly)
- Snooze: Temporarily hide for a specified duration
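Snoozing can be thought of as storing a wake-up time and hiding the alert until that time passes. The sketch below assumes durations are given in minutes; `snooze_until` and `is_visible` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def snooze_until(now, minutes):
    """Return the time at which a snoozed alert should reappear."""
    if minutes <= 0:
        raise ValueError("snooze duration must be positive")
    return now + timedelta(minutes=minutes)


def is_visible(snoozed_until, now):
    """An alert is visible if it was never snoozed or its snooze has expired."""
    return snoozed_until is None or now >= snoozed_until
```

Unlike Ignore, this keeps the alert in the active set, so it resurfaces on its own once the duration elapses.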
Best Practices
- Acknowledge alerts promptly to prevent duplicate investigation
- Add detailed resolution notes for future reference
- Review alert history to identify recurring issues
- Adjust thresholds if alerts are too frequent or too rare
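One way to review history for recurring issues is to count how often the same alert fires on the same host. This is a sketch under assumptions: the history is taken to be a list of dicts with `title` and `host` keys, and `recurring_alerts` and `min_count` are hypothetical names.

```python
from collections import Counter


def recurring_alerts(history, min_count=3):
    """Return (title, host) pairs seen at least min_count times, most frequent first."""
    counts = Counter((a["title"], a["host"]) for a in history)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

Pairs that show up repeatedly are candidates for a root-cause fix or a threshold adjustment.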
Related: Alert History & Reporting