Overview
Auto-remediation lets your Scalelite cluster detect and resolve issues without manual intervention. When a configured policy detects a problem such as high disk usage or a stale meeting, the system executes the corresponding corrective action.
- 24/7 automated monitoring and response
- Reduced downtime and manual intervention
- Consistent, predictable issue resolution
- Full audit trail of all automated actions
Dashboard Overview
The Auto-Remediation dashboard provides real-time visibility into your policies and their execution history.
🔧 Automated Remediation
| Policy | Type | Trigger | Last Run | Success | Status | Actions |
|---|---|---|---|---|---|---|
| Disk Cleanup High | Disk Cleanup | disk_usage > 85% | Dec 4, 09:15 | 23 / 1 | Active | |
| Ghost Meeting Cleanup | Meeting Cleanup | meeting_age > 4h | Dec 4, 08:30 | 8 / 0 | Active | |
Understanding Policy Types
1. Disk Cleanup
Automatically removes old recordings and temporary files when disk usage exceeds your threshold.
- Trigger Metric: disk_usage_percent
- Common Threshold: 80-90%
- Actions: Delete oldest unpublished recordings, clear tmp files
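As a rough illustration of the trigger described above, the following sketch measures disk usage with Python's standard library and compares it to a threshold. The function names and the 85% default are assumptions chosen for this example; they are not part of the product. The percentage is computed as used / total, which can differ slightly from what `df` reports on filesystems with reserved blocks.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def should_clean(percent_used: float, threshold: float = 85.0) -> bool:
    """True when usage is strictly above the threshold (assumed default: 85%)."""
    return percent_used > threshold


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    print(f"{pct:.1f}% used, cleanup needed: {should_clean(pct)}")
```

A strict "greater than" comparison is used so that a reading exactly at the threshold does not trigger a cleanup, matching the `disk_usage > 85%` form shown in the dashboard.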
2. Meeting Cleanup
Terminates ghost meetings that have been running longer than expected without active participants.
- Trigger Metric: meeting_duration_hours
- Common Threshold: 4-8 hours
- Actions: End meeting via BBB API
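For readers curious what "End meeting via BBB API" could look like on the wire, here is a minimal sketch that builds a signed URL for the BigBlueButton `end` call. It uses the API's standard checksum scheme (SHA-1 of the call name, the query string, and the shared secret). The base URL, secret, and meeting ID are placeholders, the helper name is hypothetical, and how the product actually issues the call is not documented here. Older BigBlueButton versions also expect the moderator password as a parameter, and newer ones may be configured for a different hash algorithm.

```python
import hashlib
from urllib.parse import urlencode


def build_end_url(base_url: str, secret: str, meeting_id: str) -> str:
    """Build a signed URL for the BigBlueButton `end` API call.

    Hypothetical helper: only constructs the URL, it does not send a request.
    """
    query = urlencode({"meetingID": meeting_id})
    # Checksum = SHA-1 over "<call name><query string><shared secret>".
    checksum = hashlib.sha1(f"end{query}{secret}".encode("utf-8")).hexdigest()
    return f"{base_url.rstrip('/')}/end?{query}&checksum={checksum}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_end_url("https://example.invalid/bigbluebutton/api", "secret", "meeting-1"))
```

Keeping URL construction separate from the HTTP request makes the signing logic easy to check without touching a live server.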
3. Service Restart
Restarts BigBlueButton services when health metrics indicate degraded performance.
- Trigger Metric: health_score
- Common Threshold: Health score < 50
- Actions: Graceful service restart
4. Cache Clear
Clears application and system caches when memory pressure is detected.
- Trigger Metric: memory_usage_percent
- Common Threshold: 85-95%
- Actions: Clear Redis cache, system page cache
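The four policy types share the same shape: a metric, a comparison, and a threshold. The sketch below models that shape to show how a set of metric readings might be evaluated. The `PolicyRule` class, the `RULES` list, and the specific threshold values are assumptions picked from the common ranges listed above, not the product's internal data model. Note that Service Restart triggers when the value falls below its threshold, while the others trigger when it rises above.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical model of one policy: metric, comparison, threshold."""
    policy_type: str
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float

    def is_triggered(self, metrics: Dict[str, float]) -> bool:
        value = metrics.get(self.metric)
        # A missing reading never triggers a policy.
        return value is not None and self.compare(value, self.threshold)


# Example thresholds taken from the "Common Threshold" ranges (assumed values).
RULES: List[PolicyRule] = [
    PolicyRule("Disk Cleanup", "disk_usage_percent", operator.gt, 85),
    PolicyRule("Meeting Cleanup", "meeting_duration_hours", operator.gt, 4),
    PolicyRule("Service Restart", "health_score", operator.lt, 50),
    PolicyRule("Cache Clear", "memory_usage_percent", operator.gt, 90),
]


def triggered_policies(metrics: Dict[str, float]) -> List[str]:
    """Return the policy types whose trigger condition holds for `metrics`."""
    return [rule.policy_type for rule in RULES if rule.is_triggered(metrics)]


if __name__ == "__main__":
    sample = {"disk_usage_percent": 91, "meeting_duration_hours": 2,
              "health_score": 42, "memory_usage_percent": 70}
    print(triggered_policies(sample))
```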
Creating a Policy
- Navigate to Automation → Remediation from the sidebar
- Click the New Policy button
- Configure:
  - Name: Descriptive name (e.g., "Disk Cleanup 85%")
  - Type: Select the policy type
  - Trigger Metric: Choose the metric to monitor
  - Threshold: Set the trigger value
  - Cooldown: Minimum time between executions (default: 30 min)
- Click Save, then Enable
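To make the fields above concrete, here is a small sketch of a policy record and the cooldown check it implies: a policy may run only when it is enabled and at least the cooldown period has passed since its last run. The `Policy` class and its `can_run` method are hypothetical names for illustration; only the field meanings and the 30-minute default come from the steps above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Policy:
    """Hypothetical record mirroring the New Policy form fields."""
    name: str
    policy_type: str
    trigger_metric: str
    threshold: float
    cooldown: timedelta = timedelta(minutes=30)  # documented default
    enabled: bool = False                        # saved policies start disabled (assumption)
    last_run: Optional[datetime] = None

    def can_run(self, now: datetime) -> bool:
        """True if the policy is enabled and the cooldown has elapsed."""
        if not self.enabled:
            return False
        if self.last_run is None:
            return True
        return now - self.last_run >= self.cooldown


if __name__ == "__main__":
    policy = Policy("Disk Cleanup 85%", "Disk Cleanup", "disk_usage_percent", 85.0)
    policy.enabled = True
    policy.last_run = datetime.now() - timedelta(minutes=10)
    print(policy.can_run(datetime.now()))  # still inside the 30-minute cooldown
```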
Best Practices
- Start Conservative: Begin with higher thresholds (90%) and lower them as you gain confidence
- Set Cooldowns: Use 30-60 minute cooldowns to prevent rapid re-execution
- Monitor Success Rate: Investigate policies with success rates below 80%
- Test First: Use the manual Execute button to test a policy before enabling it
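The success-rate guideline can be checked with simple arithmetic. The sketch below assumes the dashboard's Success column (for example "23 / 1") shows succeeded and failed run counts, which is an interpretation and not something the dashboard states explicitly. The function names are hypothetical.

```python
from typing import Optional


def success_rate(succeeded: int, failed: int) -> Optional[float]:
    """Return the success rate as a percentage, or None if there were no runs."""
    total = succeeded + failed
    if total == 0:
        return None
    return succeeded / total * 100


def needs_investigation(succeeded: int, failed: int, minimum: float = 80.0) -> bool:
    """True when the policy has run and its success rate is below `minimum`."""
    rate = success_rate(succeeded, failed)
    return rate is not None and rate < minimum


if __name__ == "__main__":
    # Assumed reading of "23 / 1": 23 successful runs, 1 failed run.
    print(f"{success_rate(23, 1):.1f}%", needs_investigation(23, 1))
```

Under that reading, 23 successes and 1 failure give a rate of about 95.8%, comfortably above the 80% guideline.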