Overview
The System Health page provides a comprehensive view of your Scalelite cluster's operational status. It monitors services, tracks errors, and calculates an overall health score to help you quickly assess system stability.
Health Score
The health score is a percentage (0-100%) that reflects your cluster's overall operational health.
Hybrid Health Score Calculation
The health score uses a hybrid formula combining infrastructure health (70%) and monitoring system health (30%):
Infrastructure Score (70% of total)
- Server Availability (40%): Percentage of BBB servers online and responsive
- Error Rate (25%): Scored from the number of errors in the last hour (0 errors = 100%, fewer than 5 = 80%, fewer than 10 = 50%)
- Response Time (20%): Average API response time from active health probes (<500ms = 100%, <1000ms = 80%, <3000ms = 50%)
- Resource Utilization (15%): Peak CPU, memory, and disk usage levels (<70% = 100%, <85% = 70%, <95% = 40%)
Monitoring System Score (30% of total)
- Data Ingestion (40%): How recently data was received from webhook reports
- Alert System (30%): Alert processing and resolution rate
- Notification System (30%): Notification delivery success rate
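The weighting above can be sketched in a few lines of Python. This is an illustrative sketch, not the product's implementation. The function names are hypothetical. The three monitoring sub-scores are taken as ready-made 0-100 inputs, because this page does not say how they are derived. Any value past the last listed tier (for example, 10 or more errors) is assumed to score 0.

```python
# Tiers are (upper_bound, score) pairs, checked in order: value < upper_bound.
ERROR_TIERS = [(1, 100.0), (5, 80.0), (10, 50.0)]              # errors in the last hour
RESPONSE_TIERS_MS = [(500, 100.0), (1000, 80.0), (3000, 50.0)]  # average response time
RESOURCE_TIERS_PCT = [(70, 100.0), (85, 70.0), (95, 40.0)]      # peak CPU/memory/disk


def tiered_score(value, tiers, floor=0.0):
    """Return the score of the first tier whose upper bound exceeds value."""
    for upper_bound, score in tiers:
        if value < upper_bound:
            return score
    # Assumption: the documentation gives no score past the last tier.
    return floor


def infrastructure_score(availability_pct, errors_last_hour,
                         avg_response_ms, peak_resource_pct):
    """Weighted infrastructure score (0-100)."""
    return (0.40 * availability_pct
            + 0.25 * tiered_score(errors_last_hour, ERROR_TIERS)
            + 0.20 * tiered_score(avg_response_ms, RESPONSE_TIERS_MS)
            + 0.15 * tiered_score(peak_resource_pct, RESOURCE_TIERS_PCT))


def monitoring_score(ingestion_pct, alert_pct, notification_pct):
    """Weighted monitoring system score (0-100); inputs are assumed 0-100."""
    return 0.40 * ingestion_pct + 0.30 * alert_pct + 0.30 * notification_pct


def health_score(infrastructure, monitoring):
    """Hybrid score: 70% infrastructure, 30% monitoring."""
    return 0.70 * infrastructure + 0.30 * monitoring
```

For example, 90% availability, 3 errors, an 800 ms average response time, and 80% peak resource usage give an infrastructure score of 82.5. With monitoring sub-scores of 100, 90, and 80 (monitoring score 91), the overall result is 0.70 × 82.5 + 0.30 × 91 = 85.05.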
Health Score Thresholds
90-100%: Healthy - All systems operating normally
70-89%: Warning - Some issues detected, monitoring recommended
Below 70%: Critical - Immediate attention required
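As a minimal sketch, the thresholds above map a score to a status label as follows. The function name is hypothetical, and treating fractional scores between 89 and 90 as Warning is an assumption, since the listed ranges only cover whole percentages.

```python
def health_status(score):
    """Map a 0-100 health score to the status label from the thresholds above."""
    if score >= 90:
        return "Healthy"
    if score >= 70:
        # Assumption: anything from 70 up to (but not including) 90 is Warning.
        return "Warning"
    return "Critical"
```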
Active Health Probing
Scalelite Manager Pro performs active health probes every minute to monitor your infrastructure:
- Scalelite API Probing: Measures response time and availability of your Scalelite endpoint
- BBB Server Probing: Checks response times, API status, and port accessibility (443, 80) for each server
- Response Time Tracking: Historical response time data for trending and alerting
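To make the probe types above concrete, the following sketch shows one way a port accessibility check and a rolling response time history could look using only the Python standard library. The names `port_open` and `ResponseTimeHistory`, the 3-second timeout, and the 1440-sample window (24 hours of one-minute probes) are assumptions for illustration; the page does not describe how Scalelite Manager Pro implements its probes.

```python
import socket
from collections import deque
from statistics import mean


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False


class ResponseTimeHistory:
    """Keeps the most recent response times (in ms) for trending."""

    def __init__(self, max_samples=1440):  # assumption: one sample per minute
        self.samples = deque(maxlen=max_samples)  # oldest samples drop off

    def record(self, response_ms):
        self.samples.append(response_ms)

    def average(self):
        """Average of the retained samples, or None if there are none."""
        return mean(self.samples) if self.samples else None
```

A scheduler would call `port_open(server, 443)` and `port_open(server, 80)` for each server once a minute and feed each measured response time into `record`.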
Container Resource Usage
For Docker-based Scalelite deployments, the system monitors container metrics:
| Container | CPU % | Memory % | Restarts | Uptime | Errors |
|---|---|---|---|---|---|
| scalelite-api | 12% | 45% | 0 | 2d 5h | 0 |
| scalelite-poller | 5% | 28% | 0 | 2d 5h | 0 |
| redis | 2% | 15% | 0 | 2d 5h | 0 |
Resource usage is color-coded: Green (<70%), Orange (70-85%), Red (>85%)
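The color bands can be expressed as a small lookup, shown here as an illustrative sketch with a hypothetical function name. Following the ranges above, exactly 70% and exactly 85% both fall in the Orange band.

```python
def usage_color(percent):
    """Return the color band for a CPU or memory usage percentage."""
    if percent < 70:
        return "Green"
    if percent <= 85:
        return "Orange"
    return "Red"
```

With the sample table above, every container (12%, 45%, 5%, 28%, 2%, 15%) falls in the Green band.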
BBB Server Health
Monitor the health of individual BigBlueButton servers in your pool:
- Response Time: API response latency from active probes (updated every minute)
- API Status: OK, Error, Timeout, or Unreachable
- Port Accessibility: Checks if ports 443 (HTTPS) and 80 (HTTP) are accessible
- Last Probe: When the server was last checked
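The sketch below shows one way a single probe could produce a response time and one of the four API status values listed above, using Python's standard `urllib`. It rests on assumptions: the function names are hypothetical, "OK" is taken to mean an HTTP 200 response, "Error" any other HTTP status, and the URL to probe is supplied by the caller. The product's actual rules for each status are not described on this page.

```python
import socket
import time
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map a failed request's exception to Error, Timeout, or Unreachable."""
    if isinstance(exc, urllib.error.HTTPError):
        return "Error"  # the server answered, but with an HTTP error status
    # URLError wraps the underlying cause in .reason; other errors are the cause.
    reason = getattr(exc, "reason", exc)
    timeouts = (socket.timeout, TimeoutError)
    if isinstance(exc, timeouts) or isinstance(reason, timeouts):
        return "Timeout"
    return "Unreachable"  # refused connection, DNS failure, no route, etc.


def probe_url(url, timeout=3.0):
    """Return (status, response_ms) for one probe of url."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Assumption: only HTTP 200 counts as OK.
            status = "OK" if response.status == 200 else "Error"
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        status = classify_failure(exc)
    response_ms = (time.monotonic() - start) * 1000.0
    return status, response_ms
```

Recording the time of each call alongside the result would supply the Last Probe value.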
Service Status
Monitor individual services that make up your Scalelite infrastructure:
Service Status Indicators
| Indicator | Meaning |
|---|---|
| ● Online | Service is running normally |
| ● Degraded | Service running with reduced performance or queued work |
| ● Offline | Service is not responding or has failed |
Error Tracking
The System Health page also tracks recent errors:
- Errors Last Hour: Count of errors in the past 60 minutes
- Error Trend: Whether errors are increasing or decreasing
- Most Common Error: The most frequently occurring error type
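As a rough illustration, the three error metrics could be derived from a list of timestamped error events as sketched below. The function name and the event format are hypothetical, and the trend rule (comparing the last hour with the hour before it) is an assumption, since the page does not say how the trend is determined.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_errors(events, now):
    """Summarize error events given as (timestamp, error_type) pairs."""
    hour = timedelta(hours=1)
    recent = [kind for ts, kind in events if now - hour < ts <= now]
    previous = [kind for ts, kind in events
                if now - 2 * hour < ts <= now - hour]

    # Assumption: the trend compares the last hour with the hour before it.
    if len(recent) > len(previous):
        trend = "increasing"
    elif len(recent) < len(previous):
        trend = "decreasing"
    else:
        trend = "stable"

    top = Counter(recent).most_common(1)
    return {
        "errors_last_hour": len(recent),
        "error_trend": trend,
        "most_common_error": top[0][0] if top else None,
    }
```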
Best Practices
- Check the System Health page daily or set up alerts for score drops
- Investigate any service showing "Degraded" status
- Monitor the error count trend; a sudden spike often indicates a problem
- Use the 24-hour uptime metric to track historical reliability
Related: Setting Up Custom Alert Rules