Overview
The System Health page provides a comprehensive view of your Scalelite cluster's operational status. It monitors services, tracks errors, and calculates an overall health score to help you quickly assess system stability.
Health Score
The health score is a percentage (0-100%) that reflects your cluster's overall operational health.
Health Score Calculation
The health score is a weighted combination of four factors:
- Server Availability (40%): Percentage of servers online and responsive
- Error Rate (25%): Number of errors in the last hour
- Response Time (20%): Average API response latency
- Resource Utilization (15%): CPU, memory, and disk usage levels
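As a rough illustration of how these weights could combine, the sketch below computes a weighted sum. It assumes each factor has already been normalized to a 0-100 sub-score where higher is better; that normalization, the function name `health_score`, and the dictionary keys are hypothetical and are not taken from Scalelite itself.

```python
# Weights from the factor list above; they sum to 1.0.
WEIGHTS = {
    "server_availability": 0.40,
    "error_rate": 0.25,
    "response_time": 0.20,
    "resource_utilization": 0.15,
}


def health_score(sub_scores: dict) -> float:
    """Combine per-factor sub-scores (assumed 0-100, higher is better) into one percentage."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")

    total = 0.0
    for name, weight in WEIGHTS.items():
        # Clamp each sub-score so one bad input cannot push the result outside 0-100.
        value = min(100.0, max(0.0, float(sub_scores[name])))
        total += weight * value
    return round(total, 1)


example = {
    "server_availability": 100,
    "error_rate": 80,
    "response_time": 90,
    "resource_utilization": 70,
}
print(health_score(example))
```

With these example inputs the contributions are 40, 20, 18, and 10.5 points, so availability dominates the result, as its 40% weight implies.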
Health Score Thresholds
90-100%: Healthy - All systems operating normally
70-89%: Warning - Some issues detected, monitoring recommended
Below 70%: Critical - Immediate attention required
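The thresholds above map directly onto a small classification function. This is a minimal sketch: the name `classify_health` is hypothetical, and it assumes that fractional scores between 89 and 90 fall into the Warning band, which the thresholds do not state explicitly.

```python
def classify_health(score: float) -> str:
    """Map a health score (0-100) to the status bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Healthy"
    if score >= 70:
        # Assumption: anything below 90 but at least 70 counts as Warning.
        return "Warning"
    return "Critical"


for value in (95, 89.5, 70, 69.9):
    print(value, classify_health(value))
```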
Service Status
The Service Status section shows the current state of each individual service that makes up your Scalelite infrastructure.
Service Status Indicators
| ● Online | Service is running normally |
| ● Degraded | Service running with reduced performance or queued work |
| ● Offline | Service is not responding or has failed |
Error Tracking
The System Health page also tracks recent errors:
- Errors Last Hour: Count of errors in the past 60 minutes
- Error Trend: Whether errors are increasing or decreasing
- Most Common Error: The most frequently occurring error type
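The three metrics above could be derived from a list of timestamped error events roughly as follows. This is a sketch under stated assumptions: the page does not say how the trend is computed, so the code compares the last hour with the hour before it, and the function name `summarize_errors` and the `(timestamp, error_type)` event shape are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def summarize_errors(events, now):
    """Summarize (timestamp, error_type) events into the three error metrics."""
    hour = timedelta(hours=1)

    # Errors in the past 60 minutes, and in the 60 minutes before that.
    last_hour = [etype for ts, etype in events if now - hour < ts <= now]
    previous_hour = [etype for ts, etype in events if now - 2 * hour < ts <= now - hour]

    # Assumption: the trend is the last hour's count relative to the previous hour's.
    if len(last_hour) > len(previous_hour):
        trend = "increasing"
    elif len(last_hour) < len(previous_hour):
        trend = "decreasing"
    else:
        trend = "stable"

    most_common = Counter(last_hour).most_common(1)
    return {
        "errors_last_hour": len(last_hour),
        "error_trend": trend,
        "most_common_error": most_common[0][0] if most_common else None,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    (now - timedelta(minutes=5), "timeout"),
    (now - timedelta(minutes=10), "timeout"),
    (now - timedelta(minutes=30), "auth_failed"),
    (now - timedelta(minutes=90), "timeout"),
]
print(summarize_errors(events, now))
```

Here three of the four sample events fall inside the last hour and one falls in the hour before, so the sketch reports an increasing trend with "timeout" as the most common error type.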
Best Practices
- Check the System Health page daily or set up alerts for score drops
- Investigate any service showing "Degraded" status
- Monitor the error count trend; a sudden spike often indicates a problem
- Use the 24-hour uptime metric to track historical reliability
Related: Setting Up Custom Alert Rules