System Health Overview

Monitor your cluster's overall health status

Overview

The System Health page provides a comprehensive view of your Scalelite cluster's operational status. It monitors services, tracks errors, and calculates an overall health score to help you quickly assess system stability.

Health Score

The health score is a percentage (0-100%) that reflects your cluster's overall operational health.

Example health card for a cluster named "Production Cluster" (last updated 2 minutes ago):

  • Health score: 90% (All Systems Operational)
  • Servers online: 5/5
  • Errors (1h): 0
  • Uptime (24h): 99.8%
  • Avg response: 45ms

Hybrid Health Score Calculation

The health score uses a hybrid formula combining infrastructure health (70%) and monitoring system health (30%):

Infrastructure Score (70% of total)

  • Server Availability (40%): Percentage of BBB servers online and responsive
  • Error Rate (25%): Number of errors in the last hour (0 errors = 100%, <5 = 80%, <10 = 50%)
  • Response Time (20%): Average API response time from active health probes (<500ms = 100%, <1000ms = 80%, <3000ms = 50%)
  • Resource Utilization (15%): Peak CPU, memory, and disk usage levels (<70% = 100%, <85% = 70%, <95% = 40%)
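The four infrastructure components can be sketched in Python as below. This is a minimal illustration that uses the thresholds listed above. The article does not state what score applies past the last threshold (10 or more errors, 3000ms or slower, 95% utilization or higher), so the sketch assumes 0 in those cases. All function names are hypothetical.

```python
def server_availability_score(online: int, total: int) -> float:
    # Percentage of BBB servers online; an empty pool scores 0 (ASSUMPTION).
    return 100.0 * online / total if total else 0.0


def error_rate_score(errors_last_hour: int) -> float:
    if errors_last_hour == 0:
        return 100.0
    if errors_last_hour < 5:
        return 80.0
    if errors_last_hour < 10:
        return 50.0
    return 0.0  # ASSUMPTION: the article gives no value for 10+ errors


def response_time_score(avg_response_ms: float) -> float:
    if avg_response_ms < 500:
        return 100.0
    if avg_response_ms < 1000:
        return 80.0
    if avg_response_ms < 3000:
        return 50.0
    return 0.0  # ASSUMPTION: the article gives no value for 3000 ms or slower


def resource_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
    peak = max(cpu_pct, mem_pct, disk_pct)  # the score uses the peak of the three
    if peak < 70:
        return 100.0
    if peak < 85:
        return 70.0
    if peak < 95:
        return 40.0
    return 0.0  # ASSUMPTION: the article gives no value for 95% or higher


def infrastructure_score(online: int, total: int, errors_last_hour: int,
                         avg_response_ms: float, cpu_pct: float,
                         mem_pct: float, disk_pct: float) -> float:
    # Weighted sum: 40% availability, 25% errors, 20% response time, 15% resources.
    return (0.40 * server_availability_score(online, total)
            + 0.25 * error_rate_score(errors_last_hour)
            + 0.20 * response_time_score(avg_response_ms)
            + 0.15 * resource_score(cpu_pct, mem_pct, disk_pct))
```

With 4 of 5 servers online, 3 errors in the last hour, a 700ms average response, and a peak utilization of 80%, the sketch gives 0.40 × 80 + 0.25 × 80 + 0.20 × 80 + 0.15 × 70 = 78.5.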

Monitoring System Score (30% of total)

  • Data Ingestion (40%): How recently data was received from webhook reports
  • Alert System (30%): Alert processing and resolution rate
  • Notification System (30%): Notification delivery success rate
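Combining the two parts, the hybrid score reduces to a weighted sum. The sketch below assumes each monitoring input (data ingestion, alert system, notification system) has already been converted to a 0-100 sub-score; the article does not describe those conversions, and the function names are hypothetical.

```python
def monitoring_score(ingestion: float, alerts: float, notifications: float) -> float:
    # Each input is ASSUMED to be a 0-100 sub-score; weights are 40/30/30.
    return 0.40 * ingestion + 0.30 * alerts + 0.30 * notifications


def hybrid_health_score(infrastructure: float, ingestion: float,
                        alerts: float, notifications: float) -> float:
    # 70% infrastructure health + 30% monitoring system health.
    return 0.70 * infrastructure + 0.30 * monitoring_score(ingestion, alerts, notifications)
```

Because monitoring carries 30% of the total, the overall score can never exceed 70 + 0.3 × (monitoring score), even with perfect infrastructure. For example, hybrid_health_score(100, 100, 50, 100) gives 70 + 0.3 × 85 = 95.5.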

Health Score Thresholds

90-100%: Healthy - All systems operating normally
70-89%: Warning - Some issues detected, monitoring recommended
Below 70%: Critical - Immediate attention required
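The threshold mapping can be sketched as a single function. This assumes scores may be fractional and that a value between the listed whole-number bands (for example 89.5) falls into the lower band; the function name is hypothetical.

```python
def health_status(score: float) -> str:
    # Bands from the article: 90-100 Healthy, 70-89 Warning, below 70 Critical.
    if score >= 90:
        return "Healthy"
    if score >= 70:
        return "Warning"
    return "Critical"
```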

Active Health Probing

Scalelite Manager Pro performs active health probes every minute to monitor your infrastructure:

  • Scalelite API Probing: Measures response time and availability of your Scalelite endpoint
  • BBB Server Probing: Checks response times, API status, and port accessibility (443, 80) for each server
  • Response Time Tracking: Historical response time data for trending and alerting
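The port-accessibility part of a probe can be approximated with a plain TCP connect. The sketch below is an assumption about how such a check might work, not the product's implementation: it times socket.create_connection for each port and reports None when the connection is refused, unreachable, or times out. It does not check API status, and the function names are hypothetical.

```python
import socket
import time
from typing import Dict, Iterable, Optional


def probe_port(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in ms, or None if host:port is not accessible."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return None


def probe_ports(host: str, ports: Iterable[int] = (443, 80)) -> Dict[int, Optional[float]]:
    """Probe each port once; a scheduler would repeat this every minute."""
    return {port: probe_port(host, port) for port in ports}
```

Calling probe_ports with a server's hostname returns a dict mapping 443 and 80 to a connect latency in milliseconds, or None for a port that could not be reached. Storing each result with a timestamp gives the historical series used for trending.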

Container Resource Usage

For Docker-based Scalelite deployments, the system monitors container metrics:

Container          CPU %   Memory %   Restarts   Uptime   Errors
scalelite-api      12%     45%        0          2d 5h    0
scalelite-poller   5%      28%        0          2d 5h    0
redis              2%      15%        0          2d 5h    0

Resource usage is color-coded: Green (<70%), Orange (70-85%), Red (>85%)
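The color bands can be expressed as a small function. The sketch below also shows one way to collect comparable CPU and memory figures yourself, assuming the Docker CLI is installed: docker stats --no-stream with a Go-template --format prints one line per container. This is an illustration, not necessarily how Scalelite Manager Pro gathers its metrics, and the function names are hypothetical. A value of exactly 85% is treated as Orange because the article's Orange band is 70-85%.

```python
import subprocess
from typing import Dict, Tuple


def usage_color(percent: float) -> str:
    """Map a usage percentage to the dashboard color bands."""
    if percent < 70:
        return "Green"
    if percent <= 85:  # Orange band is 70-85%, so 85% itself is Orange
        return "Orange"
    return "Red"


def parse_docker_stats(output: str) -> Dict[str, Tuple[float, float]]:
    """Parse 'name<TAB>cpu%<TAB>mem%' lines into {name: (cpu, mem)}."""
    usage = {}
    for line in output.strip().splitlines():
        name, cpu, mem = line.split("\t")
        usage[name] = (float(cpu.rstrip("%")), float(mem.rstrip("%")))
    return usage


def read_container_usage() -> Dict[str, Tuple[float, float]]:
    # Requires the Docker CLI on the host running this script.
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format",
         "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_docker_stats(out)
```

Applying usage_color to each value returned by read_container_usage gives the same Green/Orange/Red coding as the table.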

BBB Server Health

Monitor the health of individual BigBlueButton servers in your pool:

  • Response Time: API response latency from active probes (updated every minute)
  • API Status: OK, Error, Timeout, or Unreachable
  • Port Accessibility: Checks if ports 443 (HTTPS) and 80 (HTTP) are accessible
  • Last Probe: When the server was last checked
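A BigBlueButton server typically answers a plain GET on its API root (/bigbluebutton/api) with a small XML document whose returncode element is SUCCESS, which makes it a convenient probe target. The sketch below maps probe outcomes onto the four API statuses listed above. How the product actually distinguishes Error, Timeout, and Unreachable is not documented here, so the mapping is an assumption, and the function names are hypothetical.

```python
import socket
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def classify_bbb_response(body: bytes) -> str:
    """'OK' if the XML reply has <returncode>SUCCESS</returncode>, else 'Error'."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "Error"
    return "OK" if root.findtext("returncode") == "SUCCESS" else "Error"


def probe_bbb_api(base_url: str, timeout: float = 5.0) -> str:
    """Return 'OK', 'Error', 'Timeout', or 'Unreachable' (ASSUMED mapping)."""
    url = base_url.rstrip("/") + "/bigbluebutton/api"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_bbb_response(resp.read())
    except urllib.error.HTTPError:  # server answered, but with an HTTP error status
        return "Error"
    except (TimeoutError, socket.timeout):  # timed out while reading the reply
        return "Timeout"
    except urllib.error.URLError as exc:  # connection-level failure
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
            return "Timeout"
        return "Unreachable"
    except OSError:
        return "Unreachable"
```

HTTPError is caught before URLError because it is a subclass of URLError.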

Service Status

Monitor individual services that make up your Scalelite infrastructure:

  • Scalelite API: Response time 32ms
  • Redis Cache: Memory 45%
  • PostgreSQL: Connections 12/100
  • Nginx Proxy: Requests 1.2k/min
  • Recording Import: Queue 5 pending
  • Poller Service: Last poll 30s ago

Service Status Indicators

● Online: Service is running normally
● Degraded: Service is running with reduced performance or queued work
● Offline: Service is not responding or has failed
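The three indicators can be derived from a single health check per service. The sketch below assumes each check reports whether the service answered, how long it took, and how much work is queued. The limits SLOW_RESPONSE_MS and MAX_PENDING_ITEMS are hypothetical placeholders, since the article does not say where "degraded" begins.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL thresholds: the article does not define when a service is degraded.
SLOW_RESPONSE_MS = 1000.0
MAX_PENDING_ITEMS = 50


@dataclass
class ServiceCheck:
    responding: bool                     # did the service answer its health check?
    response_ms: Optional[float] = None  # latency of that check, if measured
    pending_items: int = 0               # queued work, e.g. recordings awaiting import


def service_status(check: ServiceCheck) -> str:
    if not check.responding:
        return "Offline"
    slow = check.response_ms is not None and check.response_ms >= SLOW_RESPONSE_MS
    backlog = check.pending_items > MAX_PENDING_ITEMS
    return "Degraded" if slow or backlog else "Online"
```

With these placeholder limits, the example values above (Scalelite API at 32ms, Recording Import with 5 pending) would both show as Online.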

Error Tracking

The System Health page also tracks recent errors:

  • Errors Last Hour: Count of errors in the past 60 minutes
  • Error Trend: Whether errors are increasing or decreasing
  • Most Common Error: The most frequently occurring error type
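All three error metrics can be computed from a list of timestamped errors. The sketch below assumes the trend is found by comparing the last 60 minutes with the 60 minutes before that; the article does not say how the trend is determined, and error_summary is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def error_summary(errors: Iterable[Tuple[datetime, str]], now: datetime) -> dict:
    """Summarize (timestamp, error_type) pairs as of `now`."""
    errors = list(errors)  # allow generators; we iterate twice
    hour = timedelta(hours=1)
    last_hour = [kind for ts, kind in errors if now - hour < ts <= now]
    prev_hour = [kind for ts, kind in errors if now - 2 * hour < ts <= now - hour]

    # ASSUMPTION: trend = last hour's count versus the previous hour's count.
    if len(last_hour) > len(prev_hour):
        trend = "increasing"
    elif len(last_hour) < len(prev_hour):
        trend = "decreasing"
    else:
        trend = "stable"

    top = Counter(last_hour).most_common(1)
    most_common: Optional[str] = top[0][0] if top else None
    return {"errors_last_hour": len(last_hour), "trend": trend,
            "most_common": most_common}
```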

Best Practices

  • Check the System Health page daily or set up alerts for score drops
  • Investigate any service showing "Degraded" status
  • Monitor the error count trend; a sudden spike often indicates a problem
  • Use the 24-hour uptime metric to track historical reliability

Related: Setting Up Custom Alert Rules
