
System Health Overview

Monitor your cluster's overall health status

OdooBot 19/01/2026 434 views

Overview

The System Health page provides a comprehensive view of your Scalelite cluster's operational status. It monitors services, tracks errors, and calculates an overall health score to help you quickly assess system stability.

Health Score

The health score is a percentage (0-100%) that reflects your cluster's overall operational health.

Example dashboard summary:

Production Cluster (last updated: 2 minutes ago)
Health: 90%
Status: All Systems Operational
Servers Online: 5/5
Errors (1h): (no value shown)
Uptime (24h): 99.8%
Avg Response: 45ms

Hybrid Health Score Calculation

The health score uses a hybrid formula combining infrastructure health (70%) and monitoring system health (30%):

Infrastructure Score (70% of total)

Server Availability (40%): Percentage of BBB servers online and responsive
Error Rate (25%): Number of errors in the last hour (0 errors = 100%, <5 = 80%, <10 = 50%)
Response Time (20%): Average API response time from active health probes (<500ms = 100%, <1000ms = 80%, <3000ms = 50%)
Resource Utilization (15%): Peak CPU, memory, and disk usage levels (<70% = 100%, <85% = 70%, <95% = 40%)
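
As a rough illustration, the four weighted components above can be combined as in the sketch below. The function names are hypothetical and not part of the product. The article does not state what score applies beyond the last listed tier (10 or more errors, 3000ms or slower, 95% or higher usage), so the sketch assumes 0 in those cases.

```python
def error_rate_score(errors_last_hour: int) -> int:
    """Map the hourly error count to a 0-100 score using the listed tiers."""
    if errors_last_hour == 0:
        return 100
    if errors_last_hour < 5:
        return 80
    if errors_last_hour < 10:
        return 50
    return 0  # assumption: the score for 10+ errors is not documented


def response_time_score(avg_ms: float) -> int:
    """Map the average probe response time (ms) to a 0-100 score."""
    if avg_ms < 500:
        return 100
    if avg_ms < 1000:
        return 80
    if avg_ms < 3000:
        return 50
    return 0  # assumption: the score for 3000ms and above is not documented


def resource_score(peak_percent: float) -> int:
    """Map peak CPU/memory/disk usage (%) to a 0-100 score."""
    if peak_percent < 70:
        return 100
    if peak_percent < 85:
        return 70
    if peak_percent < 95:
        return 40
    return 0  # assumption: the score for 95% and above is not documented


def infrastructure_score(servers_online: int, servers_total: int,
                         errors_last_hour: int, avg_response_ms: float,
                         peak_resource_percent: float) -> float:
    """Weighted sum: availability 40%, errors 25%, response 20%, resources 15%."""
    availability = 100.0 * servers_online / servers_total if servers_total else 0.0
    return (0.40 * availability
            + 0.25 * error_rate_score(errors_last_hour)
            + 0.20 * response_time_score(avg_response_ms)
            + 0.15 * resource_score(peak_resource_percent))
```

For example, with 4 of 5 servers online, 3 errors in the last hour, a 700ms average response, and 80% peak usage, the components score 80, 80, 80, and 70, giving 32 + 20 + 16 + 10.5 = 78.5.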

Monitoring System Score (30% of total)

Data Ingestion (40%): How recently data was received from webhook reports
Alert System (30%): Alert processing and resolution rate
Notification System (30%): Notification delivery success rate
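
To make the 70/30 split concrete, here is a minimal sketch of the final combination. It assumes each monitoring component has already been expressed as a 0-100 score; the article does not say how those component scores are derived, and the function names are illustrative only.

```python
INFRA_WEIGHT = 0.70        # infrastructure share of the total score
MONITORING_WEIGHT = 0.30   # monitoring system share of the total score


def monitoring_score(data_ingestion: float, alert_system: float,
                     notification_system: float) -> float:
    """Weighted sum: ingestion 40%, alerts 30%, notifications 30%.

    Inputs are assumed to be 0-100 scores computed elsewhere.
    """
    return (0.40 * data_ingestion
            + 0.30 * alert_system
            + 0.30 * notification_system)


def hybrid_health_score(infrastructure: float, monitoring: float) -> float:
    """Combine the two sub-scores into the overall health percentage."""
    return INFRA_WEIGHT * infrastructure + MONITORING_WEIGHT * monitoring
```

As a worked example, monitoring components of 100, 80, and 70 give a monitoring score of 40 + 24 + 21 = 85. Combined with an infrastructure score of 95, the overall score is 0.70 × 95 + 0.30 × 85 = 66.5 + 25.5 = 92.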

Health Score Thresholds

90-100%: Healthy - All systems operating normally
70-89%: Warning - Some issues detected, monitoring recommended
Below 70%: Critical - Immediate attention required
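
The thresholds above translate into a simple lookup. This is a sketch with a hypothetical function name; it assumes fractional scores between 89% and 90% fall into the Warning band, which the article does not spell out.

```python
def health_status(score: float) -> str:
    """Classify a 0-100 health score into the three documented bands."""
    if score >= 90:
        return "Healthy"
    if score >= 70:
        return "Warning"  # assumption: 89.x is treated as Warning
    return "Critical"
```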

Active Health Probing

Scalelite Manager Pro performs active health probes every minute to monitor your infrastructure:

Scalelite API Probing: Measures response time and availability of your Scalelite endpoint
BBB Server Probing: Checks response times, API status, and port accessibility (443, 80) for each server
Response Time Tracking: Historical response time data for trending and alerting
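
The product's probe implementation is not documented here. As an illustration of what a port accessibility check with response timing can look like, the following sketch uses only the Python standard library. The function names and the 3-second timeout are assumptions.

```python
import socket
import time


def probe_port(host: str, port: int, timeout: float = 3.0) -> tuple[bool, float]:
    """Try a TCP connection and return (reachable, elapsed milliseconds)."""
    start = time.perf_counter()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return reachable, elapsed_ms


def probe_server(host: str, ports: tuple[int, ...] = (443, 80),
                 timeout: float = 3.0) -> dict[int, tuple[bool, float]]:
    """Probe each port in turn; defaults to the two ports named in the article."""
    return {port: probe_port(host, port, timeout) for port in ports}
```

A scheduler would call `probe_server` once per minute for each server and store the timings to build the response time history.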

Container Resource Usage

For Docker-based Scalelite deployments, the system monitors container metrics:

Container	CPU %	Memory %	Uptime
scalelite-api	12%	45%	2d 5h
scalelite-poller	5%	28%	2d 5h
redis	2%	15%	2d 5h

Resource usage is color-coded: green (below 70%), orange (70-85%), red (above 85%).
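
The color bands can be expressed as a small helper. This is a sketch with a hypothetical function name, treating both 70% and 85% as orange, in line with the ranges given.

```python
def usage_color(percent: float) -> str:
    """Return the display color for a CPU or memory usage percentage."""
    if percent < 70:
        return "green"
    if percent <= 85:
        return "orange"
    return "red"
```

With the sample table above, all three containers (12%/45%, 5%/28%, 2%/15%) would show green for both CPU and memory.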

BBB Server Health

Monitor the health of individual BigBlueButton servers in your pool:

Response Time: API response latency from active probes (updated every minute)
API Status: OK, Error, Timeout, or Unreachable
Port Accessibility: Whether ports 443 (HTTPS) and 80 (HTTP) are reachable
Last Probe: When the server was last checked
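
The article lists four API status values but not the rules for choosing between them. The sketch below shows one plausible mapping from a probe outcome to a status; the mapping, the `ServerHealth` record, and all names are assumptions made for illustration.

```python
import socket
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def classify_api_status(http_status: Optional[int],
                        error: Optional[BaseException]) -> str:
    """Map a probe outcome to OK, Error, Timeout, or Unreachable (assumed rules)."""
    if error is not None:
        # Check timeouts first: TimeoutError is itself a subclass of OSError
        if isinstance(error, (TimeoutError, socket.timeout)):
            return "Timeout"
        if isinstance(error, OSError):
            return "Unreachable"  # e.g. connection refused, DNS failure
        return "Error"
    if http_status is not None and 200 <= http_status < 300:
        return "OK"
    return "Error"  # a response arrived, but not a successful one


@dataclass
class ServerHealth:
    """One row of per-server health data, mirroring the four fields above."""
    response_time_ms: Optional[float]
    api_status: str
    port_443_open: bool
    port_80_open: bool
    last_probe: datetime
```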

Service Status

Monitor individual services that make up your Scalelite infrastructure:

Service	Metric
Scalelite API	Response: 32ms
Redis Cache	Memory: 45%
PostgreSQL	Connections: 12/100
Nginx Proxy	Requests: 1.2k/min
Recording Import	Queue: 5 pending
Poller Service	Last poll: 30s ago

Service Status Indicators

● Online	Service is running normally
● Degraded	Service is running with reduced performance or has queued work
● Offline	Service is not responding or has failed

Error Tracking

The System Health page also tracks recent errors:

Errors Last Hour: Count of errors in the past 60 minutes
Error Trend: Whether errors are increasing or decreasing
Most Common Error: The most frequently occurring error type
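
These three metrics can be derived from a list of timestamped errors. The sketch below is illustrative only: it assumes the trend compares the last 60 minutes with the 60 minutes before that, and it adds a "stable" value for equal counts. Neither detail is stated in the article.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional


def summarize_errors(events: list[tuple[datetime, str]],
                     now: datetime) -> dict[str, object]:
    """Summarize (timestamp, error_type) events relative to `now`."""
    hour = timedelta(hours=1)
    # Error types seen in the last 60 minutes
    recent = [etype for ts, etype in events if now - hour < ts <= now]
    # Count in the preceding 60 minutes (assumed comparison window)
    previous = sum(1 for ts, _ in events if now - 2 * hour < ts <= now - hour)

    if len(recent) > previous:
        trend = "increasing"
    elif len(recent) < previous:
        trend = "decreasing"
    else:
        trend = "stable"  # assumption: not a value named in the article

    top = Counter(recent).most_common(1)
    most_common: Optional[str] = top[0][0] if top else None
    return {
        "errors_last_hour": len(recent),
        "error_trend": trend,
        "most_common_error": most_common,
    }
```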

Best Practices

Check the System Health page daily, or set up alerts for score drops
Investigate any service showing "Degraded" status
Monitor the error count trend: a sudden spike often indicates a problem
Use the 24-hour uptime metric to track historical reliability

Related: Setting Up Custom Alert Rules
