Monitoring

Overall Health

All Systems Operational

Error Rate: 0.04% (12% vs yesterday)
P99 Latency: 142ms (8ms vs yesterday)
Throughput: 12.4k req/s (15% vs yesterday)

Request Rate

Requests per second over time

[Chart: req/s by time of day, 00:00 to 22:00; highlighted value 1,450 req/s]

Peak: 1,450 req/s at 16:00

Error Rate

Percentage of failed requests

[Chart: error rate by time of day, 00:00 to 20:00; y-axis 0% to 0.16%; alert threshold line shown]

Service Health

Service           Status       Uptime  Error Rate  P99    Req/s  Last Incident
analytics-svc     Maintenance  99.90%  0.08%       312ms  340    Ongoing
api-gateway       Healthy      99.99%  0.02%       89ms   4.2k   None
auth-service      Degraded     99.95%  0.12%       234ms  3.4k   2 hours ago
media-service     Healthy      99.96%  0.05%       203ms  560    1 day ago
notification-svc  Healthy      99.97%  0.04%       67ms   890    5 days ago
payment-api       Healthy      99.99%  0.01%       145ms  1.8k   None
search-api        Healthy      99.99%  0.02%       156ms  1.2k   None
user-service      Healthy      99.98%  0.03%       112ms  2.1k   3 days ago
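
To make the uptime figures above easier to compare, the sketch below converts each percentage into the minutes of downtime it implies. The 30-day window is an assumption, since the dashboard does not state its reporting period, and `downtime_minutes` is a hypothetical helper rather than part of any monitoring tool.

```python
# Uptime percentages copied from the Service Health figures above.
UPTIME_PCT = {
    "analytics-svc": 99.90,
    "api-gateway": 99.99,
    "auth-service": 99.95,
    "media-service": 99.96,
    "notification-svc": 99.97,
    "payment-api": 99.99,
    "search-api": 99.99,
    "user-service": 99.98,
}

# Assumption: a 30-day reporting window (the dashboard does not state one).
WINDOW_MINUTES = 30 * 24 * 60


def downtime_minutes(uptime_pct: float, window_minutes: int = WINDOW_MINUTES) -> float:
    """Return the minutes of downtime implied by an uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * window_minutes


if __name__ == "__main__":
    for service, pct in sorted(UPTIME_PCT.items()):
        print(f"{service:<18}{pct:.2f}%  ~{downtime_minutes(pct):.1f} min")
```

Under that assumed window, 99.90% corresponds to roughly 43 minutes of downtime and 99.99% to roughly 4 minutes, which is why a 0.09-point gap in uptime is larger than it looks.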

DORA Metrics

Deployment Frequency

Elite

15.2 / day

Lead Time for Changes

Elite

2.4 hours

Change Failure Rate

Elite

1.8%

Mean Time to Recovery

High

12 min

Recent Alerts

auth-service: P99 latency exceeded 200ms (2h ago, Resolved)
analytics-svc: Scheduled maintenance started (1h ago, Active)
payment-api: Error rate spike to 0.15% (6h ago, Resolved)
search-api: Disk usage at 78% (12h ago, Acknowledged)
user-service: Connection pool exhausted (3d ago, Resolved)
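
As a rough illustration of how a threshold alert such as the auth-service P99 one could be evaluated, here is a minimal sketch. The 200ms limit comes from the alert text above; applying it to every service is an assumption, and the 0.10% error-rate limit, `ServiceSample`, and `breaches` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSample:
    name: str
    p99_ms: float
    error_rate_pct: float


# Figures copied from the Service Health section.
SAMPLES = [
    ServiceSample("analytics-svc", 312, 0.08),
    ServiceSample("api-gateway", 89, 0.02),
    ServiceSample("auth-service", 234, 0.12),
    ServiceSample("media-service", 203, 0.05),
    ServiceSample("notification-svc", 67, 0.04),
    ServiceSample("payment-api", 145, 0.01),
    ServiceSample("search-api", 156, 0.02),
    ServiceSample("user-service", 112, 0.03),
]

P99_LIMIT_MS = 200.0         # from the auth-service alert text
ERROR_RATE_LIMIT_PCT = 0.10  # hypothetical; the dashboard does not state it


def breaches(sample: ServiceSample) -> list[str]:
    """Return a description of each limit the sample exceeds."""
    found = []
    if sample.p99_ms > P99_LIMIT_MS:
        found.append(f"P99 {sample.p99_ms:g}ms > {P99_LIMIT_MS:g}ms")
    if sample.error_rate_pct > ERROR_RATE_LIMIT_PCT:
        found.append(f"error rate {sample.error_rate_pct}% > {ERROR_RATE_LIMIT_PCT}%")
    return found


if __name__ == "__main__":
    for sample in SAMPLES:
        for reason in breaches(sample):
            print(f"{sample.name}: {reason}")
```

With a single global limit, media-service (203ms) would also be flagged even though the dashboard lists it as Healthy, which suggests the real rules use per-service thresholds or a time window rather than one snapshot.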