Prometheus SLIs
This page lists the metrics recommended as Service Level Indicators (SLIs) for monitoring the health of Prometheus. To set up Service Level Objectives (SLOs), see Set up SLOs with Sloth.
Query performance
| Metric | Type | Description |
|---|---|---|
| | Summary | Query execution time by slice (inner_eval, prepare_time, queue_time, result_sort) |
| | Gauge | Number of currently executing or waiting queries |
| | Gauge | Maximum concurrent queries allowed |
| | Counter | Total samples loaded by all queries |
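
The two gauges above can be combined into a saturation indicator for the query engine. The following is a minimal sketch, not a definitive implementation. The metric names are not visible in the table, so it assumes the gauges correspond to `prometheus_engine_queries` and `prometheus_engine_queries_concurrent_max`. The function name `query_saturation` is hypothetical, and the two values are assumed to have been read from Prometheus already.

```python
def query_saturation(active_queries: float, max_concurrent: float) -> float:
    """Return the fraction of the query concurrency limit currently in use.

    Assumption: active_queries comes from the "executing or waiting" gauge and
    max_concurrent from the "maximum concurrent queries" gauge. Because the
    first gauge may include waiting queries, the result can exceed 1.0.
    """
    if max_concurrent <= 0:
        raise ValueError("max_concurrent must be positive")
    return active_queries / max_concurrent


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    print(query_saturation(active_queries=5, max_concurrent=20))
```

A value that stays close to or above 1.0 suggests queries are queuing, which would normally also show up in the queue_time slice of the query duration summary.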
HTTP API
| Metric | Type | Description |
|---|---|---|
| | Histogram | HTTP request latency by handler |
| | Counter | HTTP requests by handler and status code |
| | Histogram | HTTP response size by handler |
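
An availability SLI can be derived from the request counter by comparing non-5xx requests with all requests over a window. The sketch below assumes the per-series counter increases have already been computed, and that each series is keyed by a `(handler, code)` pair. Both the input shape and the function name `availability` are illustrative assumptions, not part of any Prometheus API.

```python
def availability(request_deltas: dict[tuple[str, str], float]) -> float:
    """Return the fraction of requests that did not end in a 5xx status.

    request_deltas maps (handler, status_code) to the increase of the request
    counter over the chosen window. An empty window is treated as fully
    available, which is a design choice rather than a rule.
    """
    total = sum(request_deltas.values())
    if total == 0:
        return 1.0
    good = sum(
        count
        for (_handler, code), count in request_deltas.items()
        if not code.startswith("5")
    )
    return good / total


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    window = {
        ("/api/v1/query", "200"): 90.0,
        ("/api/v1/query", "503"): 10.0,
    }
    print(availability(window))
```

Counting only 5xx responses as failures is itself an assumption; some teams also count selected 4xx codes against the SLI.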
Scrape health
| Metric | Type | Description |
|---|---|---|
| | Gauge | Target reachability (1 = up, 0 = down) |
| | Gauge | Duration of the last scrape per target |
| | Gauge | Number of samples scraped per target |
| | Summary | Actual interval between scrapes |
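
The reachability gauge turns into a simple SLI when averaged across targets. This is a minimal sketch assuming the current gauge values (1 = up, 0 = down) have been collected into a list; the function name `target_availability` is hypothetical.

```python
def target_availability(up_samples: list[float]) -> float:
    """Return the fraction of targets whose reachability gauge equals 1."""
    if not up_samples:
        raise ValueError("at least one target sample is required")
    reachable = sum(1 for value in up_samples if value == 1)
    return reachable / len(up_samples)


if __name__ == "__main__":
    # Illustrative values only: three targets up, one down.
    print(target_availability([1, 1, 0, 1]))
```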
Rule evaluation
| Metric | Type | Description |
|---|---|---|
| | Counter | Total rule evaluations per rule group |
| | Counter | Failed rule evaluations per rule group |
| | Summary | Rule evaluation duration |
| | Counter | Total scheduled rule group evaluations |
| | Counter | Missed rule group evaluations due to slow evaluation |
| | Summary | Rule group evaluation duration |
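
The counter pairs above (failed versus total evaluations, missed versus scheduled group evaluations) lend themselves to ratio SLIs. The sketch below works on two raw readings of each counter taken at the start and end of a window. The helper names `counter_increase` and `failure_ratio` are hypothetical, and the reset handling is a simplified approximation of how counters are usually treated, not Prometheus's own algorithm.

```python
def counter_increase(first: float, last: float) -> float:
    """Return the increase of a counter between two readings.

    If the later reading is lower, assume the process restarted and the
    counter began again from zero, so the later reading is the increase.
    """
    return last if last < first else last - first


def failure_ratio(failed_increase: float, total_increase: float) -> float:
    """Return failed / total, or 0.0 when nothing was evaluated in the window."""
    if total_increase <= 0:
        return 0.0
    return failed_increase / total_increase


if __name__ == "__main__":
    # Illustrative readings only, not measurements.
    total = counter_increase(first=100, last=160)
    failed = counter_increase(first=2, last=5)
    print(failure_ratio(failed, total))
```

The same two helpers apply unchanged to the missed and scheduled rule group evaluation counters.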
Alert notifications
| Metric | Type | Description |
|---|---|---|
| | Counter | Alerts sent to Alertmanager |
| | Counter | Alerts dropped due to send errors |
| | Counter | Alerts affected by errors |
| | Gauge | Alerts in queue per Alertmanager |
| | Summary | Alert notification send latency |
Storage (TSDB)
| Metric | Type | Description |
|---|---|---|
| | Gauge | Number of active time series |
| | Gauge | Number of chunks in the head block |
| | Histogram | Time spent in compactions |
| | Counter | WAL corruption events (should be 0) |
| | Gauge | Storage used by head block |