Telemetry labels

An application produces telemetry (metrics, logs, traces, profiles), which the observability stack collects and analyzes to surface issues and abnormalities. Telemetry from multiple sources (which could be on different nodes, or even on different infrastructure) is stored in a centralized database, so we need to be able to map each piece of telemetry back to its origin. This is the purpose of telemetry labels.

A telemetry label is a key-value pair. Telemetry labels can be specified:

  • at generation time: the instrumentation can attach the labels to the produced telemetry

  • at scrape time: the scrape jobs can be configured to label the scraped telemetry by means of “scrape configs”

Telemetry labels are used throughout the Grafana ecosystem to uniquely identify the source of a piece of data.

Metric labels

By convention, applications expose labeled metrics under a /metrics endpoint. For example, you can run the prometheus application and curl its :9090/metrics endpoint to obtain the metrics exposed by the process.

$ sudo snap install prometheus

$ curl localhost:9090/metrics

# -- snip --

# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 14

# -- snip --

# HELP prometheus_http_requests_total Counter of HTTP requests.
# TYPE prometheus_http_requests_total counter
prometheus_http_requests_total{code="200",handler="/metrics"} 128
prometheus_http_requests_total{code="302",handler="/"} 1

# ...

In the example above,

  • process_open_fds is a metric without any labels

  • prometheus_http_requests_total is a metric with two labels:

    • code: a label that tells you the status code of a handled request

    • handler: a label that tells you the path of the endpoint handling an HTTP request
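To make the structure of a sample line concrete, here is a minimal Python sketch that splits one line of the /metrics output into its metric name, labels and value. The function name parse_sample and both regular expressions are illustrative assumptions, not part of any Prometheus library; the sketch ignores comment lines, optional timestamps and escape sequences inside label values.

```python
import re

# Simplified pattern for one sample line: name, optional {labels}, value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One key="value" pair inside the braces.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = match.groups()
    # A metric without braces simply yields an empty label dict.
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


print(parse_sample('process_open_fds 14'))
print(parse_sample('prometheus_http_requests_total{code="200",handler="/metrics"} 128'))
```

The first call returns an empty label dict, the second a dict with the code and handler keys.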

Scrape job labels for metrics

While metric labels are set by the app developer, each scrape job configured on the monitoring service can append an additional fixed set of labels to all the metrics it collects. Prometheus and Grafana Agent are two examples of monitoring services capable of scraping metrics.

For Prometheus (or Grafana Agent) to scrape our apps (targets), we need to specify in its configuration file where to find them. This is also where we specify telemetry labels.

scrape_configs:
  - job_name: "some-app-scrape-job"
    metrics_path: "/metrics"
    static_configs:
      - targets: ["hostname.for.my.app:8080"]
        labels:
          location: "second_floor_third_server_from_the_left"
          purpose: "weather_station_cluster"

Labels that are specified under a static_configs entry are automatically attached to all metrics scraped from the targets:

$ curl -s --data-urlencode 'match[]={__name__="prometheus_http_requests_total"}' localhost:9090/api/v1/series | jq '.data'
[
  {
    "__name__": "prometheus_http_requests_total",
    "code": "200",
    "handler": "/metrics",
    "instance": "localhost:9090",
    "job": "prometheus",
    "location": "second_floor_third_server_from_the_left",
    "purpose": "weather_station_cluster"
  },
  {
    "__name__": "prometheus_http_requests_total",
    "code": "302",
    "handler": "/",
    "instance": "localhost:9090",
    "job": "prometheus",
    "location": "second_floor_third_server_from_the_left",
    "purpose": "weather_station_cluster"
  }
]
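The merge of scrape job labels into the labels exposed by the app can be sketched in Python as follows. This is a simplified model, not Prometheus code: the function name attach_target_labels is hypothetical. The conflict handling mirrors the behaviour of the honor_labels scrape option, where a clashing label from the scraped data is renamed with an exported_ prefix unless honor_labels is enabled.

```python
def attach_target_labels(sample_labels, target_labels, honor_labels=False):
    """Merge labels set by the scrape job into the labels exposed by the app."""
    merged = dict(sample_labels)
    for key, value in target_labels.items():
        if key not in merged:
            merged[key] = value
        elif not honor_labels:
            # Conflict: keep the app's value under an "exported_" prefix
            # and let the scrape job's value win.
            merged["exported_" + key] = merged[key]
            merged[key] = value
        # With honor_labels=True the app's value is kept untouched.
    return merged


target = {
    "job": "prometheus",
    "instance": "localhost:9090",
    "location": "second_floor_third_server_from_the_left",
    "purpose": "weather_station_cluster",
}
print(attach_target_labels({"code": "200", "handler": "/metrics"}, target))
```

The printed label set carries both the app's code and handler labels and the four labels contributed by the scrape job.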

Similarly, “service labels” can be specified when using Prometheus’s remote-write endpoint or the Pushgateway, and in Grafana Agent’s config file.

Log labels

Logs ingested by Loki are searchable by their labels. If you push logs directly to Loki, you can attach labels to every “stream” pushed. In Loki’s terminology, a stream is a set of log lines that share the same set of labels; a single push request can carry one or more streams:

{
  "streams": [
    {
      "stream": {
        "label": "value"
      },
      "values": [
          [ "<unix epoch in nanoseconds>", "<log line>" ],
          [ "<unix epoch in nanoseconds>", "<log line>" ]
      ]
    }
  ]
}

All of the labels specified in the stream section above are applied to all the log lines listed in the values block.
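As an illustration, the following Python sketch builds a request body of this shape for a single stream. The function name build_push_payload, the app label and the sample log lines are assumptions made for the example; spacing the timestamps one nanosecond apart is only a convenience to keep the lines ordered. A body like this would typically be sent as JSON to Loki's push endpoint (/loki/api/v1/push), which the sketch does not do.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a Loki push request body holding one stream."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Timestamps are strings holding nanoseconds since the Unix epoch.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    # Every line in "values" inherits the labels in "stream".
    return {"streams": [{"stream": dict(labels), "values": values}]}


payload = build_push_payload(
    {"app": "nginx"},
    ["GET / 200", "GET /missing 404"],
    now_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
```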

Scrape job labels for logs

Log files can be scraped by Promtail or Grafana Agent, which then stream the log lines to Loki using Loki’s push-api endpoint. Promtail, similar to Grafana Agent, has a scrape_configs section in its config file for specifying targets (log filenames) and associating labels with them. See also Grafana Agent’s config file docs.

Alert labels

By design, Prometheus (and Loki) keep all alert rules in a centralized fashion: if you want your alerts to be evaluated, you must place the rule files somewhere on the filesystem that Prometheus can access, and specify their paths in Prometheus’s config file:

rule_files:
  - /path/to/*.rules
  - /another/one/*.yaml

Alert definitions are not tied to any particular node, application or metric. This gives high flexibility in defining an alert. You could define an alert that triggers for any node that runs out of space, and another alert that triggers only for a specific application on a specific node. Narrowing down the scope of an alert is accomplished by using telemetry labels.

  • expr: process_cpu_seconds_total > 0.12 would trigger if the value of any metric with this name (regardless of any labels) exceeds 0.12.

  • expr: process_cpu_seconds_total{region="europe", app="nginx"} > 0.12 would trigger only for metrics with this name that are also labeled with region="europe" and app="nginx".
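The effect of label matchers on the scope of an alert can be modeled with a small Python sketch. It is a toy model of selection by equality matchers followed by a threshold comparison, not how Prometheus evaluates expressions; the function names and the sample series and values are made up for the example, and only matchers with non-empty values are handled.

```python
def matches(series_labels, matchers):
    """Return True if the series carries every label=value pair in matchers."""
    return all(series_labels.get(k) == v for k, v in matchers.items())


def firing(series, metric_name, matchers, threshold):
    """Select series by name and matchers, keep those above the threshold."""
    return [
        labels
        for labels, value in series
        if labels.get("__name__") == metric_name
        and matches(labels, matchers)
        and value > threshold
    ]


# Hypothetical series: (label set, current value).
SERIES = [
    ({"__name__": "process_cpu_seconds_total", "region": "europe", "app": "nginx"}, 0.30),
    ({"__name__": "process_cpu_seconds_total", "region": "asia", "app": "nginx"}, 0.25),
    ({"__name__": "process_cpu_seconds_total", "region": "europe", "app": "redis"}, 0.05),
]

unscoped = firing(SERIES, "process_cpu_seconds_total", {}, 0.12)
scoped = firing(SERIES, "process_cpu_seconds_total", {"region": "europe", "app": "nginx"}, 0.12)
print(len(unscoped), len(scoped))
```

Without matchers, two of the three sample series exceed the threshold; adding the region and app matchers narrows the result to one.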

When an on-call engineer receives an alert (via Alertmanager, Karma or similar), they see a rendering of the alert, which includes the expr and label values, among a few additional fields.

Additional alert labels can be specified in the alert definition:

      labels:
        severity: critical

This is useful for:

Relabeling

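relabel_configs rewrites the labels of a target before it is scraped, while metric_relabel_configs is applied to the scraped samples before they are ingested and can rewrite their labels, including the metric name (stored in the __name__ label).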
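A relabeling rule with the replace action joins the values of its source labels, matches the result against a regular expression, and writes the expanded replacement into a target label. The Python sketch below models that one action in a simplified way. The function name relabel_replace and the host label are hypothetical, the regex is a required argument here, Python's re module stands in for the RE2 syntax Prometheus uses, and only $1-style references are translated.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement="$1", separator=";"):
    """Apply one simplified 'replace' relabeling rule to a label set."""
    # Missing source labels count as empty strings.
    source = separator.join(labels.get(name, "") for name in source_labels)
    # The regex must match the whole joined value.
    match = re.fullmatch(regex, source)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Translate "$1"-style references into Python's "\g<1>" form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


target = {"__address__": "hostname.for.my.app:8080"}
print(relabel_replace(target, ["__address__"], r"([^:]+):\d+", "host"))
```

Here the rule copies the host part of the target address into a new host label and leaves the original label in place.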

See also: