Integrate with Canonical Observability Stack

This how-to guide provides instructions for integrating Charmed HPC with the Canonical Observability Stack (COS). This integration enables you to monitor your deployed Charmed HPC cluster by forwarding collected logs and metrics from your cluster’s services to COS for interactive analysis.

New to COS?

If you’re unfamiliar with operating COS, see the COS tutorials for a high-level introduction to the Canonical Observability Stack.

Prerequisites

To integrate Charmed HPC with COS, you will need:

Ingress enabled

Your COS deployment must have ingress enabled. If your COS deployment does not have ingress enabled, Charmed HPC will be unable to forward collected logs and metrics as COS will be unreachable over the network.

Before you begin

The instructions below assume that Charmed HPC and COS have their own, individual controllers, and that they are connected together with cross-model integration endpoints.

The instructions below also assume that the name of the COS controller is cos-controller, and that the model holding your COS deployment is named cos. If the name of your COS controller is not cos-controller, or the name of your model is not cos, substitute cos-controller and cos with the names of your COS controller and model in the commands below.

Deploy OpenTelemetry Collector

First, use juju deploy to deploy OpenTelemetry Collector in the slurm model on your charmed-hpc machine cloud:

juju deploy opentelemetry-collector \
  --channel 2/stable \
  --base "ubuntu@24.04"

Tip: Switching between Juju models

juju switch can be used to determine the current model you’re operating on:

user@host:~$ juju switch
charmed-hpc-controller:admin/slurm

The output above shows that you’re operating on the slurm model as the admin user through the Juju controller charmed-hpc-controller.

juju switch <model> can also be used to switch between models:

user@host:~$ juju switch identity
charmed-hpc-controller:admin/slurm -> charmed-hpc-controller:admin/identity

The output above shows that you’ve switched from operating on your slurm model to your identity model.
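
If you want to confirm that you are on the expected model before running further commands, the check can be sketched as a small shell helper. This is only a sketch: model_name and on_model are hypothetical helper names, not Juju commands, and they assume the juju switch output has the controller:user/model shape shown above.

```shell
# Extract the model name from a `juju switch` style string such as
# "charmed-hpc-controller:admin/slurm" (hypothetical helper, not part of Juju).
model_name() {
  # Strip the longest prefix that ends in ":" or "/", leaving the model name.
  printf '%s\n' "${1##*[:/]}"
}

# Succeed only if the given `juju switch` string names the expected model.
on_model() {
  [ "$(model_name "$2")" = "$1" ]
}

# Usage (assumes the juju CLI is installed and a model is selected):
#   on_model slurm "$(juju switch)" || echo "not on the slurm model" >&2
```

The helpers only inspect text, so they never change which model is selected.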

Integrate OpenTelemetry Collector with Charmed HPC

Next, use juju integrate to integrate OpenTelemetry Collector with your Charmed HPC cluster’s applications:

juju integrate opentelemetry-collector slurmctld
juju integrate opentelemetry-collector sackd
juju integrate opentelemetry-collector slurmd

OpenTelemetry Collector will install itself on each unit of the slurmctld application to collect logs and metrics from the slurmctld service’s metrics endpoint. It will also scrape the metrics endpoints provided by the sackd and slurmd applications.

Test connectivity between Charmed HPC and COS

Now ensure that your Charmed HPC cluster can communicate with your COS deployment.

To check if your Charmed HPC cluster can communicate with COS, first grab the URLs of the COS services from Catalogue using the juju show-unit command below:

juju show-unit --model cos-controller:cos catalogue/0 --format json | \
  jq '.[]."relation-info".[]."application-data".url | select (. != null)'

The piped output of juju show-unit will be similar to the following:

user@host:~$ juju show-unit --model cos-controller:cos catalogue/0 --format json | \
  jq '.[]."relation-info".[]."application-data".url | select (. != null)'
"http://10.190.89.230/cos-grafana"
"http://10.190.89.230/cos-prometheus-0"
"http://10.190.89.230/cos-alertmanager"

Save these URLs as they will be used later in the Access monitoring resources section.
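
One way to keep a URL for later is to pick it out of the piped output and store it in a shell variable. The sketch below assumes the quoted, one-per-line output shown above. pick_url is a hypothetical helper name, and matching on the service name in the URL path is an assumption based on the example output.

```shell
# Print the first URL that mentions the given service name.
# Reads quoted, one-per-line URLs (as printed by the jq filter above) on stdin.
pick_url() {
  tr -d '"' | grep -- "$1" | head -n 1
}

# Usage (assumes juju and jq are installed and the cos model is reachable):
#   prometheus_url="$(juju show-unit --model cos-controller:cos catalogue/0 --format json | \
#     jq '.[]."relation-info".[]."application-data".url | select (. != null)' | \
#     pick_url prometheus)"
```

If no URL matches the service name, the helper prints nothing, so an empty variable signals that the service was not found in the Catalogue output.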

Next, to verify that your Charmed HPC cluster can communicate with COS, access Prometheus with curl using the juju exec command below:

juju exec --unit opentelemetry-collector/0 -- \
  curl -s http://10.190.89.230/cos-prometheus-0/api/v1/status/runtimeinfo

If the output of juju exec looks similar to the success message below, this means that your Charmed HPC cluster can communicate with COS:

user@host:~$ juju exec --unit opentelemetry-collector/0 -- \
  curl -s http://10.190.89.230/cos-prometheus-0/api/v1/status/runtimeinfo
{
  "status": "success",
  "data": {
    "startTime": "2025-02-06T19:09:05.141616388Z",
    "CWD": "/",
    "reloadConfigSuccess": true,
    "lastConfigTime": "2025-02-06T19:10:36Z",
    "corruptionCount": 0,
    "goroutineCount": 56,
    "GOMAXPROCS": 8,
    "GOMEMLIMIT": 9223372036854776000,
    "GOGC": "",
    "GODEBUG": "",
    "storageRetention": "15d or 819MiB204KiB819B"
  }
}
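
If you would rather check the response in a script than read it by eye, the success test can be sketched as a grep on the status field. is_success is a hypothetical helper name. It only looks for the "status": "success" pair shown above and does not validate the rest of the JSON.

```shell
# Succeed if the JSON read on stdin contains "status": "success".
# The pattern tolerates optional whitespace around the colon, so it also
# matches compact output such as {"status":"success"}.
is_success() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"success"'
}

# Usage (assumes the example Prometheus URL from above; substitute your own):
#   juju exec --unit opentelemetry-collector/0 -- \
#     curl -s http://10.190.89.230/cos-prometheus-0/api/v1/status/runtimeinfo | \
#     is_success && echo "COS is reachable"
```

An empty response, for example when curl cannot reach COS, makes the helper fail, which is the behavior you want from a connectivity check.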

Integrate OpenTelemetry Collector with COS

Next, use juju offer to create offers for COS from your cos model:

juju switch cos-controller:cos
juju offer cos.grafana:grafana-dashboard grafana-dashboards
juju offer cos.loki:logging loki-logging
juju offer cos.prometheus:receive-remote-write prometheus-receive-remote-write

After that, use juju consume to consume the offers in your slurm model:

juju switch charmed-hpc-controller:slurm
juju consume cos-controller:cos.prometheus-receive-remote-write
juju consume cos-controller:cos.grafana-dashboards
juju consume cos-controller:cos.loki-logging

Now use juju integrate in your slurm model to integrate OpenTelemetry Collector with the COS offer endpoints:

juju integrate opentelemetry-collector prometheus-receive-remote-write
juju integrate opentelemetry-collector loki-logging
juju integrate opentelemetry-collector grafana-dashboards
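
If your COS controller or model has a different name, it can help to generate the offer, consume, and integrate commands above with your names substituted, then review them before running anything. The sketch below only prints commands and does not call juju. OFFERS and print_cos_commands are hypothetical names, and the charmed-hpc-controller:slurm target is taken from the commands above.

```shell
# Offer table copied from the commands above: "<app>:<endpoint> <offer-name>".
OFFERS='grafana:grafana-dashboard grafana-dashboards
loki:logging loki-logging
prometheus:receive-remote-write prometheus-receive-remote-write'

# Print (do not run) the commands for the given COS controller and model.
print_cos_commands() {
  cos_controller="$1"
  cos_model="$2"
  echo "juju switch ${cos_controller}:${cos_model}"
  printf '%s\n' "$OFFERS" | while read -r endpoint offer; do
    echo "juju offer ${cos_model}.${endpoint} ${offer}"
  done
  echo "juju switch charmed-hpc-controller:slurm"
  printf '%s\n' "$OFFERS" | while read -r endpoint offer; do
    echo "juju consume ${cos_controller}:${cos_model}.${offer}"
  done
  printf '%s\n' "$OFFERS" | while read -r endpoint offer; do
    echo "juju integrate opentelemetry-collector ${offer}"
  done
}

# Usage: print the commands for a controller and model of your choosing.
#   print_cos_commands cos-controller cos
```

Printing instead of executing keeps the sketch safe to run anywhere. Once the output matches what you expect, you can run the printed commands one at a time.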

You can now use the URLs from the Test connectivity between Charmed HPC and COS section to access monitoring resources such as metrics, logs, and alerts collected from your Charmed HPC cluster.

Access monitoring resources

First, use the get-admin-password action to retrieve the Grafana admin password:

juju run grafana/leader \
  --model cos-controller:cos \
  --wait 1m \
  get-admin-password

About the admin password

The get-admin-password action returns the initial admin password that is generated when COS is first deployed. The action will return a notice if the initial admin password has been changed by your COS deployment’s administrator. If the password has been changed, you will need to either create a Grafana account in COS or get the admin password from your COS deployment’s administrator.

Next, open your browser and navigate to the Grafana dashboard URL you saved after completing the Test connectivity between Charmed HPC and COS section.

Log in as the user admin using the password returned by the get-admin-password action. You can see the available dashboards by opening the sidebar menu and clicking on Dashboards.

Next steps

You can now use COS to monitor your Charmed HPC cluster.

You can also explore the Grafana dashboards, Loki logs, and Prometheus metrics pages in the Monitoring section for an overview of the dashboards, logs, and metrics that your Charmed HPC cluster provides.