Stack variants

Quick comparison

Purpose
  COS: Horizontally scalable, enterprise-ready
  COS Lite: Resource-constrained or near-edge deployments

Telemetry types
  COS: Logs, metrics, traces
  COS Lite: Logs, metrics

Resiliency
  COS: Scalable microservices with node anti-affinity (HA-ready)
  COS Lite: Monolithic mode (multi-node, non-identical replication)

Storage
  COS: S3 (managed independently)
  COS Lite: Via PVCs, e.g. ceph-csi (managed independently)

Load balancing
  COS: Dedicated Nginx instances for Loki, Mimir, and Tempo, with Traefik in front of them
  COS Lite: Traefik balances across units for external traffic, but Grafana uses a separate datasource for each Loki and Prometheus unit

Self-monitoring
  COS: Logs, metrics, and traces via an in-model opentelemetry-collector charm
  COS Lite: Metrics only, via direct relations

Minimum system requirements
  COS: 1 node with 8 CPUs and 16 GB RAM, plus storage nodes
  COS Lite: 1 node with 4 CPUs and 8 GB RAM (plus storage nodes, if any)

COS Lite

Canonical Observability Stack Lite (referred to as COS Lite) is designed for the edge and is capable of running reliably alongside MicroK8s and Juju with limited computing resources (around 8 GB of memory).

The charms in COS Lite can be integrated with other Juju-managed applications to provide a turn-key observability solution for your charmed workloads.

In addition, through Grafana Agent, the Canonical Observability Stack can also observe applications running outside of the Juju context.

See this article for details.

Components

COS Lite consists of two types of components: core and auxiliary. Core components are required for the solution to function, while auxiliary components may be added to extend its functionality as needed.

Core

Prometheus

The Prometheus charmed operator is responsible for ingesting, storing, evaluating, and serving metric telemetry. Although Prometheus has its own user interface, metrics are mainly consumed through Grafana.
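To illustrate what "evaluating" involves, the following minimal Python sketch models how an alerting rule with a `for` duration moves between the inactive, pending, and firing states. It assumes a single series compared against a fixed threshold and is a simplified model, not Prometheus's implementation; the names `AlertRule`, `AlertState`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical simplified rule: the condition holds while value > threshold."""
    name: str
    threshold: float
    for_seconds: float  # how long the condition must hold before the alert fires


@dataclass
class AlertState:
    state: str = "inactive"  # one of "inactive", "pending", "firing"
    active_since: Optional[float] = None  # when the condition first became true


def evaluate(rule: AlertRule, prev: AlertState, value: float, now: float) -> AlertState:
    """Advance the alert by one evaluation cycle and return its new state."""
    if value <= rule.threshold:
        # Condition is false: the alert resets, whatever its previous state.
        return AlertState()
    since = prev.active_since if prev.active_since is not None else now
    if now - since >= rule.for_seconds:
        return AlertState("firing", since)
    return AlertState("pending", since)


rule = AlertRule("HighCPU", threshold=0.9, for_seconds=60)
state = AlertState()
for t, v in [(0, 0.95), (30, 0.97), (60, 0.99), (90, 0.5)]:
    state = evaluate(rule, state, v, t)
    print(t, state.state)
# prints: 0 pending, 30 pending, 60 firing, 90 inactive (one per line)
```

The `for` duration is what keeps short spikes from paging anyone: the alert only reaches Alertmanager once it has been pending for the full duration.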

Loki

The Loki charmed operator is responsible for ingesting, storing, evaluating, and serving log telemetry. Logs are mainly consumed through the Grafana user interface.
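As an illustration of the log ingestion side, the sketch below builds a request body in the JSON shape accepted by Loki's HTTP push endpoint (`POST /loki/api/v1/push`): streams identified by a label set, each with `[timestamp, line]` entries where the timestamp is a string of Unix epoch nanoseconds. The function name `build_loki_push_body` and the labels used are hypothetical, and the sketch only builds the body; it does not send it.

```python
import json
import time
from typing import Dict, List, Optional


def build_loki_push_body(labels: Dict[str, str], lines: List[str],
                         start_ns: Optional[int] = None) -> str:
    """Build a JSON body for Loki's push endpoint (POST /loki/api/v1/push).

    A stream is identified by its label set. Each entry is a pair of
    [timestamp as a string of Unix epoch nanoseconds, log line].
    """
    if start_ns is None:
        start_ns = time.time_ns()
    # Offset each line by 1 ns so entries keep their order within the stream.
    values = [[str(start_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})


body = build_loki_push_body(
    {"juju_application": "my-app", "level": "error"},  # hypothetical labels
    ["connection refused", "retrying in 5s"],
    start_ns=1_700_000_000_000_000_000,
)
print(body)
```

To send it, the body would be POSTed with `Content-Type: application/json` to the push endpoint of your Loki instance, whose address depends on your ingress setup. In a COS Lite deployment this is typically handled for you by Grafana Agent.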

Alertmanager

The Alertmanager charmed operator is responsible for routing alert notifications, fired by the alert rules evaluated in Prometheus and Loki, to the relevant recipients.
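Alertmanager decides who receives a notification by walking a routing tree, configured through `route`, `receiver`, `matchers`, `routes`, and `continue`. The Python sketch below is a heavily simplified model of that walk, assuming equality-only matchers and ignoring grouping, inhibition, silences, and timing; `Route`, `route_matches`, `route_alert`, and the receiver names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Route:
    receiver: str
    matchers: Dict[str, str] = field(default_factory=dict)  # label -> required value (equality only)
    routes: List["Route"] = field(default_factory=list)     # child routes, checked in order
    continue_matching: bool = False                         # models Alertmanager's `continue`


def route_matches(route: Route, labels: Dict[str, str]) -> bool:
    return all(labels.get(k) == v for k, v in route.matchers.items())


def route_alert(route: Route, labels: Dict[str, str]) -> List[str]:
    """Return the receivers an alert with `labels` would be delivered to."""
    receivers: List[str] = []
    for child in route.routes:
        if route_matches(child, labels):
            receivers.extend(route_alert(child, labels))
            if not child.continue_matching:
                break  # first match wins unless `continue` is set
    # If no child route matched, this node's own receiver handles the alert.
    return receivers or [route.receiver]


tree = Route("default-email", routes=[
    Route("pager", matchers={"severity": "critical"}),
    Route("db-team-chat", matchers={"team": "db"}),
])
print(route_alert(tree, {"alertname": "DiskFull", "severity": "critical", "team": "db"}))  # ['pager']
print(route_alert(tree, {"alertname": "SlowQueries", "severity": "warning", "team": "db"}))  # ['db-team-chat']
print(route_alert(tree, {"alertname": "Watchdog"}))  # ['default-email']
```

Setting `continue_matching=True` on the first child would deliver the critical database alert to both `pager` and `db-team-chat`.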

Grafana

The Grafana charmed operator provides a highly customizable and flexible way of visualizing and consuming the telemetry data generated by your workloads.

Traefik

The Traefik charmed operator provides ingress for applications running on Kubernetes. In the Canonical Observability Stack, it provides access to the stack both for users and for Grafana Agents that send telemetry to it.

Catalogue

Collectors

Grafana Agent for Kubernetes

The Grafana Agent charmed operator for Kubernetes collects telemetry, alert rules, and dashboards from related charms via Juju relations and forwards them to the observability stack.

Grafana Agent for Machines

The Grafana Agent charmed operator for Machines collects telemetry, alert rules, and dashboards from related charms and forwards them to the observability stack via cross-model Juju relations.

Optional

Optional components can enhance your observability stack for specific use cases or circumstances. The stack works without them.

Prometheus Scrape Config

The Prometheus Scrape Config charmed operator lets you adjust the settings of scrape jobs when it is placed between a scraper, such as Grafana Agent, and the charm you want to scrape.
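Conceptually, a charm sitting in the middle of the relation receives scrape jobs from one side, rewrites some of their settings, and passes them on. The Python sketch below models only that idea, as a function that overrides fields of Prometheus scrape job definitions; `job_name`, `static_configs`, `targets`, `scrape_interval`, and `scrape_timeout` are standard Prometheus scrape configuration fields, while `apply_scrape_overrides` and the example job are hypothetical. The charm's actual configuration options and behavior may differ.

```python
import copy
from typing import Any, Dict, List


def apply_scrape_overrides(jobs: List[Dict[str, Any]],
                           overrides: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return copies of Prometheus scrape jobs with the given settings overridden.

    Settings not listed in `overrides` are kept; the input jobs are not modified.
    """
    patched = []
    for job in jobs:
        new_job = copy.deepcopy(job)  # never mutate the jobs received upstream
        new_job.update(overrides)
        patched.append(new_job)
    return patched


# Hypothetical job as published by the charm being scraped.
jobs = [{"job_name": "my-app", "static_configs": [{"targets": ["10.0.0.5:8080"]}]}]
print(apply_scrape_overrides(jobs, {"scrape_interval": "30s", "scrape_timeout": "10s"}))
```

Because the override lives in the middle charm, the scraped charm and the scraper both stay unchanged.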

Prometheus Scrape Target

The Prometheus Scrape Target charmed operator allows you to scrape targets not managed by Juju by making their /metrics endpoints part of the Juju state.
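For context, a `/metrics` endpoint serves plain text in the Prometheus text exposition format: `# HELP` and `# TYPE` lines followed by one sample per line. The sketch below renders a single counter in that format and serves it with Python's standard `http.server`; the metric name, its samples, and the names `render_counter` and `MetricsHandler` are hypothetical, and only counters are handled.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]

# Hypothetical counter samples exposed by an application: label set -> value.
REQUESTS: Dict[Labels, float] = {
    (("method", "GET"),): 1027,
    (("method", "POST"),): 3,
}


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: Dict[Labels, float]) -> str:
    """Render one counter metric in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics at /metrics and return 404 for any other path."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter("myapp_requests_total",
                              "Total HTTP requests handled.", REQUESTS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Running `HTTPServer(("", 8000), MetricsHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would expose such an endpoint on port 8000; the Prometheus Scrape Target charm would then be configured to point at that host and port.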

COS Proxy

The COS Proxy charmed operator is a charm for virtual and physical machines that “translates” the relations supported by the previous iteration of the stack, LMA, into COS-native ones.

COS Configuration

The COS Configuration charmed operator provides a GitOps approach to managing Prometheus alerts, Loki alerts, and Grafana dashboards that are specific to your Juju deployments rather than to a particular charm.

Karma

The Karma charmed operator lets you visualize alerts from multiple Alertmanager clusters in a single view. This is useful, for example, if you deploy many observability stacks on separate edge compute devices or in different production environments and want a centralized overview.
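The core idea of a unified view can be sketched as grouping identical alerts reported by different Alertmanager clusters. The Python sketch below assumes an alert's identity is its full label set and records which clusters report it; it is a rough model of the idea only, not how Karma is implemented, and `merge_alerts` and the cluster names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

AlertKey = Tuple[Tuple[str, str], ...]


def merge_alerts(by_cluster: Dict[str, List[Dict[str, str]]]) -> Dict[AlertKey, List[str]]:
    """Group alerts with identical label sets and list the clusters reporting each."""
    merged: Dict[AlertKey, List[str]] = defaultdict(list)
    for cluster in sorted(by_cluster):
        for labels in by_cluster[cluster]:
            key = tuple(sorted(labels.items()))  # label order does not affect identity
            if cluster not in merged[key]:
                merged[key].append(cluster)
    return dict(merged)


alerts = {
    "edge-site-b": [{"alertname": "DiskFull", "severity": "critical"},
                    {"alertname": "HighLatency", "severity": "warning"}],
    "edge-site-a": [{"severity": "critical", "alertname": "DiskFull"}],
}
for key, clusters in merge_alerts(alerts).items():
    print(dict(key)["alertname"], clusters)
# prints:
# DiskFull ['edge-site-a', 'edge-site-b']
# HighLatency ['edge-site-b']
```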