Stack variants¶
Quick comparison¶
| | COS | COS Lite |
|---|---|---|
| Purpose | Horizontally scalable, enterprise-ready | Resource-constrained or near-edge deployment |
| Telemetry types | Logs, metrics, traces | Logs, metrics |
| Resiliency | Scalable microservices with node anti-affinity (HA-ready) | Monolithic mode (multi-node non-identical replication) |
| Storage | S3 (managed independently) | Via PVCs, e.g. |
| Load balancing | Dedicated Nginx for Loki, Mimir, Tempo; Traefik on top of that | Traefik balances across units externally, but separate Grafana datasources for Loki/Prometheus units |
| Self-monitoring | Logs, metrics, traces via in-model opentelemetry-collector charm | Metrics only, via direct relations |
| Minimum system requirements | 1x node with 8 CPUs and 16 GB RAM, plus storage nodes | 1x node with 4 CPUs and 8 GB RAM (plus storage nodes, if any) |
COS Lite¶
Canonical Observability Stack Lite (COS Lite) is designed for the edge and runs reliably alongside MicroK8s and Juju on limited computing resources (around 8 GB of memory).
The charms in COS Lite can be integrated with other Juju-managed applications to provide a turn-key observability solution for your charmed workloads.
In addition, the Canonical Observability Stack can use Grafana Agent to observe applications running outside of the Juju context.
See this article for details.
Components¶
COS Lite consists of two types of components: core components, which the solution needs in order to function, and auxiliary (optional) components, which can be added to extend its functionality as needed.
Core¶
Prometheus¶
The Prometheus charmed operator is responsible for ingesting, storing, evaluating and serving metric telemetry. While it has a separate user interface, the main way of consumption is through Grafana.
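As a rough illustration of consuming metrics outside Grafana, the sketch below builds a request URL for Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The host name, port, and the `up{job="demo"}` expression are placeholders chosen for the example; the address that applies in a real deployment depends on how Prometheus is exposed, and the helper name `instant_query_url` is hypothetical.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus's instant-query endpoint (/api/v1/query)."""
    # rstrip avoids a double slash when the base URL already ends with "/".
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical address; substitute the URL your deployment exposes.
url = instant_query_url("http://prometheus.example:9090/", 'up{job="demo"}')
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is a JSON document describing the query result.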
Loki¶
The Loki charmed operator is responsible for ingesting, storing, evaluating and serving log telemetry. The main way of consumption is through the Grafana user interface.
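To make the ingestion side more concrete, here is a minimal sketch of the JSON body accepted by Loki's push endpoint (`/loki/api/v1/push`): a list of streams, each with a label set and `[timestamp, line]` pairs where the timestamp is a string of Unix nanoseconds. The label `job="demo"` and the helper name `loki_push_body` are assumptions made for the example; in COS Lite, collectors such as Grafana Agent normally do this for you.

```python
import json
import time
from typing import Dict, List, Optional


def loki_push_body(labels: Dict[str, str], lines: List[str],
                   ts_ns: Optional[int] = None) -> str:
    """Serialize log lines into a JSON body for /loki/api/v1/push."""
    # Loki expects the timestamp as a string of nanoseconds since the epoch.
    ts = str(ts_ns if ts_ns is not None else time.time_ns())
    payload = {
        "streams": [
            {"stream": labels, "values": [[ts, line] for line in lines]}
        ]
    }
    return json.dumps(payload)


# Fixed timestamp so the example output is reproducible.
body = loki_push_body({"job": "demo"}, ["service started"],
                      ts_ns=1_700_000_000_000_000_000)
print(body)
```

The body would be sent with an HTTP POST and a `Content-Type: application/json` header.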
Alertmanager¶
The Alertmanager charmed operator is responsible for routing notifications for alerts raised by the alert rules in Prometheus and Loki to the relevant recipients.
Grafana¶
The Grafana charmed operator provides a highly customizable and flexible way of visualizing and consuming the telemetry data generated by your workloads.
Traefik¶
The Traefik charmed operator provides ingress for applications running on Kubernetes. In the Canonical Observability Stack, it gives access to the stack both to users and to Grafana Agents that send telemetry to it.
Catalogue¶
Collectors¶
Grafana Agent for Kubernetes¶
The Grafana Agent charmed operator for Kubernetes collects telemetry, alert rules, and dashboards via Juju relations and forwards them to the observability stack.
Grafana Agent for Machines¶
The Grafana Agent charmed operator for Machines collects telemetry, alert rules, and dashboards via cross-model Juju relations and forwards them to the observability stack.
Optional¶
Optional components enhance your observability stack for specific use cases or circumstances. The stack works without them.
Prometheus Scrape Config¶
The Prometheus Scrape Config charmed operator allows you to tweak the settings of scrape jobs by placing it between a scraper like Grafana Agent and the charm you want to scrape.
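The idea of tweaking scrape jobs on their way to the scraper can be sketched as a simple overlay of settings onto each job definition. The keys used here (`job_name`, `metrics_path`, `static_configs`, `scrape_interval`) are standard Prometheus scrape configuration fields, but the function `apply_overrides`, the target address, and the overlay logic are illustrative assumptions, not the charm's actual implementation.

```python
from copy import deepcopy
from typing import Any, Dict, List


def apply_overrides(jobs: List[Dict[str, Any]],
                    overrides: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return copies of the scrape jobs with the override keys applied to each."""
    patched = []
    for job in jobs:
        new_job = deepcopy(job)  # leave the caller's job definitions untouched
        new_job.update(overrides)
        patched.append(new_job)
    return patched


# Hypothetical scrape job as a charm might advertise it.
jobs = [{
    "job_name": "demo",
    "metrics_path": "/metrics",
    "static_configs": [{"targets": ["10.0.0.5:8080"]}],
}]
patched = apply_overrides(jobs, {"scrape_interval": "30s"})
print(patched[0]["scrape_interval"])
```

Copying each job before patching keeps the original definitions intact, so the same jobs could be forwarded unchanged to another scraper.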
Prometheus Scrape Target¶
The Prometheus Scrape Target charmed operator allows you to scrape targets not managed by Juju by making their /metrics endpoints part of the Juju state.
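For context, a `/metrics` endpoint returns plain text in the Prometheus exposition format. The sketch below renders a single gauge in that format; the metric name, help text, and label are made up for the example, the helper `render_gauge` is hypothetical, and label-value escaping is deliberately left out to keep it short.

```python
from typing import Dict


def render_gauge(name: str, help_text: str, value: float,
                 labels: Dict[str, str]) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    # Note: label values are not escaped here; real exporters must escape
    # backslashes, double quotes, and newlines.
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    sample = f"{name}{{{label_str}}} {value}" if labels else f"{name} {value}"
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{sample}\n"


text = render_gauge("demo_temperature_celsius", "Current temperature.",
                    21.5, {"room": "lab"})
print(text, end="")
```

An application outside Juju that serves text like this over HTTP is the kind of target this charm is meant to register.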
COS Proxy¶
The COS Proxy charmed operator is a charm for virtual and physical machines, designed to “translate” the relations supported by the previous iteration of the stack, LMA, into COS-native ones.
COS Configuration¶
The COS Configuration charmed operator provides a GitOps approach to manage Prometheus alerts, Loki alerts and Grafana dashboards that are specific to your Juju deployments, rather than to a particular charm.
Karma¶
The Karma charmed operator enables you to visualize alerts from multiple Alertmanager clusters in a unified view. This is useful, for example, if you deploy many observability stacks on separate edge compute devices or in different production environments and want to keep a centralized overview.