# Tier OpenTelemetry Collector with different pipelines per data stream

By design, the charmed OpenTelemetry Collector (otelcol) forwards data from all receivers to all exporters. To mimic the wider range of architectures that the pipeline config supports, you may therefore need to deploy multiple OpenTelemetry Collector charms in a tiered topology. One such use case is processing data differently per receiver or exporter.

## Tiering outgoing data streams

One imaginable scenario is splitting a log stream into hot and cold data based on log levels. For compliance reasons we may also want a redaction processor that removes sensitive data, and the batch processor improves the efficiency of both log streams via compression. Low-severity levels like TRACE, DEBUG, and INFO tend to dominate a log stream and indicate normal workload operation. These can be filtered out of the stream sent to long-term (cold) storage to minimize cost while maintaining compliance. Conversely, the hot stream could keep INFO logs, since its storage is short-term, while still filtering out TRACE and DEBUG logs.

To understand how to filter telemetry with otelcol, refer to the selectively drop telemetry documentation or see the examples for log-level filtering.

```mermaid
flowchart TB

flog[flog] --> fan-out
fan-out["opentelemetry-collector<br>(redact & batch)"]
fan-out --> warn
fan-out --> info
warn["opentelemetry-collector<br>(cold filter)"] --> loki-cold
info["opentelemetry-collector<br>(hot filter)"] --> loki-hot
loki-hot["loki<br>(hot storage)"]
loki-cold["loki<br>(cold storage)"]

class fan-out,warn,info thickStroke;
classDef thickStroke stroke-width:2px, stroke:#FFA500;
```

With Juju config, we use otelcol's `processors` config option to:

  1. Set the minimum severity level to WARNING

```yaml
cold-filter:
  options:
    processors: |-
      filter:
        logs:
          log_record:
            - ContainsValue(Keys(ParseJSON(body)), "level") and
              (ParseJSON(body)["level"] == "INFO" or
              ParseJSON(body)["level"] == "DEBUG" or
              ParseJSON(body)["level"] == "TRACE")
```
  1. Set the minimum severity level to INFO

```yaml
hot-filter:
  options:
    processors: |-
      filter:
        logs:
          log_record:
            - ContainsValue(Keys(ParseJSON(body)), "level") and
              (ParseJSON(body)["level"] == "DEBUG" or
              ParseJSON(body)["level"] == "TRACE")
```
  1. Redact sensitive log messages and batch

```yaml
redact-and-batch:
  options:
    processors: |-
      batch:
      redaction:
        allow_all_keys: true
        blocked_values:
          - "(dolorem|facilis|quo) .* (corporis|debitis|quis)"
```

## Tiering incoming data streams

Another imaginable scenario is classifying log streams prior to ingestion into a common storage destination. Each flog log source has its own downstream data processing, which is useful for classifying and identifying the source environment. Both data streams benefit from the redact & batch otelcol, which uses the redaction processor for compliance reasons and the batch processor for efficiency. Additionally, each stream has its own uniquely configured attributes processor to label the logging source environment.

```mermaid
flowchart TB

flog-dev["flog<br>(dev)"] --> dev
dev["opentelemetry-collector<br>(dev attributes)"] --> fan-in

flog-prod["flog<br>(prod)"] --> prod
prod["opentelemetry-collector<br>(prod attributes)"] --> fan-in

fan-in["opentelemetry-collector<br>(redact & batch)"] --> loki[loki]

class fan-in,dev,prod thickStroke;
classDef thickStroke stroke-width:2px, stroke:#FFA500;
```

With Juju config, we use otelcol's `processors` config option to:

  1. Label the log stream as development and originating from region-a

```yaml
dev-attributes:
  options:
    processors: |-
      attributes/dev:
        actions:
          - key: "region-a.environment"
            value: "dev"
            action: upsert
```
  1. Label the log stream as production and originating from region-a

```yaml
prod-attributes:
  options:
    processors: |-
      attributes/prod:
        actions:
          - key: "region-a.environment"
            value: "prod"
            action: upsert
```
  1. Redact sensitive log messages and batch

```yaml
redact-and-batch:
  options:
    processors: |-
      batch:
      redaction:
        allow_all_keys: true
        blocked_values:
          - "(dolorem|facilis|quo) .* (corporis|debitis|quis)"
```