Troubleshooting guide

This page covers recommended tools and approaches for troubleshooting the charm.

Check the model’s status

The first step of any troubleshooting should always be analysing the output of the juju status command:

juju status --storage --integrations

In the command above:

  • --storage – include storage information in the output

  • --integrations – include relations/integrations information in the output
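On busy models, the output can also be narrowed to a single application. Here, kafka is a placeholder application name; substitute your own:

```shell
# Show status (with storage and integrations) for one application only.
# "kafka" is a placeholder application name.
juju status --storage --integrations kafka
```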

Check logs

Log messages can provide more detail than juju status, especially for dynamic or fast-paced processes. Logs are written on the machines operated by Juju. You can access them directly, via the Juju client, or via the Canonical Observability Stack (COS).

Juju client

Check for logs via Juju:

juju debug-log --replay --tail

You can increase verbosity level if needed:

juju model-config 'logging-config=<root>=INFO;unit=DEBUG'
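To check the current value before or after changing it, or to return to the default, you can use the same model-config command:

```shell
# Print the current logging configuration for the model.
juju model-config logging-config

# Restore the default logging configuration if needed.
juju model-config --reset logging-config
```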

Direct access

Workload logs are stored on units and can be accessed in the following directories:

  • Apache Kafka – /var/snap/charmed-kafka/common/var/log/kafka/

  • Apache ZooKeeper – /var/snap/charmed-zookeeper/common/var/log/zookeeper/

Use the juju ssh command to connect to a unit and access logs directly, for example:

juju ssh <unit-name> 'sudo tail -f /var/snap/charmed-kafka/common/var/log/kafka/server.log'

Some of the most useful log files for Apache Kafka:

  • server.log – The actual service logs.

  • kafka-authorizer.log – Failed SASL authentications and denied ACL operations.

  • controller.log – Logs from the KafkaController.

  • kafkaServer-gc.log – Apache Kafka’s Java garbage collector log.

  • state-change.log – Tracks partition leader re-elections.
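Instead of tailing logs on the unit, a file can also be copied to your local machine with juju scp. The unit name below is a placeholder:

```shell
# Copy the main Apache Kafka service log from unit kafka/0
# to the current local directory.
juju scp kafka/0:/var/snap/charmed-kafka/common/var/log/kafka/server.log .
```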

COS

Canonical Observability Stack (COS) gathers, processes, visualises, and alerts on telemetry generated by workloads. In COS, Grafana Loki is the storage and querying backend for logs. You can query Loki for logs via its HTTP API, or visualise them in Grafana using LogQL, Loki’s log query language.
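As a sketch of the HTTP API route, the query below asks Loki for recent log lines containing "ERROR". The Loki address and the juju_application label value are assumptions that depend on your COS deployment:

```shell
# Placeholder address: substitute the Loki endpoint of your COS deployment.
LOKI_URL="http://<loki-address>:3100"

# LogQL query: log lines from the "kafka" application containing "ERROR".
# "kafka" is a placeholder label value.
curl -G "$LOKI_URL/loki/api/v1/query_range" \
  --data-urlencode 'query={juju_application="kafka"} |= "ERROR"' \
  --data-urlencode 'limit=20'
```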

Partition rebalancing

Warning

Scaling a Charmed Apache Kafka cluster does not automatically rebalance existing topics and partitions. Rebalancing must be performed manually: before scaling in, or after scaling out.

See the Partition reassignment section of the How-to manage units guide for details on how to rebalance Apache Kafka partitions between units.

Run out of disk space

Currently, Juju does not support increasing the size of existing storage in an application. If you hit the size limit of existing storage, consider adding new storage or redeploying the charm with larger storage.
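For example, additional storage can be attached to a unit with juju add-storage. The storage name and size below are placeholders; check the name the charm actually defines with juju storage:

```shell
# List current storage to find the storage name used by the charm.
juju storage

# Attach one new 100 GiB volume to unit kafka/0.
# "data" is a placeholder storage name; use the name the charm defines.
juju add-storage kafka/0 data=100G
```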

Sizing guide

We recommend the following minimum configuration for production environments:

  • For a single region/AZ deployment:

    • 3 units of Apache Kafka

    • 3 units of Apache ZooKeeper

  • For a multi-region/AZ deployment:

    • 3 units of Apache Kafka

    • 5 units of Apache ZooKeeper

For very high numbers of topics, partitions, and brokers, consider scaling the Apache ZooKeeper cluster out to 5 or up to 7 units. For brokers, start with a minimum estimated number of Charmed Apache Kafka units, then scale out to meet the desired throughput.
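A single-region/AZ deployment matching the minimum above could be sketched as follows. The charm names kafka and zookeeper are assumptions; verify them against the channel you use:

```shell
# Deploy the minimum recommended production topology for a single AZ.
juju deploy zookeeper -n 3
juju deploy kafka -n 3

# Relate the two applications so the brokers can use the ZooKeeper ensemble.
juju integrate kafka zookeeper
```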

Warning

Scaling an Apache ZooKeeper cluster out to higher numbers does not provide linear growth in performance. Due to cross-communication overhead, there is a point after which adding more nodes reduces performance. There is no specific algorithm to calculate optimal numbers, as they are dependent on multiple parameters, including workload characteristics.

Contact us

If you encounter undocumented or unexpected behaviour of Charmed Apache Kafka, feel free to create an issue on GitHub or contact us directly. See the Contact page for contact details.