Troubleshooting guide¶
This page goes over some recommended tools and approaches to troubleshooting the charm.
Check model’s status¶
The first step of any troubleshooting should always be analysing the juju status
command output:
juju status --storage --integrations
In the command above:
--storage – includes storage information in the output
--integrations – includes relations/integrations information in the output
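When a model has many units, it can help to narrow the status output down to entries that are not healthy. The following is a minimal sketch: `filter_unhealthy` is a hypothetical helper (not part of Juju) that keeps only lines mentioning the `blocked`, `error`, `waiting`, or `maintenance` states.

```shell
# Hypothetical helper: read tabular `juju status` output on stdin and keep
# only the lines that mention a non-healthy state as a whole word.
filter_unhealthy() {
  # `|| true` keeps the exit code at 0 when every line is healthy.
  grep -E -w 'blocked|error|waiting|maintenance' || true
}

# Usage (requires a Juju client with an active model):
#   juju status --storage --integrations | filter_unhealthy
```

An empty result means no line matched any of the listed states; it does not replace reading the full status output.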
Check logs¶
Log messages can provide more information than juju status, especially for dynamic or fast-paced processes.
Logs are written on the machines operated by Juju. You can access them directly, via the Juju client, or via the Canonical Observability Stack (COS).
Juju client¶
Check for logs via Juju:
juju debug-log --replay --tail
You can increase the verbosity level if needed:
juju model-config 'logging-config=<root>=INFO;unit=DEBUG'
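To focus on a single unit, `juju debug-log` output can be filtered by entity and severity. The sketch below assumes a unit named `kafka/0`; `unit_warnings` is a hypothetical wrapper that converts the unit name into the `unit-<application>-<number>` entity form before passing it to `--include`.

```shell
# Hypothetical wrapper: print stored log messages at WARNING level or above
# for a single unit, then exit instead of following the log.
unit_warnings() {
  # Convert a unit name such as kafka/0 into its entity tag, unit-kafka-0.
  entity="unit-$(printf '%s' "$1" | tr '/' '-')"
  juju debug-log --replay --no-tail --level WARNING --include "$entity"
}

# Usage (assumes a unit named kafka/0 exists in the current model):
#   unit_warnings kafka/0
```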
Direct access¶
Workload logs are stored on units and can be accessed in the following directories:
Apache Kafka – /var/snap/charmed-kafka/common/var/log/kafka/
Apache ZooKeeper – /var/snap/charmed-zookeeper/common/var/log/zookeeper/
Use the juju ssh command to connect to a unit and access logs directly, for example:
juju ssh <unit-name> 'sudo tail -f /var/snap/charmed-kafka/common/var/log/kafka/server.log'
Some of the most useful log files for Apache Kafka:
server.log – The actual service logs.
kafka-authorizer.log – Failed SASL authentications and denied ACL operations.
controller.log – Logs from the KafkaController.
kafkaServer-gc.log – Apache Kafka’s Java garbage collector log.
state-change.log – Tracks partition leader re-elections.
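For a quick look at recent problems, the server log can be searched for `ERROR` and `WARN` entries over `juju ssh`. This is a minimal sketch: `kafka_log_errors` is a hypothetical helper, the unit name `kafka/0` is an assumption, and the path is the Apache Kafka log directory listed above.

```shell
# Hypothetical helper: show the last 20 ERROR/WARN lines of server.log
# on the given unit.
kafka_log_errors() {
  log_file=/var/snap/charmed-kafka/common/var/log/kafka/server.log
  # The quoted command runs on the unit; `tail` keeps only the newest matches.
  juju ssh "$1" "sudo grep -E 'ERROR|WARN' $log_file | tail -n 20"
}

# Usage (assumes a unit named kafka/0):
#   kafka_log_errors kafka/0
```

The same pattern applies to the other log files by changing the file name in `log_file`.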
COS¶
Canonical Observability Stack (COS) gathers, processes, visualises, and alerts on telemetry generated by workloads. In COS, Grafana Loki is the storage and querying backend for logs. You can query Loki for logs via its HTTP API, or visualise the logs stored in Loki in Grafana using LogQL, Loki's log query language.
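As an illustration of the HTTP API route, the sketch below sends a LogQL query to Loki's `query_range` endpoint with `curl`. The Loki address, the `loki_query` helper name, and the `juju_application="kafka"` label selector are assumptions; check which address and labels apply in your COS deployment.

```shell
# Hypothetical helper: run a LogQL query against a Loki instance.
#   $1 - base URL of Loki (assumed to be reachable from where you run this)
#   $2 - LogQL query string
loki_query() {
  # -G sends the URL-encoded parameters as a GET query string.
  curl -sG "$1/loki/api/v1/query_range" \
    --data-urlencode "query=$2" \
    --data-urlencode "limit=100"
}

# Usage (the URL and label selector are placeholders):
#   loki_query "http://<loki-address>:3100" '{juju_application="kafka"} |= "ERROR"'
```

The response is JSON; the same LogQL expression can be pasted into Grafana's Explore view to browse the results interactively.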
Partition rebalancing¶
Warning
Scaling a Charmed Apache Kafka cluster does not automatically rebalance existing topics and partitions. Rebalancing must be performed manually: before scaling in, or after scaling out.
See the Partition reassignment section of the How-to manage units guide for details on how to rebalance Apache Kafka partitions between units.
Run out of disk space¶
Currently, Juju does not support increasing the size of existing storage in a Juju application. If you hit the size limit of the existing storage, consider adding new storage or redeploying the charm with larger storage.
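Before storage fills up, it is worth checking how much space is left on a unit. A minimal sketch follows: `df_used_percent` is a hypothetical helper that extracts the used-space percentage from POSIX-format `df -P` output. The unit name `kafka/0`, the storage name `data`, and the size `10G` in the comments are assumptions to adapt to your deployment.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the
# used-space percentage (without the % sign) of the first filesystem listed.
df_used_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage (assumes a unit named kafka/0):
#   juju ssh kafka/0 'df -P /var/snap/charmed-kafka/common' | df_used_percent
#
# Adding storage to a unit (storage name and size are assumptions; check the
# storage names defined by the charm with `juju storage`):
#   juju add-storage kafka/0 data=10G
```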
Sizing guide¶
We recommend the following minimum configuration for production environments:
For a single region/AZ deployment:
3 units of Apache Kafka
3 units of Apache ZooKeeper
For a multi-region/AZ deployment:
3 units of Apache Kafka
5 units of Apache ZooKeeper
For very high numbers of topics, partitions and brokers, consider scaling the Apache ZooKeeper cluster out to 5 or up to 7 units.
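The odd unit counts follow from how ZooKeeper maintains a quorum: an ensemble stays available while a majority of its servers are up, so n servers tolerate (n - 1) / 2 failures, rounded down. The short sketch below only does this arithmetic; `zk_tolerated_failures` is a hypothetical helper name.

```shell
# Hypothetical helper: number of ZooKeeper servers that can fail while a
# majority (quorum) of an ensemble of size $1 remains available.
zk_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 5 7; do
  echo "$n units tolerate $(zk_tolerated_failures "$n") failed unit(s)"
done
```

This prints 1, 2, and 3 tolerated failures for 3, 5, and 7 units respectively. An even count adds no extra tolerance (4 units still tolerate only 1 failure).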
For brokers, start with the minimum estimated number of Charmed Apache Kafka units, then scale out to meet the desired throughput.
Warning
Scaling an Apache ZooKeeper cluster out to higher numbers does not provide linear growth in performance. Due to cross-communication overhead, there is a point after which more nodes mean lower performance. There is no specific algorithm to calculate the optimal number of units, as it depends on multiple parameters, including workload characteristics.
Contact us¶
If you encounter undocumented or unexpected behaviour in Charmed Apache Kafka, feel free to create an issue on GitHub or contact us directly. See the Contact reference page for contact details.