Key rotation

A Charmed HPC deployment uses two keys: the internal authentication key, which ensures that communications “received by Slurm are coming from trusted sources”, and a standalone JWT key, which is used for API token generation.

Both keys are generated by the slurmctld charm during installation and are securely distributed to the related Slurm charms using Juju Secrets. The authentication key is distributed to the sackd, slurmd, slurmdbd, and slurmrestd charms, while the JWT key is distributed only to slurmdbd.

The slurmctld application is the owner of the Juju secrets that hold the keys, while the related Slurm charms are secret observers.
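
The owner and observer roles can be pictured with a small toy model. This is only a sketch under stated assumptions: ToySecret and its methods are hypothetical names invented for illustration and are not part of Juju or any charm library. It shows an owner publishing numbered revisions, observers tracking a revision until they refresh, and an old revision becoming unused once no observer tracks it, which is the condition the rotation processes described below wait for.

```python
from dataclasses import dataclass, field


@dataclass
class ToySecret:
    """Hypothetical stand-in for a Juju secret: revisions plus observer tracking."""

    revisions: dict = field(default_factory=dict)  # revision number -> content
    tracking: dict = field(default_factory=dict)   # observer name -> revision number
    latest: int = 0

    def set_content(self, content: str) -> int:
        """Owner side: publish new content as a new revision."""
        self.latest += 1
        self.revisions[self.latest] = content
        return self.latest

    def grant(self, observer: str) -> None:
        """Owner side: let an observer read the secret, starting at the latest revision."""
        self.tracking[observer] = self.latest

    def refresh(self, observer: str) -> str:
        """Observer side: move to the latest revision and return its content."""
        self.tracking[observer] = self.latest
        return self.revisions[self.latest]

    def unused_revisions(self) -> list:
        """Old revisions that no observer tracks any more."""
        in_use = set(self.tracking.values())
        return [r for r in self.revisions if r < self.latest and r not in in_use]


observers = ("sackd", "slurmd", "slurmdbd", "slurmrestd")

secret = ToySecret()
secret.set_content("auth-key-v1")      # placeholder content, not a real key
for name in observers:
    secret.grant(name)

secret.set_content("auth-key-v2")      # owner publishes revision 2
before = secret.unused_revisions()     # observers still track revision 1

for name in observers:
    secret.refresh(name)               # each observer picks up revision 2
after = secret.unused_revisions()      # revision 1 is no longer tracked

print(before, after)
```

In the model, revision 1 only becomes unused after every observer has refreshed, so a single slow observer keeps the old revision in use.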

It is good security practice to replace keys regularly, or in response to security incidents, so that malicious users cannot use a leaked key to exert control over the cluster.

Authentication key rotation process

The “Rotate the Slurm authentication key” how-to provides the steps to initiate the key rotation process using the rotate-auth-key action. The action triggers a cluster-wide rotation of the authentication key. Once the process is complete, a new key is in place and the previous key can no longer be used to authenticate Slurm communication.

The full rotation process is as follows:

  • A Charmed HPC cluster administrator runs juju run slurmctld/leader rotate-auth-key

  • The slurmctld leader generates a new cryptographically secure key and adds it to its slurm.jwks file alongside the old key. To ensure the new key has a unique identifier, its key ID (kid) is set to a newly generated Universally Unique Identifier (UUID).

  • With the new key in place, the slurmctld leader performs an scontrol reconfigure to reload configuration across the cluster. At this point, both the new and old keys are valid and can be used for authentication.

  • The slurmctld leader updates the Juju secret authentication key to a new revision with the new key.

  • The revision update triggers a SecretChangedEvent in each unit of the related Slurm charms (the secret observers).

  • In the SecretChangedEvent handler, each unit reads the new revision of the Juju secret, replaces its local key file with the new key, and reloads its service configuration. This ensures the new key is in use and the old key is no longer valid on that unit.

  • Once all units of related Slurm charms have updated to the new key, the slurmctld leader receives a SecretRemoveEvent.

  • In the SecretRemoveEvent, the slurmctld leader removes the old key from the slurm.jwks file and performs another scontrol reconfigure to ensure the old key is no longer valid in the cluster and only the new key is trusted for future authentication.
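
The add-then-remove pattern in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the charm's implementation: the helper names (new_jwk, add_key, remove_key) are hypothetical, and the field names follow general JWKS conventions for symmetric keys. The exact layout and key encoding that Slurm expects in slurm.jwks are assumptions here and should be checked against the Slurm documentation.

```python
import base64
import json
import secrets
import uuid


def new_jwk() -> dict:
    """Create a symmetric key entry whose key ID (kid) is a fresh UUID."""
    key_bytes = secrets.token_bytes(32)  # cryptographically secure random key
    return {
        "alg": "HS256",   # assumed algorithm label
        "kty": "oct",     # JWKS key type for symmetric keys
        "kid": str(uuid.uuid4()),
        "k": base64.urlsafe_b64encode(key_bytes).rstrip(b"=").decode("ascii"),
    }


def add_key(jwks: dict, jwk: dict) -> dict:
    """First phase: keep the existing keys and append the new one."""
    return {"keys": [*jwks["keys"], jwk]}


def remove_key(jwks: dict, kid: str) -> dict:
    """Second phase: drop the key with the given kid once no unit uses it."""
    return {"keys": [k for k in jwks["keys"] if k["kid"] != kid]}


# Starting point: a key set that holds only the old key.
old = new_jwk()
jwks = {"keys": [old]}

# Rotation starts: the new key is added alongside the old key.
new = new_jwk()
jwks = add_key(jwks, new)
both_valid = {k["kid"] for k in jwks["keys"]}

# Rotation ends: the old key is removed, leaving only the new key.
jwks = remove_key(jwks, old["kid"])
only_new = [k["kid"] for k in jwks["keys"]]

print(json.dumps(jwks, indent=2))
```

Keeping both keys in the set during the middle of the process is what lets units that have not yet received the new key continue to authenticate until they catch up.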

JWT key rotation process

The “Rotate the JWT authentication key” how-to provides the steps to initiate the key rotation process using the rotate-jwt-key action. The action triggers a rotation of the JWT key on both the slurmctld controller and slurmdbd database units. Once the process is complete, a new key is in place and the previous key can no longer be used to generate valid tokens.

The full rotation process is as follows:

  • A Charmed HPC cluster administrator runs juju run slurmctld/leader rotate-jwt-key

  • The slurmctld leader generates a new cryptographically secure key and replaces its existing jwt_hs256.key.

  • With the new key in place, the slurmctld leader performs an scontrol reconfigure to reload configuration across the cluster. At this point, slurmdbd is still using the old, now stale, key, so its API endpoints are inaccessible.

  • The slurmctld leader updates the Juju secret JWT key to a new revision with the new key.

  • The revision update triggers a SecretChangedEvent in the slurmdbd unit.

  • In the SecretChangedEvent handler, the slurmdbd unit reads the new revision of the Juju secret, replaces its local key file with the new key, and reloads its service configuration. This ensures the new key is in use and the old key is no longer valid.

  • The API endpoints for slurmdbd are now accessible again.
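
To see why slurmdbd is unreachable while it holds a stale key, it can help to look at how an HS256 token is checked. The sketch below is a generic HMAC-SHA256 example using only the Python standard library. It is not Slurm's or the charm's code, the sign and verify helpers are hypothetical, and the claim in the payload is purely illustrative. It shows that a token signed with the new key fails verification against the old key and succeeds once the verifier has the new key.

```python
import base64
import hashlib
import hmac
import json
import secrets


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign(claims: dict, key: bytes) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verify(token: str, key: bytes) -> bool:
    """Recompute the signature with the given key and compare in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


old_key = secrets.token_bytes(32)
new_key = secrets.token_bytes(32)

# After the controller swaps its key, tokens are signed with the new key
# while the database side still holds the old one.
token = sign({"user": "researcher"}, new_key)  # illustrative claim only
during_stale_window = verify(token, old_key)

# Once the database side has replaced its key file, the same token verifies,
# and a token signed with the old key no longer does.
after_refresh = verify(token, new_key)
old_token_accepted = verify(sign({"user": "researcher"}, old_key), new_key)

print(during_stale_window, after_refresh, old_token_accepted)
```

Unlike the authentication key, which is kept alongside its predecessor during rotation, this process replaces the single JWT key outright, which is why there is a short window in which the two sides disagree.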