Restore

This guide describes how to restore Charmed Kubeflow (CKF) control plane data and user workloads from object storage using Charmed Velero.

Warning

These steps must be followed as a whole, restoring all databases, pipelines, the MinIO bucket, and the ML Metadata database. Skipping any of them may result in data loss.

Note

Running Kubeflow pipelines and Katib experiments can affect the restoration outcome. Make sure all pipelines and experiments are stopped and that no other processes, such as Jupyter Notebooks, are calling them.

Note

Full backups with volume snapshotting are only supported on Kubernetes clusters on AWS and Azure. You can use File System Backup for Persistent Volume Claims instead, but depending on the storage class, it might not work.

Requirements

  • Admin access to the Kubernetes cluster where CKF is deployed.

  • Juju admin access to the kubeflow model.

  • Charmed Velero is deployed and configured with the object storage where the backups are stored.

  • katib-db and kfp-db applications were backed up to S3-compatible storage.

List backups

In the Charmed Velero model, run the list-backups action to get the backups:

juju switch velero
juju run velero-operator/0 list-backups

The action returns a YAML list of backups. Note the application name, endpoint, and UID of each backup. You will use the UIDs to create the restores.

In the Kubeflow model, run the following commands to list the database backups:

juju switch kubeflow
juju run katib-db/leader list-backups
juju run kfp-db/leader list-backups

Prepare for restore

Prepare CKF for the restore by scaling the minio and mlmd applications down to zero and removing their storage. You can retrieve the storage ID of each application by running juju storage:

juju storage
juju scale-application minio 0
juju scale-application mlmd 0
juju remove-storage mlmd-data/<id>
juju remove-storage minio-data/<id>

Warning

Removing the storage from the minio and mlmd charms may result in data loss. Make sure nothing important is stored in them. This step is safe on a clean CKF installation.
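If you prefer to script the preparation, the following is a minimal sketch. It is not a tool shipped with CKF or Juju. The functions run and prepare_for_restore and the DRY_RUN variable are hypothetical names, and the sketch assumes that you have already read the storage IDs from juju storage in the kubeflow model. With DRY_RUN=1, the function only prints the commands in the order it would run them, so you can check them before any storage is removed.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: scale minio and mlmd down to zero units, then remove
# their storage. Run it in the kubeflow model.
#   $1: minio storage ID (as listed by `juju storage`)
#   $2: mlmd storage ID (as listed by `juju storage`)
prepare_for_restore() {
  if [ "$#" -ne 2 ]; then
    echo "usage: prepare_for_restore <minio-storage-id> <mlmd-storage-id>" >&2
    return 1
  fi
  # Scale down first, following the order used in this guide.
  run juju scale-application minio 0 || return 1
  run juju scale-application mlmd 0 || return 1
  # This deletes the data held by minio and mlmd (see the warning above).
  run juju remove-storage "$1" || return 1
  run juju remove-storage "$2" || return 1
}
```

For example, DRY_RUN=1 prepare_for_restore minio-data/<id> mlmd-data/<id> prints the four commands without running them. Run the same call without DRY_RUN=1 to execute them.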

Make a restore

Switch to the Charmed Velero model and initiate restores using the backup UIDs from the previous steps:

Note

Make sure the following order of execution is preserved.

After each restore, re-attach the recreated storage to the minio and mlmd charms. Get the names of the restored PVs with kubectl get pv, import them as Juju storage, and add the units back with that storage attached:

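# Switch back to the kubeflow model before re-attaching storage
juju switch kubeflow
kubectl get pv
# Import each restored PV as Juju storage (juju storage lists the resulting storage IDs)
juju import-filesystem kubernetes <pv_name> minio-data --force
juju import-filesystem kubernetes <pv_name> mlmd-data --force
# Add the units back with the imported storage attached
juju add-unit minio --attach-storage minio-data/<id>
juju add-unit mlmd --attach-storage mlmd-data/<id>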
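On a cluster with many volumes, it can help to narrow the kubectl get pv output to the restored PVs. The following sketch lists each PV with the name of the claim it is bound to. It then keeps only the PVs whose claim name contains a given storage name. The functions list_pvs_with_claims and pvs_for_storage are hypothetical. Matching on the claim name also assumes that the restored claims keep Juju's naming, which includes the storage name (for example, minio-data). Verify the result against kubectl get pv before importing anything.

```shell
# List every PV as "<pv-name><TAB><claim-name>" (claim name is empty if unbound).
list_pvs_with_claims() {
  kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.name}{"\n"}{end}'
}

# Hypothetical helper: read "<pv-name><TAB><claim-name>" lines on stdin and
# print the PV names whose claim name contains the given storage name.
# Assumption: claims created for Juju storage include the storage name.
pvs_for_storage() {
  awk -F '\t' -v name="$1" 'index($2, name) > 0 { print $1 }'
}
```

For example, list_pvs_with_claims | pvs_for_storage minio-data should print the PV name to pass to juju import-filesystem for minio-data.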

The restore is now complete. Open the dashboard to see the backed-up data.

Please refer to the Charmed Velero documentation <https://charmhub.io/velero-operator> for more details.

Restore CKF databases

  1. Scale up kfp-db and katib-db.

This step prevents the primary database from becoming unavailable during backup:

juju scale-application kfp-db 2
juju scale-application katib-db 2
  2. Restore each database.

juju run kfp-db/leader restore restore-to-time=latest
juju run katib-db/leader restore restore-to-time=latest

Please refer to the Charmed MySQL K8s documentation for more details.
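You can also wrap the two database steps in a small script. This is a sketch only. The names run, scale_ckf_databases, restore_ckf_databases, CKF_DBS, and DRY_RUN are hypothetical. The steps are kept as separate functions so that you can check juju status between them. Waiting for the new units to become active before restoring is a precaution assumed here, not a requirement stated in this guide.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Database applications restored in this section (run in the kubeflow model).
CKF_DBS="kfp-db katib-db"

# Step 1: scale each database application to two units.
scale_ckf_databases() {
  for db in $CKF_DBS; do
    run juju scale-application "$db" 2 || return 1
  done
}

# Step 2: restore each database to the latest point in time.
# Assumption: the units added in step 1 are active in `juju status` first.
restore_ckf_databases() {
  for db in $CKF_DBS; do
    run juju run "$db/leader" restore restore-to-time=latest || return 1
  done
}
```

For example, run DRY_RUN=1 scale_ckf_databases to preview the commands. Then run scale_ckf_databases, wait for the units to settle, and run restore_ckf_databases.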