Restore
This guide describes how to restore Charmed Kubeflow (CKF) control plane data and user workloads from object storage using Charmed Velero.
Warning
These steps must all be performed together, restoring all databases, pipelines, the MinIO bucket, and the ML Metadata database. Skipping any of them may result in data loss.
Note
Kubeflow pipelines and Katib experiments that are still running can affect the restoration outcome. Make sure all pipelines and experiments are stopped and that no other processes, such as Jupyter Notebooks, are calling them.
Note
Full backups with volume snapshotting are only supported on AWS and Azure Kubernetes clusters. On other clusters, you can opt for File System Backup for Persistent Volume Claims, but it might not work with every Storage Class.
Requirements
Admin access to the Kubernetes cluster where CKF is deployed.
Juju admin access to the kubeflow model.
Charmed Velero was deployed and configured with the object storage where the backups were stored.
The katib-db and kfp-db applications were backed up to S3-compatible storage.
List backups
In the Charmed Velero model, run the list-backups action to get the backups:
juju switch velero
juju run velero-operator/0 list-backups
The action returns a YAML list of backups. Note the application name, endpoint, and backup UID of each backup. You will use the UID to run the restore.
In the Kubeflow model, run the following commands to list the database backups:
juju switch kubeflow
juju run katib-db/leader list-backups
juju run kfp-db/leader list-backups
Prepare for restore
Prepare CKF for the restore by scaling the minio and mlmd applications down to zero and removing their storage. Run juju storage to find the storage ID (<id>) for each application:
juju scale-application minio 0
juju scale-application mlmd 0
juju remove-storage mlmd-data/<id>
juju remove-storage minio-data/<id>
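The storage ID lookup above can be scripted. The following is a minimal sketch, not part of the official procedure. storage_ids is a hypothetical helper that extracts IDs such as minio-data/0 from the output of juju storage, assuming the IDs appear in the form <storage-name>/<number>.

```shell
# Hypothetical helper: print the unique storage IDs for one storage name
# (for example "minio-data"). It reads `juju storage` output on stdin and
# assumes IDs look like "<storage-name>/<number>".
storage_ids() {
  grep -oE "$1/[0-9]+" | sort -u
}

# Example (left commented out so that nothing runs by accident):
#   juju storage | storage_ids minio-data
#   juju storage | storage_ids mlmd-data
```

Review the printed IDs before passing any of them to juju remove-storage, since removal cannot be undone.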
Warning
Removing the storage from the minio and mlmd charms may result in data loss. Make sure nothing important is stored in the databases. This is safe to do on a clean CKF installation.
Make a restore
Switch to the Charmed Velero model and initiate restores using the backup UIDs from the previous steps:
Note
Make sure the following order of execution is preserved.
After each restore, re-attach the recreated storage for the minio and mlmd charms. Get the names of the restored PVs with kubectl get pv, then import each PV and add the units back with the storage attached:
kubectl get pv
juju import-filesystem kubernetes <pv_name> minio-data --force
juju import-filesystem kubernetes <pv_name> mlmd-data --force
juju add-unit minio --attach-storage minio-data/<id>
juju add-unit mlmd --attach-storage mlmd-data/<id>
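Finding the right <pv_name> in a long kubectl get pv listing can be error-prone. As a minimal sketch, the hypothetical helper pv_for_claim below filters a two-column listing (PV name, bound claim name) by a substring of the claim name. It assumes that the restored claim names contain the application name (minio or mlmd), which may not hold on every cluster.

```shell
# Hypothetical helper: print the names of PVs whose bound claim name contains
# the given text. Reads two whitespace-separated columns (PV name, claim name)
# on stdin.
pv_for_claim() {
  awk -v pat="$1" 'index($2, pat) > 0 { print $1 }'
}

# Example (needs cluster access, so it is left commented out):
#   kubectl get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name \
#     | pv_for_claim minio
```

If the helper prints more than one name, inspect the PVs manually before running juju import-filesystem.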
The restore is now complete. Open the dashboard to see the backed-up data.
Please refer to the Charmed Velero documentation <https://charmhub.io/velero-operator> for more details.
Restore CKF databases
Scale up kfp-db and katib-db.
This step prevents the primary database from becoming unavailable during backup:
juju scale-application kfp-db 2
juju scale-application katib-db 2
Restore each database.
juju run kfp-db/leader restore restore-to-time=latest
juju run katib-db/leader restore restore-to-time=latest
Please refer to the Charmed MySQL K8s documentation for more details.