Enable NVIDIA GPUs¶
This guide describes how to configure Data Science Stack (DSS) to utilise your NVIDIA GPUs.
You can do so by configuring the underlying MicroK8s, on which DSS relies for running its containerised workloads.
Prerequisites¶
DSS is installed and initialised.
Your machine includes an NVIDIA GPU.
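If you want to confirm that an NVIDIA GPU is visible to your system before proceeding, you can, for example, list the PCI devices:
lspci | grep -i nvidia
If this prints at least one line, your machine exposes an NVIDIA GPU.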
Install the NVIDIA Operator¶
To ensure DSS can utilise NVIDIA GPUs:
The NVIDIA drivers must be installed.
MicroK8s must be set up to utilise NVIDIA drivers.
MicroK8s leverages the NVIDIA Operator to set up and configure the NVIDIA runtime. The NVIDIA Operator also installs the NVIDIA drivers if they are not already present on your machine.
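If you are unsure whether the NVIDIA drivers are already present, you can check for a loaded NVIDIA kernel module before enabling the add-on, for example:
lsmod | grep nvidia
If the command prints nothing, no NVIDIA kernel module is loaded, and the Operator will take care of installing the drivers.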
To enable the NVIDIA runtimes on MicroK8s, run the following command:
sudo microk8s enable gpu
Note
The NVIDIA Operator detects the NVIDIA drivers installed on your machine. If they are not present, it installs them automatically.
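You can confirm the add-on has been enabled by checking the MicroK8s status, assuming the gpu add-on is listed among the enabled add-ons:
sudo microk8s status | grep gpu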
Verify the NVIDIA Operator is up¶
Before spinning up workloads, the GPU Operator has to be successfully initialised. To verify this, ensure the DaemonSet is ready and the Validator Pod has succeeded.
Note
This process can take approximately 5 minutes to complete.
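While waiting, you can optionally watch the Operator Pods come up:
sudo microk8s.kubectl get pods -n gpu-operator-resources -w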
Ensure DaemonSet is ready¶
First, ensure that the DaemonSet for the Operator Validator is created:
while ! sudo microk8s.kubectl get ds \
-n gpu-operator-resources \
nvidia-operator-validator
do
sleep 5
done
Note
It can take a few seconds for the DaemonSet to be created.
The above command returns a message like the following at the beginning of the process:
Error from server (NotFound): daemonsets.apps "nvidia-operator-validator" not found
Once completed, you should expect an output like this:
Error from server (NotFound): daemonsets.apps "nvidia-operator-validator" not found
...
Error from server (NotFound): daemonsets.apps "nvidia-operator-validator" not found
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
nvidia-operator-validator 1 1 0 1 0 nvidia.com/gpu.deploy.operator-validator=true 17s
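Alternatively, once the DaemonSet exists, you can let kubectl block until its Pods are rolled out, for example:
sudo microk8s.kubectl rollout status \
    -n gpu-operator-resources \
    ds/nvidia-operator-validator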
Ensure the Validator Pod succeeded¶
Next, you need to wait for the Validator Pod to succeed:
echo "Waiting for the NVIDIA Operator validations to complete..."
while ! sudo microk8s.kubectl logs \
-n gpu-operator-resources \
-l app=nvidia-operator-validator \
-c nvidia-operator-validator | grep "all validations are successful"
do
sleep 5
done
Note
It can take a few seconds for the Validator Pod to be initialised.
The above command returns a message like the following at the beginning of the process:
Error from server (BadRequest): container "nvidia-operator-validator" in pod "nvidia-operator-validator-4rq5n" is waiting to start: PodInitializing
Once completed, you should expect an output like this:
Error from server (BadRequest): container "nvidia-operator-validator" in pod "nvidia-operator-validator-4rq5n" is waiting to start: PodInitializing
...
Error from server (BadRequest): container "nvidia-operator-validator" in pod "nvidia-operator-validator-4rq5n" is waiting to start: PodInitializing
all validations are successful
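Equivalently, you can wait on the Pod itself instead of polling its logs. This assumes the Validator Pod reports Ready once all validations have passed:
sudo microk8s.kubectl wait pod \
    -n gpu-operator-resources \
    -l app=nvidia-operator-validator \
    --for=condition=Ready \
    --timeout=600s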
Verify DSS detects the GPU¶
At this point, the underlying MicroK8s cluster has been configured to handle the NVIDIA GPU. Verify that the DSS CLI has detected the GPU by checking the DSS status as follows:
dss status
You should expect an output like this:
MLflow deployment: Ready
MLflow URL: http://10.152.183.74:5000
GPU acceleration: Enabled (NVIDIA-GeForce-RTX-3070-Ti)
Note
The GPU model NVIDIA-GeForce-RTX-3070-Ti might differ from your setup.
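You can also check that the GPU is advertised as a schedulable resource on the MicroK8s node, for example:
sudo microk8s.kubectl describe node | grep "nvidia.com/gpu"
You should see nvidia.com/gpu listed under the node's Capacity and Allocatable resources.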
See also¶
To learn how to manage your DSS environment, check Manage DSS.
If you are interested in managing Jupyter Notebooks within your DSS environment, see Manage Jupyter Notebooks.