DSS architecture

This guide provides an overview of the Data Science Stack (DSS) architecture, its main components and their interactions.

DSS is a ready-to-run environment for Machine Learning (ML) and Data Science (DS). It’s built on open-source tooling, including MicroK8s, JupyterLab and MLflow.

DSS is distributed as a snap and can be used on any Ubuntu workstation. The snap packaging provides robust security management and user-friendly version control, enabling seamless updates and automatic rollback in case of failure.

Using DSS, you can perform the following tasks:

  • Install and manage the DSS Python library.

  • Deploy and manage Jupyter Notebooks.

  • Deploy and manage MLflow.

  • Run GPU workloads.

Architecture overview

The DSS architecture includes several layers. The following diagram showcases them:

Architecture overview

More details on each layer are discussed in the following sections.

Application

DSS is a Command Line Interface (CLI)-based tool, accessible from the Ubuntu terminal. See Manage DSS to learn how to manage your DSS environment and the available CLI commands.

ML tools

DSS includes:

  • Jupyter Notebooks: Open source environment that provides a flexible interface to organise DS projects and ML workloads.

  • MLflow: Open source platform for managing the ML life cycle, including experiment tracking and model registry.

  • ML frameworks: DSS comes with PyTorch and TensorFlow by default. Users can manually add other frameworks, depending on their needs and use cases.
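
As an illustration, additional frameworks can be installed from inside a running notebook. The following is a minimal sketch; scikit-learn is only an example package, and any pip-installable library would work the same way:

```python
import subprocess
import sys

# Install an extra framework into the notebook's Python environment.
# scikit-learn is only an example; replace it with the package you need.
subprocess.check_call([sys.executable, "-m", "pip", "install", "scikit-learn"])
```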

Jupyter Notebooks

A Jupyter Notebook in DSS is essentially a Kubernetes Deployment that runs a Pod with a Docker image containing JupyterLab and a dedicated ML framework, such as PyTorch or TensorFlow. For each Jupyter Notebook, DSS mounts a persistent volume, backed by a Hostpath directory, to the notebook's data directory. All Jupyter Notebooks share the same persistent volume, allowing them to exchange data seamlessly. Inside each notebook, that volume is mounted at /home/jovyan/shared.
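
For example, because every notebook mounts the same shared volume, a file written from one notebook is immediately visible to the others. The snippet below is a minimal sketch; the file name is arbitrary:

```python
from pathlib import Path

# The shared persistent volume is mounted at the same path in every notebook.
shared = Path("/home/jovyan/shared")

# Write a file from one notebook...
(shared / "example.txt").write_text("data produced in notebook A\n")

# ...and read it back from the same path in any other notebook.
print((shared / "example.txt").read_text())
```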

MLflow

MLflow operates in local mode, meaning that metadata and artefacts are, by default, stored in a local directory.

This local directory is backed by a persistent volume, provisioned from a Hostpath directory and mounted into the MLflow Pod at /mlruns.
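
For example, experiments can be logged to this MLflow instance from a notebook. The sketch below assumes the MLFLOW_TRACKING_URI environment variable already points at the DSS MLflow server; the experiment, parameter and metric names are arbitrary:

```python
import mlflow

# Assumes MLFLOW_TRACKING_URI is set to the DSS MLflow server;
# otherwise, point to it explicitly with mlflow.set_tracking_uri(...).
mlflow.set_experiment("dss-architecture-demo")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)
```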

Orchestration

DSS requires a container orchestration solution and relies on MicroK8s, a lightweight Kubernetes distribution.

Therefore, MicroK8s needs to be deployed on the host machine before installing DSS, and it must be configured with the storage add-on, which is required to use Hostpath storage in the cluster. See Setting up MicroK8s to learn how to install MicroK8s.

GPU support

DSS can run with or without GPUs. If needed, MicroK8s can be configured with the desired GPU add-on.

DSS is designed to support the deployment of containerised GPU workloads on NVIDIA GPUs. MicroK8s simplifies GPU access and usage through the NVIDIA GPU Operator.

DSS does not automatically install the tools and libraries required for running GPU workloads. Instead, it relies on MicroK8s for the required operating-system drivers, and on the chosen image for the user-space libraries, for example, CUDA when working with NVIDIA GPUs.
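
As a quick check, a notebook can verify whether a GPU is visible. The following is a minimal sketch and assumes the notebook was created from a CUDA-enabled PyTorch image:

```python
import torch

# True only if the operating-system drivers, the GPU Operator and a
# CUDA-enabled image are all in place.
if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU available; workloads will run on the CPU.")
```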

Caution

GPUs from silicon vendors other than NVIDIA can be configured. However, their functionality is not guaranteed.

Storage

DSS expects a default storage class in the Kubernetes deployment, which is used to persist Jupyter Notebooks and MLflow artefacts. In MicroK8s, the Hostpath storage add-on provides this storage class and is used to provision Kubernetes PersistentVolumeClaims (PVCs).

A shared PVC is used across all Jupyter Notebooks to share and persist data. MLflow also uses its own dedicated PVC to store logged artefacts. This is the default DSS storage configuration and cannot be altered.

This choice ensures that all data is persisted on the host machine and survives MicroK8s restarts.
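
As an illustration, the PVCs provisioned for DSS can be inspected with the official Kubernetes Python client. This is only a sketch: the namespace name ("dss") is an assumption here, and it requires a kubeconfig exported from MicroK8s:

```python
from kubernetes import client, config

# Load cluster credentials (e.g. a kubeconfig exported from MicroK8s).
config.load_kube_config()

core = client.CoreV1Api()
# "dss" is an assumed namespace name for the DSS workloads.
for pvc in core.list_namespaced_persistent_volume_claim("dss").items:
    print(pvc.metadata.name, pvc.spec.storage_class_name, pvc.status.phase)
```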

Note

By default, you can access the DSS storage at any time under the local directory /var/snap/microk8s/common/default-storage.

The following diagram summarises the DSS storage:

Storage overview

Operating system

DSS is native to Ubuntu, where it is developed, tested and validated. Nevertheless, the solution can be used on any Linux distribution.

Namespace configuration

DSS runs in a dedicated Kubernetes namespace. By default, this namespace contains two Kubernetes Pods.

NVIDIA GPU support runs in another dedicated namespace, which includes the GPU Operator for managing GPU access and usage.

Accessibility

Jupyter Notebooks and MLflow can be accessed from a web browser through their Pod IPs, which MicroK8s makes reachable from the host. See Access a notebook and Access MLflow for more details.