Install allowing for advanced node-pool scheduling
Note
This guide can be combined with the other installation methods described in Install. In that case, read through both this guide and the chosen installation method first, so that you understand how to interleave the steps of the two. The chosen installation method may also contain its own cluster-setup and deployment instructions.
This guide describes how to set up your K8s cluster and how to install Charmed Kubeflow (CKF) to allow for the most advanced node-pool scheduling possible, so that:
each Kubeflow Profile has a configurable default node pool to which its user workloads are scheduled
each user workload can be selectively scheduled to a node pool other than the default one of its Kubeflow Profile, among the node pools allocated to user workloads
the following are mutually segregated to different node pools:
K8s-control-plane workloads
(optionally) Juju-system workloads
CKF-platform workloads
CKF-user workloads
(optionally) different CKF-platform workloads are in turn selectively scheduled to different node pools, among the ones allocated to CKF platform workloads
Requirements
The requirements are the same as those of the installation method you choose among the ones listed in Install.
Procedure
Follow the steps below in order.
Step 1: label and taint your node pools
Set up your K8s cluster while labeling and tainting your node pools in one of two ways, either:
Unsegregated Juju system and pools for general workloads, or
Segregated Juju system and no pools for general workloads
Juju-system workloads do not support node affinities or tolerations, so they may be scheduled to any untainted node pool. Follow the second approach to segregate Juju-system workloads to a specific node pool, and the first to also use untainted node pools for general workloads.
Warning
This guide assumes that the described node-pool setup, with nodes appropriately labeled and tainted, is already in place or can be achieved independently. The process described here does not handle migrating or rescheduling user workloads from previous clusters or cluster states with different node pools.
Option 1: unsegregated Juju system and pools for general workloads
Label and taint node pools this way:
It is implicitly assumed that, if the K8s cluster is set up with labels and taints for the node pool of the K8s control plane, the workloads of the K8s control plane are already deployed with the respective node affinity and tolerations for their node pool.
For the node pool(s) of the Kubeflow platform:
When not scheduling different CKF-platform workloads to different node pools: add one specific label and one specific taint that do not conflict with any default ones. An example could be platform=kubeflow for the label and platform=kubeflow:NoSchedule for the taint.
When scheduling different CKF-platform workloads to different node pools: add one specific, different label for each such node pool and one same, specific taint for all such node pools, with all labels and taints not conflicting with any default ones. An example could be kubeflow-platform-arch=i for the label and platform=kubeflow:NoSchedule for the taint of a node pool, and kubeflow-platform-arch=j for the label and platform=kubeflow:NoSchedule for the taint of another node pool.
For each node pool meant to be used as the default one of some Kubeflow Profile(s), add one specific label, different for each node pool, that does not conflict with any default ones. An example could be kubeflow-default-node-pool=a for one of the default node pools and kubeflow-default-node-pool=b for another default one.
Note
These labels do not need to correspond to Kubeflow Profiles, but rather (zero, one or more) Kubeflow Profiles will later use these (and possibly the same) labels for the default node affinity of their workloads.
For each node pool (with special hardware) meant to be used by workloads only when explicitly overriding the default node-pool allocation of their respective Kubeflow Profile(s), add one specific label and one specific taint, different for each node pool, that do not conflict with any default ones. An example could be special-hardware=x for the label and special-hardware=x:NoSchedule for the taint for one of the special-hardware node pools, and special-hardware=y for the label and special-hardware=y:NoSchedule for the taint for another special-hardware one.
Feel free to keep (zero, one or more) node pools unlabeled and untainted, for general use.
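The labeling and tainting described above could be scripted roughly as follows. This is only a sketch: the function name setup_pool and the node names in the usage comments are hypothetical, and it assumes labels and taints are applied node by node with kubectl. Many managed K8s services let you set them at node-pool level instead, which is usually preferable because new nodes then inherit them.

```shell
# Apply a label, and optionally a matching NoSchedule taint, to every node of a pool.
# Arguments: <key=value> <taint: yes|no> <node>...
# (setup_pool is a hypothetical helper, not part of kubectl or CKF.)
setup_pool() {
  pool_label="$1"
  pool_taint="$2"
  shift 2
  for node in "$@"; do
    kubectl label node "$node" "$pool_label" --overwrite
    if [ "$pool_taint" = "yes" ]; then
      # The taint reuses the label's key=value, as in the examples of this guide.
      kubectl taint node "$node" "$pool_label:NoSchedule" --overwrite
    fi
  done
}

# Usage with hypothetical node names:
#   setup_pool platform=kubeflow yes platform-node-1 platform-node-2
#   setup_pool kubeflow-default-node-pool=a no default-a-node-1
#   setup_pool special-hardware=x yes special-x-node-1
```

Note that this helper always derives the taint from the label. When scheduling different CKF-platform workloads to different node pools, the label (for example kubeflow-platform-arch=i) and the taint (platform=kubeflow:NoSchedule) differ, so they would need to be applied with separate commands.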
Warning
Make sure that all applied taints are of type NoSchedule, and not NoExecute, in order not to disrupt pre-existing cluster workloads in case of incorrect initial cluster settings, expectations and/or assumptions.
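One possible way to check for NoExecute taints before proceeding is sketched below. The two function names are hypothetical. The check only reports what is currently set on the nodes, and K8s itself may add NoExecute taints (for example on unreachable nodes), so a match is not necessarily a taint you applied.

```shell
# Print one line per node: "<name> <key=value:effect> <key=value:effect> ..."
list_node_taints() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{range .spec.taints[*]}{.key}={.value}:{.effect}{" "}{end}{"\n"}{end}'
}

# Read such lines on stdin and print the names of nodes that carry a NoExecute taint.
nodes_with_noexecute() {
  awk '/:NoExecute( |$)/ { print $1 }'
}

# Usage:
#   list_node_taints | nodes_with_noexecute
```

An empty result suggests that no node currently carries a NoExecute taint.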
Option 2: segregated Juju system and no pools for general workloads
Label and taint node pools as described above for option 1 but with the following changes:
Keep one and only one node pool unlabeled and untainted, meant to be used for Juju-system workloads only, with no general workloads expected to end up in the same pool.
For each node pool meant to be used as the default one of some Kubeflow Profile(s), add both specific label(s) and specific taint(s). An example could be kubeflow-default-node-pool=a for the label and kubeflow-default-node-pool=a:NoSchedule for the taint.
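For this option, once the default node pools carry their labels, the matching taints can be applied by label selector rather than node by node. The sketch below assumes the kubeflow-default-node-pool=<pool> naming used in this guide's examples; the function name is hypothetical.

```shell
# Taint every node of each default node pool, selecting nodes by their pool label.
# Arguments: <pool>... (for example: a b)
taint_default_pools() {
  for pool in "$@"; do
    kubectl taint nodes -l "kubeflow-default-node-pool=$pool" \
      "kubeflow-default-node-pool=$pool:NoSchedule" --overwrite
  done
}

# Usage:
#   taint_default_pools a b
```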
Step 2: set up your Juju controller
For bootstrapping instructions, see Get started with Juju. No additional, specific precautions are required.
Step 3 (Optional): set up a temporary namespace-node-affinity-operator
Note
This step is required only if you want namespace-node-affinity-operator itself to be scheduled to Kubeflow-platform node pools rather than to any untainted node pool, such as the Juju-system ones or (when available) the general-workload ones.
Create a temporary Juju model and deploy namespace-node-affinity-operator into it, using a Juju application name different from the default one of the charm. For example:
juju add-model temp-namespace-node-affinity
juju switch temp-namespace-node-affinity
juju deploy --trust --channel 2.2/stable namespace-node-affinity temp-namespace-node-affinity
juju wait-for application temp-namespace-node-affinity
Then, configure namespace-node-affinity-operator to inject workloads scheduled in the (not-yet-created) namespace of the Kubeflow platform with:
When not scheduling different CKF-platform workloads to different node pools: both node affinity and tolerations, respectively matching the label and the taint of the Kubeflow-platform node pool. An example may be:
namespace_node_affinity_settings=$(cat << EOF
kubeflow: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
EOF
)
juju config temp-namespace-node-affinity settings_yaml="$namespace_node_affinity_settings"
When scheduling different CKF-platform workloads to different node pools: only tolerations, matching the taint of the Kubeflow-platform node pool. An example may be:
namespace_node_affinity_settings=$(cat << EOF
kubeflow: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
EOF
)
juju config temp-namespace-node-affinity settings_yaml="$namespace_node_affinity_settings"
Step 4: create the Kubeflow-platform Juju model
No additional, specific precautions required. For instance:
juju add-model kubeflow
Step 5: label the Kubeflow-platform model’s namespace(s)
Label the namespace of the Juju model for the Kubeflow platform with namespace-node-affinity=enabled. If Knative is to be deployed, manually create its namespaces, knative-eventing and knative-serving, and label them in the same way. For example:
kubectl label namespaces kubeflow namespace-node-affinity=enabled
kubectl create namespace knative-eventing
kubectl label namespaces knative-eventing namespace-node-affinity=enabled
kubectl create namespace knative-serving
kubectl label namespaces knative-serving namespace-node-affinity=enabled
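The commands above fail on a rerun because the Knative namespaces already exist. A rerunnable variant could look like the following sketch, where the function name is hypothetical and the namespace names are the ones used in this guide.

```shell
# Label each given namespace, creating it first only when it does not exist yet.
# Arguments: <namespace>...
label_platform_namespaces() {
  for ns in "$@"; do
    # "kubectl get" fails when the namespace is missing, which triggers its creation.
    kubectl get namespace "$ns" >/dev/null 2>&1 || kubectl create namespace "$ns"
    kubectl label namespace "$ns" namespace-node-affinity=enabled --overwrite
  done
}

# Usage:
#   label_platform_namespaces kubeflow knative-eventing knative-serving
```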
Step 6: set up namespace-node-affinity-operator
Deploy the primary instance of namespace-node-affinity-operator into the model for the Kubeflow platform. If you deployed a temporary instance of the operator in step 3, use a different Juju application name for this one: even though the two instances are in different Juju models, sharing a name would cause conflicts among the underlying K8s resources. Then, configure it with the same settings as described in step 3. If Knative is to be deployed, replicate the settings of the Kubeflow-platform namespace for the Knative namespaces. An example of such configurations may be:
When not scheduling different CKF-platform workloads to different node pools:
juju switch kubeflow
juju deploy --trust --channel 2.2/stable namespace-node-affinity namespace-node-affinity
juju wait-for application namespace-node-affinity
namespace_node_affinity_settings=$(cat << EOF
kubeflow: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-eventing: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-serving: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
EOF
)
juju config namespace-node-affinity settings_yaml="$namespace_node_affinity_settings"
When scheduling different CKF-platform workloads to different node pools:
juju switch kubeflow
juju deploy --trust --channel 2.2/stable namespace-node-affinity namespace-node-affinity
juju wait-for application namespace-node-affinity
namespace_node_affinity_settings=$(cat << EOF
kubeflow: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-eventing: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-serving: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
EOF
)
juju config namespace-node-affinity settings_yaml="$namespace_node_affinity_settings"
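Because the three platform sections are identical apart from the namespace name, they could also be generated rather than written out by hand. The following is a minimal sketch under that assumption. The helper names are hypothetical, and the label and taint values are the platform=kubeflow examples used in this guide.

```shell
# Emit the settings section for one namespace.
# Arguments: <namespace> [affinity]
# Pass "affinity" to include node affinity in addition to tolerations.
platform_section() {
  cat << EOF
$1: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
EOF
  if [ "${2:-}" = "affinity" ]; then
    cat << EOF
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
EOF
  fi
  cat << EOF
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
EOF
}

# Emit the sections for the Kubeflow-platform and Knative namespaces.
# Arguments: [affinity]
build_platform_settings() {
  for platform_ns in kubeflow knative-eventing knative-serving; do
    platform_section "$platform_ns" "${1:-}"
  done
}

# Usage:
#   juju config namespace-node-affinity settings_yaml="$(build_platform_settings affinity)"
```

Calling build_platform_settings without an argument yields the tolerations-only variant used when scheduling different CKF-platform workloads to different node pools.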
Step 7 (Optional): tear down the former namespace-node-affinity-operator
Note
This step is required if and only if step 3 above was followed.
Delete the former namespace-node-affinity-operator, that is the one deployed in the temporary Juju model, not the one in the Juju model for the Kubeflow platform. For example:
juju switch temp-namespace-node-affinity
juju remove-application --destroy-storage --no-prompt temp-namespace-node-affinity
juju destroy-model --destroy-storage --no-prompt temp-namespace-node-affinity
juju switch kubeflow
Step 8: deploy CKF
Deploy CKF following any of the supported installation methods, with the following additions:
When not scheduling different CKF-platform workloads to different node pools:
No additional, specific precautions required.
When scheduling different CKF-platform workloads to different node pools:
Deploy Juju applications with Juju constraints’ tags defining specific node affinities, as exemplified here, one for each application, to target the desired node pool (with the respective node label(s)) among the Kubeflow-platform ones. Here is an example: --constraints="tags=node.kubeflow-platform-arch=j".
Deploy Juju applications that operate additional platform workloads (in addition to the ones defined in metadata.yaml) with charm configurations that add node affinities to such workloads, as in the point above (not necessarily the same affinities as the respective charms, at your own discretion).
Note
Make sure that a charm revision of kubeflow-profiles greater than or equal to 839 is deployed, to include the changes required for it to label user Profiles’ namespaces to enable namespace-node-affinity-operator.
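When scheduling different CKF-platform workloads to different node pools, the constraint string of this step can be built from the node label, as in the sketch below. The helper names and the charm name in the usage comment are hypothetical placeholders. The actual deployment commands depend on the installation method you follow.

```shell
# Build the Juju constraints value that targets nodes carrying the given label.
# Arguments: <key=value>
node_constraint() {
  printf 'tags=node.%s\n' "$1"
}

# Deploy a charm with a node-affinity constraint; extra arguments are passed to juju deploy.
# Arguments: <charm> <key=value> [juju deploy options...]
deploy_to_pool() {
  charm="$1"
  pool_label="$2"
  shift 2
  juju deploy "$charm" --constraints="$(node_constraint "$pool_label")" "$@"
}

# Usage (hypothetical charm name):
#   deploy_to_pool some-ckf-charm kubeflow-platform-arch=j --trust
```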
Step 9: (re)configure your Kubeflow Profiles’ default node pools
For each Profile’s namespace, add configurations that allow some label(s) to disable the default injection by namespace-node-affinity-operator, so that specific workloads can also be scheduled to node pools other than the default one (i.e. the “Customization” objective in Abstract). An example of such a label could be exclude-me-from-namespace-node-affinity-operator="true".
Note
Profiles whose namespaces are not configured with exclusion labels will not be able to override the default node-pool scheduling, therefore opting out of such a feature.
Moreover, among the configurations of namespace-node-affinity-operator, add new configuration sections for (not-yet-created) Profiles’ namespaces so that Profiles’ workloads are scheduled to their respective default node pools, in particular:
node affinity matching the labels of default node pools
tolerations matching the taints of default node pools, only when segregating Juju-system workloads
See the following examples.
Note
CKF admins can continuously reconfigure the default node-pool allocation and whether to enable customization for any Profiles. Nevertheless, Profile workloads deployed before the desired configuration changes are not expected to be rescheduled or migrated. For this reason, it is recommended to configure Profiles before their actual creation (knowing namespace names correspond to Profile names).
Option 1: unsegregated Juju system and pools for general workloads
An example to set overall configurations, where both Profiles profile-i and profile-j have the node pool labeled with kubeflow-default-node-pool=a as their default one and Profile profile-k has the node pool labeled with kubeflow-default-node-pool=b as its default one, may be:
Example
namespace_node_affinity_settings=$(cat << EOF
kubeflow: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-eventing: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-serving: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
profile-i: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflow-default-node-pool
        operator: In
        values:
          - a
profile-j: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflow-default-node-pool
        operator: In
        values:
          - a
profile-k: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflow-default-node-pool
        operator: In
        values:
          - b
EOF
)
juju config namespace-node-affinity settings_yaml="$namespace_node_affinity_settings"
Note
Profiles whose namespaces are not configured with affinities for default node pools will see their workloads randomly scheduled in any node pools without taints, including not only all the default ones but also any other general ones that may exist, therefore opting out of such a feature.
Option 2: segregated Juju system and no pools for general workloads
An example to set overall configurations, where both Profiles profile-i and profile-j have the node pool labeled with kubeflow-default-node-pool=a and tainted with kubeflow-default-node-pool=a:NoSchedule as their default one and Profile profile-k has the node pool labeled with kubeflow-default-node-pool=b and tainted with kubeflow-default-node-pool=b:NoSchedule as its default one, may be:
Example
namespace_node_affinity_settings=$(cat << EOF
kubeflow: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-eventing: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
knative-serving: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: platform
        operator: In
        values: [kubeflow]
  tolerations:
    - effect: NoSchedule
      key: platform
      operator: Equal
      value: kubeflow
profile-i: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflow-default-node-pool
        operator: In
        values:
          - a
  tolerations:
    - effect: NoSchedule
      key: kubeflow-default-node-pool
      operator: Equal
      value: a
profile-j: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflow-default-node-pool
        operator: In
        values:
          - a
  tolerations:
    - effect: NoSchedule
      key: kubeflow-default-node-pool
      operator: Equal
      value: a
profile-k: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflow-default-node-pool
        operator: In
        values:
          - b
  tolerations:
    - effect: NoSchedule
      key: kubeflow-default-node-pool
      operator: Equal
      value: b
EOF
)
juju config namespace-node-affinity settings_yaml="$namespace_node_affinity_settings"
Note
Profiles whose namespaces are not configured with affinities and tolerations for default node pools will see their workloads scheduled to the node pool for Juju-system workloads.
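Since the per-Profile sections in both options follow a fixed pattern, they could be generated from a list of Profile-to-pool assignments. The sketch below is only an illustration: the function name and its "tolerate" argument are hypothetical, and it assumes the kubeflow-default-node-pool key used in the examples above.

```shell
# Read "<profile> <pool>" pairs on stdin and emit one settings section per Profile.
# Arguments: [tolerate]
# Pass "tolerate" to add tolerations (option 2, where default node pools are tainted).
profile_sections() {
  with_tolerations="${1:-}"
  while read -r profile pool; do
    [ -n "$profile" ] || continue   # skip blank lines
    cat << EOF
$profile: |
  excludedLabels:
    exclude-me-from-namespace-node-affinity-operator: "true"
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflow-default-node-pool
        operator: In
        values:
          - $pool
EOF
    if [ "$with_tolerations" = "tolerate" ]; then
      cat << EOF
  tolerations:
    - effect: NoSchedule
      key: kubeflow-default-node-pool
      operator: Equal
      value: $pool
EOF
    fi
  done
}

# Usage:
#   printf 'profile-i a\nprofile-j a\nprofile-k b\n' | profile_sections tolerate
```

The output covers only the Profile sections. As the complete examples above show, the value passed to settings_yaml also contains the kubeflow and Knative sections, so the generated text would need to be appended to those before running juju config.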
Step 10: create some (other) Kubeflow Profiles
No additional, specific precautions required.
See how to create a Profile for general instructions.
For instance, coherently with the examples used above, such Profiles could be created this way:
for profile_name in profile-i profile-j profile-k;
do
  kubectl apply -f - << EOF
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: $profile_name
spec:
  owner:
    kind: User
    name: admin@example.com
EOF
done
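After creating the Profiles, it may be worth checking that their namespaces carry the label that enables namespace-node-affinity-operator. The sketch below assumes that kubeflow-profiles applies the same namespace-node-affinity=enabled label used for the platform namespaces in step 5, which this guide implies but does not state explicitly. The function name is hypothetical.

```shell
# Report, for each Profile namespace, whether the enabling label is present.
# Returns non-zero if any namespace lacks it.
# Arguments: <profile>...
check_profile_namespaces() {
  check_status=0
  for profile in "$@"; do
    value=$(kubectl get namespace "$profile" -o jsonpath='{.metadata.labels.namespace-node-affinity}')
    if [ "$value" = "enabled" ]; then
      echo "$profile: ok"
    else
      echo "$profile: label missing"
      check_status=1
    fi
  done
  return "$check_status"
}

# Usage:
#   check_profile_namespaces profile-i profile-j profile-k
```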
What’s next
You can now deploy some (other) Profiles’ workloads by following configure workloads for the most advanced node-pool scheduling possible.