Install behind a web proxy

This guide describes how to install Charmed Kubeflow (CKF) behind a web proxy.

This guide assumes that you have already set up your underlying K8s cluster with the proxy settings. Refer to the documentation of your Kubernetes distribution for details on how to do this.

Prepare your environment

Before installing CKF, set up your client with the required proxy settings.

Configure snap

Save the value of your proxy server address for reuse:

PROXY=http://<username>:<password>@<proxy IP>:<proxy port>/

Note

Add the <username>:<password>@ part only if the proxy server is configured with credentials. Check with your network administrator if you are unsure.
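The optional credentials can be handled with a small shell function. The following is a minimal sketch, not part of CKF: build_proxy_url is a hypothetical helper name, and the address 10.0.1.119:3128 is only an example value. It assumes the username and password contain no characters that need percent-encoding in a URL.

```shell
# Hypothetical helper: print a proxy URL, adding credentials only when given.
# Usage: build_proxy_url <proxy IP> <proxy port> [<username> <password>]
build_proxy_url() {
  local host="$1" port="$2" user="${3:-}" password="${4:-}"
  if [ -n "$user" ]; then
    printf 'http://%s:%s@%s:%s/\n' "$user" "$password" "$host" "$port"
  else
    printf 'http://%s:%s/\n' "$host" "$port"
  fi
}

# Example values only; replace them with your own proxy details.
PROXY=$(build_proxy_url 10.0.1.119 3128)
echo "$PROXY"
```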

Set the snap proxy settings:

sudo snap set system proxy.http=$PROXY
sudo snap set system proxy.https=$PROXY

This will enable you to install snap packages.

Now restart the snap service:

sudo systemctl restart snapd.service

Configure Juju

Configure Juju with the proxy settings. Besides http_proxy and https_proxy, it is also important to set the no_proxy environment variable so that requests within Kubernetes and among the various K8s services are not sent through the proxy.

export http_proxy=$PROXY
export https_proxy=$PROXY
export no_proxy=$CLUSTER_CIDR,\
$SERVICE_CIDR,\
127.0.0.1,\
localhost,\
$NODE_IP/24,\
$HOSTNAME,\
.svc,\
.local,\
.kubeflow

Make sure to set HOSTNAME, CLUSTER_CIDR, SERVICE_CIDR and NODE_IP to the values for your environment before running the export commands above. As an example, on a local deployment of MicroK8s you can use the following snippet:

export CLUSTER_CIDR=$(cat /var/snap/microk8s/current/args/kube-proxy | grep cluster-cidr | sed 's/^[^=]*=//')
export SERVICE_CIDR=$(cat /var/snap/microk8s/current/args/kube-apiserver | grep service-cluster-ip-range | sed 's/^[^=]*=//')
export NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
export HOSTNAME=$(hostname)
export no_proxy="$CLUSTER_CIDR,$SERVICE_CIDR,127.0.0.1,localhost,$NODE_IP/24,$HOSTNAME,.svc,.local,.kubeflow"
export NO_PROXY="$no_proxy"

For more information on how to set variables on MicroK8s, please refer to this guide. For other K8s distributions, check the respective product documentation to find out how to retrieve this information.
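An empty variable silently produces a broken exclusion list (for example, an entry that is just "/24"). The following is a minimal sketch of a guard against that. build_no_proxy is a hypothetical helper name, and the CIDRs, node IP and hostname passed to it are example values only.

```shell
# Hypothetical helper: join the exclusion list, failing if any input is empty.
# Usage: build_no_proxy <cluster CIDR> <service CIDR> <node IP> <hostname>
build_no_proxy() {
  local cluster_cidr="$1" service_cidr="$2" node_ip="$3" host="$4"
  local value
  for value in "$cluster_cidr" "$service_cidr" "$node_ip" "$host"; do
    if [ -z "$value" ]; then
      echo "error: one of the required values is empty" >&2
      return 1
    fi
  done
  printf '%s,%s,127.0.0.1,localhost,%s/24,%s,.svc,.local,.kubeflow\n' \
    "$cluster_cidr" "$service_cidr" "$node_ip" "$host"
}

# Example values only (assumed); use the values from your own cluster.
if no_proxy=$(build_no_proxy 10.1.0.0/16 10.152.183.0/24 10.0.1.20 myhost); then
  export no_proxy NO_PROXY="$no_proxy"
fi
```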

Once the proxy environment variables are correctly set, install Juju:

sudo snap install juju --classic --channel=3.6/stable

Create a Juju controller in your cluster and set the proxy model default values:

juju bootstrap microk8s micro --model-default juju-http-proxy=$http_proxy \
--model-default juju-https-proxy=$https_proxy \
--model-default juju-no-proxy=$no_proxy

When using a controller that has already been bootstrapped, you can set the proxy settings afterwards using juju model-config, for example:

juju switch controller
juju model-config juju-http-proxy=$http_proxy juju-https-proxy=$https_proxy juju-no-proxy=$no_proxy

Deploy CKF

To deploy CKF, follow the steps provided in the general installation guide.

When deploying with Terraform, make sure you provide the proxy input variables, for example using a tfvars.json file:

{
     "http_proxy": "<http_proxy>",
     "https_proxy": "<https_proxy>",
     "no_proxy": "<no_proxy>"
}
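If the proxy variables are already exported in your shell, the file can be generated rather than typed by hand. This is a minimal sketch under a few assumptions: the file name proxy.tfvars.json is hypothetical, the fallback values are examples that only apply when the variables are unset, and the values contain no double quotes or backslashes that would need JSON escaping.

```shell
# Assumed file name; adjust it to match your Terraform setup.
TFVARS_FILE="${TFVARS_FILE:-proxy.tfvars.json}"

# Example fallbacks so the sketch runs standalone; real values come from your shell.
: "${http_proxy:=http://10.0.1.119:3128/}"
: "${https_proxy:=http://10.0.1.119:3128/}"
: "${no_proxy:=127.0.0.1,localhost,.svc,.local,.kubeflow}"

# Write the three proxy input variables as a JSON object.
cat > "$TFVARS_FILE" <<EOF
{
  "http_proxy": "$http_proxy",
  "https_proxy": "$https_proxy",
  "no_proxy": "$no_proxy"
}
EOF
```

The resulting file can then be passed to Terraform with the -var-file option.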

If you are re-using an existing model, make sure the kubeflow model has your proxy settings. To check, run:

juju model-config

You should see the proxy settings in the juju-http-proxy, juju-https-proxy and juju-no-proxy variables. If these are not set, proceed to set them to their correct values using the juju model-config <key>=<value> command, as shown above.
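The check can also be scripted by querying each key on its own. The following is a minimal sketch: report_unset_proxy_keys is a hypothetical helper name, and it assumes that the command passed to it prints the value of a single key when given that key as its last argument, as juju model-config <key> does.

```shell
# Hypothetical helper: report which proxy keys have no value.
# "$@" is the command used to read one key; the key is appended as last argument.
report_unset_proxy_keys() {
  local key value status=0
  for key in juju-http-proxy juju-https-proxy juju-no-proxy; do
    # Treat a failing lookup the same as an empty value.
    value=$("$@" "$key" 2>/dev/null) || value=""
    if [ -z "$value" ]; then
      echo "unset: $key"
      status=1
    fi
  done
  return "$status"
}

# Usage (requires the juju client and a model named kubeflow):
#   report_unset_proxy_keys juju model-config -m kubeflow
```

The function returns a non-zero status when at least one key is empty, so it can be used as a gate in a deployment script.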

Use Kubeflow components behind a proxy

The following sections describe how to set and manage the proxy values for the various kinds of Kubeflow user workloads. How you set the proxy values may depend on the specific type of workload and on whether it needs to reach external resources. In particular, some Python libraries may not handle the no_proxy environment variable correctly, or they may not work with IP ranges. In these cases, we recommend disabling the proxy for that workload so that internal requests are not sent to, and blocked by, the proxy.
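One way to disable the proxy for a single command, rather than for the whole session, is to remove the proxy variables from that command's environment. This is a minimal sketch: without_proxy is a hypothetical name, and it assumes an env implementation that supports the -u option (GNU coreutils and BSD do; it is not required by POSIX).

```shell
# Hypothetical wrapper: run a command with all proxy variables removed.
without_proxy() {
  env -u http_proxy -u https_proxy -u no_proxy \
      -u HTTP_PROXY -u HTTPS_PROXY -u NO_PROXY "$@"
}

# Example: the child process no longer sees the proxy setting.
export http_proxy=http://10.0.1.119:3128/   # example value
without_proxy sh -c 'echo "${http_proxy:-unset}"'   # prints: unset
```

The variables stay set in the calling shell, so other commands keep using the proxy.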

Notebooks

Apply the following PodDefault to your user namespace so that each notebook you create has the proxy configuration set. The NO_PROXY and no_proxy values should be the same as the ones you configured in the Juju model.

cat <<EOF | kubectl apply -n "$USER_NAMESPACE" -f -
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: notebook-proxy
spec:
  desc: Add proxy settings
  env:
    - name: HTTP_PROXY
      value: http://10.0.1.119:3128/ # replace with $PROXY
    - name: http_proxy
      value: http://10.0.1.119:3128/ # replace with $PROXY
    - name: HTTPS_PROXY
      value: http://10.0.1.119:3128/ # replace with $PROXY
    - name: https_proxy
      value: http://10.0.1.119:3128/ # replace with $PROXY
    - name: NO_PROXY
      value: <cluster cidr>,<service cluster ip range>,127.0.0.1,<nodes internal ip(s)>/24,<cluster hostname>,.svc,.local,.kubeflow
    - name: no_proxy
      value: <cluster cidr>,<service cluster ip range>,127.0.0.1,<nodes internal ip(s)>/24,<cluster hostname>,.svc,.local,.kubeflow
  selector:
    matchLabels:
      notebook-proxy: "true"
EOF

You should now be able to see Add proxy settings when creating a new notebook under Advanced Options > Configurations. Always select that option.

https://assets.ubuntu.com/v1/49809832-web-proxy-ckf.png
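To confirm that a notebook picked up the configuration, you can list the proxy variables from a terminal inside the notebook. The following is a minimal sketch; show_proxy_env is a hypothetical helper name. Note that the output includes any credentials embedded in the proxy URL.

```shell
# Hypothetical helper: list the proxy variables visible to the current process.
show_proxy_env() {
  local name
  for name in HTTP_PROXY http_proxy HTTPS_PROXY https_proxy NO_PROXY no_proxy; do
    # printenv exits non-zero when the variable is not in the environment.
    if printenv "$name" >/dev/null; then
      printf '%s=%s\n' "$name" "$(printenv "$name")"
    else
      printf '%s is not set\n' "$name"
    fi
  done
}

show_proxy_env
```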

Katib

Before running a Katib experiment, add your proxy environment variables to your experiment definition for each container under spec.trialTemplate.trialSpec.spec.template.spec.containers:

env:
  - name: HTTP_PROXY
    value: http://10.0.1.119:3128/ # replace with $PROXY
  - name: http_proxy
    value: http://10.0.1.119:3128/ # replace with $PROXY
  - name: HTTPS_PROXY
    value: http://10.0.1.119:3128/ # replace with $PROXY
  - name: https_proxy
    value: http://10.0.1.119:3128/ # replace with $PROXY
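When an experiment has several containers, repeating these entries by hand is error-prone. As a minimal sketch, the entries can be printed from the proxy address with a shell function. proxy_env_yaml is a hypothetical helper name, the address is an example value, and the sketch assumes a shell whose printf supports the %*s width syntax (bash does).

```shell
# Hypothetical helper: print the four proxy env entries for a container spec.
# $1 is the proxy URL, $2 the number of spaces to indent the block (default 0).
proxy_env_yaml() {
  local proxy="$1" pad name
  pad=$(printf '%*s' "${2:-0}" '')
  printf '%senv:\n' "$pad"
  for name in HTTP_PROXY http_proxy HTTPS_PROXY https_proxy; do
    printf '%s  - name: %s\n' "$pad" "$name"
    printf '%s    value: %s\n' "$pad" "$proxy"
  done
}

proxy_env_yaml "http://10.0.1.119:3128/"   # example proxy address
```

The second argument lets you match the indentation of the container the block is pasted into.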

The following is a full Katib experiment example:

apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: grid-proxy
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: Validation-accuracy
    additionalMetricNames:
      - Train-accuracy
  algorithm:
    algorithmName: grid
  parallelTrialCount: 1
  maxTrialCount: 1
  maxFailedTrialCount: 1
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.001"
        max: "0.01"
        step: "0.001"
    - name: num-layers
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "5"
    - name: optimizer
      parameterType: categorical
      feasibleSpace:
        list:
          - sgd
          - adam
          - ftrl
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
      - name: numberLayers
        description: Number of training model layers
        reference: num-layers
      - name: optimizer
        description: Training model optimizer (sgd, adam or ftrl)
        reference: optimizer
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
          spec:
            containers:
              - name: training-container
                image: docker.io/kubeflowkatib/mxnet-mnist:latest
                command:
                  - "python3"
                  - "/opt/mxnet-mnist/mnist.py"
                  - "--batch-size=64"
                  - "--lr=${trialParameters.learningRate}"
                  - "--num-layers=${trialParameters.numberLayers}"
                  - "--optimizer=${trialParameters.optimizer}"
                env:
                  - name: HTTP_PROXY
                    value: http://10.0.1.119:3128/
                  - name: http_proxy
                    value: http://10.0.1.119:3128/
                  - name: HTTPS_PROXY
                    value: http://10.0.1.119:3128/
                  - name: https_proxy
                    value: http://10.0.1.119:3128/
            restartPolicy: Never

Pipelines

If your pipeline needs to download data or pull an image, you can inject your proxy environment variables into a pipeline from inside a notebook with the KFP SDK as done in this example notebook.

Istio

If needed, configure proxy settings for Istio as follows:

kubectl apply -n kubeflow -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: proxy
spec:
  hosts:
  - my-company-proxy.com # ignored
  addresses:
  - 10.0.1.119/32 # replace with proxy IP
  ports:
  - number: 3128 # replace with proxy port
    name: tcp
    protocol: TCP
  location: MESH_EXTERNAL
EOF
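The proxy IP and port used in the ServiceEntry can be derived from the proxy address instead of being typed twice. This is a minimal sketch: proxy_host_port is a hypothetical helper name, and it assumes the proxy URL uses an IPv4 address or hostname with an explicit port.

```shell
# Hypothetical helper: print "<host> <port>" for a proxy URL.
# Assumes an IPv4 address or hostname with an explicit port (no IPv6 literals).
proxy_host_port() {
  local hostport="${1#*://}"   # drop the scheme
  hostport="${hostport##*@}"   # drop credentials, if any
  hostport="${hostport%%/*}"   # drop the trailing path
  printf '%s %s\n' "${hostport%:*}" "${hostport##*:}"
}

# Example proxy address; in practice pass "$PROXY".
read -r PROXY_IP PROXY_PORT <<EOF
$(proxy_host_port "http://10.0.1.119:3128/")
EOF
echo "addresses: $PROXY_IP/32, port: $PROXY_PORT"
```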