Deploy Autoscaling model serving¶
The Autoscaling model serving solution deploys the KServe, Knative, and Istio charms on their own to serve Machine Learning (ML) models that can be accessed through ingress.
Requirements¶
Juju 2.9.49 or above.
A Kubernetes cluster with a configured LoadBalancer, DNS, and a storage class solution.
Deploy the solution¶
You can deploy the solution in the following ways:
Deploy with Terraform.
Deploy with charm bundle.
Regardless of the chosen deployment method, the following charm configuration is required:
juju config knative-serving istio.gateway.namespace="<Istio ingress gateway namespace>"
where the Istio ingress gateway namespace corresponds to the model name where the autoscaling-model-serving bundle is deployed.
Deploy with Terraform¶
The Autoscaling model serving solution is defined in a Terraform module, which facilitates its deployment using the Terraform Juju provider.
In its most basic form, the solution can be deployed as follows:
terraform apply
Refer to the module's source code for more information about its inputs and outputs.
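As a rough sketch, a Terraform working directory for this deployment could contain a main.tf like the one below. The provider block uses the real Terraform Juju provider, but the module source path is an assumption for illustration; check the module's own documentation for its actual location and inputs:

```terraform
# Hypothetical main.tf sketch. The module source path is illustrative,
# not taken from the module's documentation.
terraform {
  required_providers {
    juju = {
      source = "juju/juju"
    }
  }
}

# By default, the provider uses the credentials of the locally
# configured Juju controller.
provider "juju" {}

module "autoscaling_model_serving" {
  # Illustrative path to the module inside a local clone of the
  # autoscaling-model-serving repository.
  source = "./autoscaling-model-serving/terraform"
}
```

After running terraform init in this directory, terraform apply deploys the charms to the controller your Juju CLI is configured against.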
Deploy with charm bundle¶
Charm bundles are now obsolete, but as part of v0.1, the bundle.yaml file is still available.
To deploy:
1. Clone the autoscaling-model-serving repository.
2. Deploy using the bundle.yaml file:
juju deploy ./bundle/bundle.yaml --trust
Perform inference¶
1. Apply an InferenceService.
2. Perform inference by making a request using the URL from the recently created InferenceService.
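For the first step, a minimal InferenceService manifest could look like the following. This is based on the sklearn-iris example from the KServe documentation; the name, model format, and storage URI are illustrative and should match your own model:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Illustrative storage URI from the KServe examples; point this
      # at the bucket or PVC holding your own model.
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

Apply it with kubectl apply -f to the namespace where you want the model served.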
To retrieve the URL, run:
kubectl get inferenceservices <name of the inferenceservice> -n <namespace where it is deployed>
You get an output similar to the following:
NAME     URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
<name>   http://<name>.<namespace>.<LoadBalancer IP.DNS>   True           100
The http://<name>.<namespace>.<LoadBalancer IP.DNS> URL can be used in any sort of request, for example:
curl -v -H "Content-Type: application/json" http://<name>.<namespace>.<LoadBalancer IP.DNS>/v1/models/<name>:predict -d @./some-input.json
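The request body must follow the inference protocol your model server expects. Under the KServe v1 prediction protocol, it is a JSON object with an instances list. For a model that takes four numeric features per sample, the input file could look like this (the values are illustrative):

```json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
```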
Integrate with COS¶
You can integrate the solution with Canonical Observability Stack (COS) while deploying with the Terraform module by running:
terraform apply -var cos_configuration=true
If the solution was deployed using the charm bundle, or using the Terraform module without the COS options passed, see Integrate with COS for more details.