Deploying TensorFlow Models as Canary Releases with Kubernetes and Istio
In this lab, we will use Istio with Minikube and TensorFlow Serving to create canary deployments of TensorFlow machine learning models. More concretely, we will:
- Prepare a minikube cluster with the Istio add-on for TensorFlow Serving.
- Create a canary release of a TensorFlow model deployment.
- Configure various traffic splitting strategies.
Suppose we already have a minikube cluster that looks like this:
$ minikube profile list
|----------|-----------|---------|-------------|------|---------|---------|-------|--------|
| Profile | VM Driver | Runtime | IP | Port | Version | Status | Nodes | Active |
|----------|-----------|---------|-------------|------|---------|---------|-------|--------|
| minikube | none | docker | 192.168.0.5 | 8443 | v1.24.3 | Running | 1 | * |
|----------|-----------|---------|-------------|------|---------|---------|-------|--------|
Let’s download Istio from https://github.com/istio/istio/releases/ and extract it:
$ wget https://github.com/istio/istio/releases/download/1.12.7/istio-1.12.7-linux-amd64.tar.gz
$ tar zvxf istio-1.12.7-linux-amd64.tar.gz
$ cd istio-1.12.7 && ls
bin LICENSE manifests manifest.yaml README.md samples tools
$ export PATH=$PWD/bin:$PATH
$ istioctl install
This will install the Istio 1.12.7 default profile with ["Istio core" "Istiod" "Ingress gateways"] components into the cluster. Proceed? (y/N) y
✔ Istio core installed
✔ Istiod installed
- Processing resources for Ingress gateways. Waiting for Deployment/istio-system/istio-ingressgateway
✔ Ingress gateways installed
✔ Installation complete
Making this installation the default for injection and validation.
Now list the pods in the istio-system namespace:
$ kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ingressgateway-58fbb84dfd-d79rj 1/1 Running 0 15m
istiod-77f69cccd6-t9grw 1/1 Running 0 16m
$ kubectl get svc -n istio-system istio-ingressgateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.110.24.130 <pending> 15021:30487/TCP,80:30717/TCP,443:32359/TCP 24h
Because my environment does not provide an external load balancer for the ingress gateway, the EXTERNAL-IP column shows <pending>. Let’s use minikube tunnel to fix this.
$ minikube tunnel
Status:
	machine: minikube
	pid: 24042
	route: 10.96.0.0/12 -> 192.168.0.5
	minikube: Running
	services: [istio-ingressgateway]
	errors:
		minikube: no errors
		router: no errors
		loadbalancer emulator: no errors
$ kubectl get svc -n istio-system istio-ingressgateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.110.24.130 10.110.24.130 15021:30487/TCP,80:30717/TCP,443:32359/TCP 25h
Create the tf-serving namespace and enable automatic proxy sidecar injection:
$ kubectl create ns tf-serving
namespace/tf-serving created
$ kubectl label ns tf-serving istio-injection=enabled
namespace/tf-serving labeled
$ kubectl get ns --show-labels
NAME STATUS AGE LABELS
default Active 13d kubernetes.io/metadata.name=default
istio-system Active 20m kubernetes.io/metadata.name=istio-system
kube-flannel Active 13d kubernetes.io/metadata.name=kube-flannel,pod-security.kubernetes.io/enforce=privileged
kube-node-lease Active 13d kubernetes.io/metadata.name=kube-node-lease
kube-public Active 13d kubernetes.io/metadata.name=kube-public
kube-system Active 13d kubernetes.io/metadata.name=kube-system
tf-serving Active 57s istio-injection=enabled,kubernetes.io/metadata.name=tf-serving
Download the pretrained ResNet50 model using Python:
>>> import tensorflow as tf
>>> print(tf.__version__)
2.6.2
>>> model = tf.keras.applications.resnet50.ResNet50()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5
102973440/102967424 [==============================] - 32s 0us/step
>>> model.save('/home/dangho/Downloads/resnet50/1') # 1 is version
INFO:tensorflow:Assets written to: /home/dangho/Downloads/resnet50/1/assets
We now have the pretrained ResNet50 model on disk:
$ ls -la resnet50/1/
total 4076
drwxrwxrwx 1 dangho dangho    4096 Sep 24 23:08 .
drwxrwxrwx 1 dangho dangho    4096 Sep 24 22:56 ..
drwxrwxrwx 1 dangho dangho       0 Sep 24 23:08 assets
-rwxrwxrwx 1 dangho dangho  365902 Sep 24 23:08 keras_metadata.pb
-rwxrwxrwx 1 dangho dangho 3796233 Sep 24 23:08 saved_model.pb
drwxrwxrwx 1 dangho dangho       0 Sep 24 23:08 variables
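Optionally, you can sanity-check the SavedModel before serving it. A quick sketch in the same Python session, loading the model back and inspecting the signature that TensorFlow Serving will expose:

>>> loaded = tf.saved_model.load('/home/dangho/Downloads/resnet50/1')
>>> infer = loaded.signatures['serving_default']  # the signature TF Serving serves by default
>>> print(infer.structured_input_signature)
>>> print(infer.structured_outputs)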
Create the ConfigMap manifest configmap-resnet50.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: resnet50-configs
  namespace: tf-serving
data:
  MODEL_NAME: image_classifier
  MODEL_PATH: /models/resnet50
Using kubectl, create the ConfigMap:
$ kubectl apply -f configmap-resnet50.yaml
configmap/resnet50-configs created
Create the Deployment manifest deployment-resnet50.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-classifier-resnet50
  namespace: tf-serving
  labels:
    app: image-classifier
    version: resnet50
spec:
  replicas: 1
  selector:
    matchLabels:
      app: image-classifier
      version: resnet50
  template:
    metadata:
      labels:
        app: image-classifier
        version: resnet50
    spec:
      containers:
      - name: tf-serving
        image: "tensorflow/serving:2.5.1"
        args:
        - "--model_name=$(MODEL_NAME)"
        - "--model_base_path=$(MODEL_PATH)"
        envFrom:
        - configMapRef:
            name: resnet50-configs
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8501
          protocol: TCP
        - name: grpc
          containerPort: 8500
          protocol: TCP
        volumeMounts:
        - name: model
          mountPath: /models/resnet50
      volumes:
      - name: model
        hostPath:
          path: /home/dangho/Downloads/resnet50
Create the deployment and inspect it:
$ kubectl apply -f deployment-resnet50.yaml
deployment.apps/image-classifier-resnet50 created
$ kubectl get deployments.apps -n tf-serving
NAME READY UP-TO-DATE AVAILABLE AGE
image-classifier-resnet50 1/1 1 1 91s
$ kubectl get pods -n tf-serving
NAME READY STATUS RESTARTS AGE
image-classifier-resnet50-8556494d8-q4qm7 2/2 Running 0 5s
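Before wiring up Istio routing, you can optionally confirm that TensorFlow Serving has loaded the model, for example by port-forwarding to the deployment and querying the model status endpoint (run the two commands in separate terminals):

$ kubectl -n tf-serving port-forward deploy/image-classifier-resnet50 8501:8501
$ curl http://localhost:8501/v1/models/image_classifier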
Notice that the deployment carries two labels: app: image-classifier and version: resnet50. These labels are the key to configuring Istio traffic routing.
Create the Service manifest service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: image-classifier
  namespace: tf-serving
  labels:
    app: image-classifier
    service: image-classifier
spec:
  type: ClusterIP
  ports:
  - port: 8500
    protocol: TCP
    name: tf-serving-grpc
  - port: 8501
    protocol: TCP
    name: tf-serving-http
  selector:
    app: image-classifier
The selector field refers to the app: image-classifier label, which means the service will load balance across all pods carrying this label; at this point, these are the pods of the ResNet50 deployment. The service type is ClusterIP, so the IP address exposed by the service is only visible within the cluster.
Create the service and inspect it:
$ kubectl apply -f service.yaml
service/image-classifier created
$ kubectl get svc -n tf-serving
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
image-classifier ClusterIP 10.96.44.45 <none> 8500/TCP,8501/TCP 12s
Now we use an Istio Ingress gateway to manage inbound traffic for the mesh. First, configure the gateway (gateway.yaml) to open port 80 for HTTP traffic:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: image-classifier-gateway
  namespace: tf-serving
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
$ kubectl apply -f gateway.yaml
gateway.networking.istio.io/image-classifier-gateway created
$ kubectl get gateways.networking.istio.io -n tf-serving
NAME AGE
image-classifier-gateway 1s
At this point the gateway is running, but the ResNet50 deployment is still not accessible from outside the cluster. We need to configure a virtual service. Virtual services, along with destination rules, are the key building blocks of Istio’s traffic routing functionality. A virtual service lets you configure how requests are routed to a service within an Istio service mesh. Each virtual service consists of a set of routing rules that are evaluated in order, letting Istio match each request to a specific real destination within the mesh.
We will start by configuring a virtual service (virtual-service.yaml) that forwards all requests sent through the gateway on port 80 to the image-classifier service on port 8501:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: image-classifier
  namespace: tf-serving
spec:
  hosts:
  - "*"
  gateways:
  - image-classifier-gateway
  http:
  - route:
    - destination:
        host: image-classifier
        port:
          number: 8501
Recall that the image-classifier service is configured to load balance between pods carrying the app: image-classifier label. As a result, all requests sent to the gateway will be forwarded to the ResNet50 deployment.
$ kubectl apply -f virtual-service.yaml
virtualservice.networking.istio.io/image-classifier created
Now we will test access to the ResNet50 model.
$ kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
10.110.24.130
$ kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}'
80
Now we will generate the input payload for the model from an image:
$ python3 gen_payload.py
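The script itself is not part of this lab’s listings; a minimal sketch of what gen_payload.py might look like, assuming a local test image named input.jpg (a placeholder name) and the same preprocessing the Keras ResNet50 model was trained with:

# gen_payload.py -- a minimal sketch (assumed implementation)
import json
import tensorflow as tf

# 'input.jpg' is a placeholder; point it at any test image.
img = tf.keras.preprocessing.image.load_img('input.jpg', target_size=(224, 224))
x = tf.keras.applications.resnet50.preprocess_input(
    tf.keras.preprocessing.image.img_to_array(img))

# TensorFlow Serving's REST API expects a JSON body of the form {"instances": [...]}.
with open('request-body.json', 'w') as f:
    json.dump({'instances': [x.tolist()]}, f)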
Running the script creates a request-body.json file. Use curl or the Python requests module to send a request to the served model:
$ curl -d @request-body.json -X POST http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict
{
"predictions": [[
...
]
]
}
>>> import requests, json, numpy
>>> r = requests.post(url="http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict",
json=json.load(open("request-body.json")))
>>> res = r.json()
>>> preds = res['predictions']
>>> pred = preds[0]
>>> print(len(pred))
1000  # ImageNet has 1000 classes
>>> print((-numpy.array(pred)).argsort()[:10])  # indices of the top-10 scores
[667 457 906 610 834 796 841 652 620 465]
The detailed class list is available at https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/. From the prediction, we can map the top indices to labels:
667: mortarboard
457: bow tie, bow-tie, bowtie
906: Windsor tie
610: jersey, T-shirt, tee shirt
834: suit, suit of clothes
796: ski mask
841: sweatshirt
652: military uniform
620: laptop, laptop computer
465: bulletproof vest
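If you prefer to decode the indices programmatically, Keras ships a helper for ImageNet models; a quick example using the pred list from the session above:

>>> from tensorflow.keras.applications.resnet50 import decode_predictions
>>> for _, name, score in decode_predictions(numpy.array([pred]), top=5)[0]:
...     print(name, score)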
We have deployed ResNet50; now we will deploy a second model, ResNet101, as a canary release. As with ResNet50, ResNet101 will be deployed as a Kubernetes Deployment with its app label set to image-classifier. Recall that the image-classifier Kubernetes service is configured to load balance between pods whose app label has this value, and that the Istio virtual service is configured to forward all traffic from the Istio Ingress gateway to the image-classifier service.
If you deployed ResNet101 without any changes to the Istio virtual service, the incoming traffic would be load balanced across pods from both deployments using a round-robin strategy. To have better control over which requests go to which deployment, you need to configure a destination rule and modify the virtual service.
In this case, the destination rule defines two subsets of the image-classifier service, using the version label as a selector to differentiate between the ResNet50 and ResNet101 deployments:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: image-classifier
  namespace: tf-serving
spec:
  host: image-classifier
  subsets:
  - name: resnet101
    labels:
      version: resnet101
  - name: resnet50
    labels:
      version: resnet50
To create the destination rule:
$ kubectl apply -f destinationrule.yaml
destinationrule.networking.istio.io/image-classifier created
Recall that the ResNet50 deployment carries two labels: app: image-classifier and version: resnet50. The ResNet101 deployment carries the same two labels, with the same app value but a different version value:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-classifier-resnet101
  namespace: tf-serving
  labels:
    app: image-classifier
    version: resnet101
spec:
  replicas: 1
  selector:
    matchLabels:
      app: image-classifier
      version: resnet101
  template:
    metadata:
      labels:
        app: image-classifier
        version: resnet101
    spec:
      containers:
      - name: tf-serving
        image: "tensorflow/serving:2.5.1"
        args:
        - "--model_name=$(MODEL_NAME)"
        - "--model_base_path=$(MODEL_PATH)"
        envFrom:
        - configMapRef:
            name: resnet101-configs
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8501
          protocol: TCP
        - name: grpc
          containerPort: 8500
          protocol: TCP
        volumeMounts:
        - name: model
          mountPath: /models/resnet101
      volumes:
      - name: model
        hostPath:
          path: /home/dangho/Downloads/resnet101
The final step is to reconfigure the virtual service to use the destination rule.
You will start by modifying the virtual service to route all requests (100%) from the Ingress gateway to the resnet50 service subset:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: image-classifier
  namespace: tf-serving
spec:
  hosts:
  - "*"
  gateways:
  - image-classifier-gateway
  http:
  - route:
    - destination:
        host: image-classifier
        subset: resnet50
        port:
          number: 8501
      weight: 100
    - destination:
        host: image-classifier
        subset: resnet101
        port:
          number: 8501
      weight: 0
To apply the changes:
$ kubectl apply -f virtualservice-weight-100.yaml
virtualservice.networking.istio.io/image-classifier configured
Now deploy the ResNet101 model using the same process as for the ResNet50 model. Refer back to the steps above for the kubectl commands that apply the ConfigMap and Deployment manifests.
First, we need to download the pretrained ResNet101 model using Python:
>>> import tensorflow as tf
>>> print(tf.__version__)
2.6.2
>>> model = tf.keras.applications.resnet.ResNet101()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels.h5
179650560/179648224 [==============================] - 26s 0us/step
>>> model.save('/home/dangho/Downloads/resnet101/1') # 1 is version
INFO:tensorflow:Assets written to: /home/dangho/Downloads/resnet101/1/assets
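The lab does not list configmap-resnet101.yaml, but judging from the ResNet101 deployment above it presumably mirrors the ResNet50 ConfigMap: MODEL_NAME stays image_classifier so both versions answer on the same URL path, while MODEL_PATH points at the new mount:

apiVersion: v1
kind: ConfigMap
metadata:
  name: resnet101-configs
  namespace: tf-serving
data:
  MODEL_NAME: image_classifier
  MODEL_PATH: /models/resnet101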
Apply the manifests:
$ kubectl apply -f configmap-resnet101.yaml
configmap/resnet101-configs created
$ kubectl apply -f deployment-resnet101.yaml
deployment.apps/image-classifier-resnet101 created
$ kubectl get deployments.apps -n tf-serving
NAME READY UP-TO-DATE AVAILABLE AGE
image-classifier-resnet101 1/1 1 1 41s
image-classifier-resnet50 1/1 1 1 51s
At this point the ResNet101 deployment is ready but the virtual service is configured to route all requests to ResNet50.
Verify this by sending a few prediction requests:
>>> import requests, json, numpy
>>> pred = requests.post(url="http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict",
json=json.load(open("request-body.json"))).json()['predictions'][0]
>>> print((-numpy.array(pred)).argsort()[:10])
[667 457 906 610 834 796 841 652 620 465]
All requests return the same result.
You will now reconfigure the Istio virtual service to split traffic between the ResNet50 and ResNet101 models using weighted load balancing: 70% of requests will go to ResNet50 and 30% to ResNet101.
The manifest for the new configuration is in the virtualservice-weight-70.yaml file:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: image-classifier
  namespace: tf-serving
spec:
  hosts:
  - "*"
  gateways:
  - image-classifier-gateway
  http:
  - route:
    - destination:
        host: image-classifier
        subset: resnet50
        port:
          number: 8501
      weight: 70
    - destination:
        host: image-classifier
        subset: resnet101
        port:
          number: 8501
      weight: 30
To apply the manifest:
$ kubectl apply -f virtualservice-weight-70.yaml
virtualservice.networking.istio.io/image-classifier configured
Send a few more requests (more than 10):
>>> import requests, json, numpy
>>> pred = requests.post(url="http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict",
json=json.load(open("request-body.json"))).json()['predictions'][0]
>>> print((-numpy.array(pred)).argsort()[:10])
[667 457 906 610 834 796 841 652 620 465]
>>> pred = requests.post(url="http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict",
json=json.load(open("request-body.json"))).json()['predictions'][0]
>>> print((-numpy.array(pred)).argsort()[:10])
[906 667 834 400 457 652 451 465 841 917]
Notice that responses are now different.
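To see the split empirically, you can send a batch of requests and count which model answered, using the top-1 class as a crude fingerprint. A rough sketch, assuming (as the outputs above show) that the two models disagree on the top-1 class for this image:

>>> import collections, requests, json, numpy
>>> body = json.load(open("request-body.json"))
>>> url = "http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict"
>>> counts = collections.Counter()
>>> for _ in range(50):
...     pred = requests.post(url=url, json=body).json()['predictions'][0]
...     counts[int(numpy.argmax(pred))] += 1
...
>>> print(counts)  # the two top-1 classes should appear in roughly a 70/30 ratio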
As an optional task, you will reconfigure the virtual service to route traffic to the canary deployment based on request headers. This approach enables a variety of scenarios, including routing requests from a specific group of users to the canary release. Assume that requests from canary users carry a custom user-group header. If this header is set to canary, the request is routed to ResNet101.
The virtualservice-focused-routing.yaml manifest defines this configuration:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: image-classifier
  namespace: tf-serving
spec:
  hosts:
  - "*"
  gateways:
  - image-classifier-gateway
  http:
  - match:
    - headers:
        user-group:
          exact: canary
    route:
    - destination:
        host: image-classifier
        subset: resnet101
        port:
          number: 8501
  - route:
    - destination:
        host: image-classifier
        subset: resnet50
        port:
          number: 8501
The match field defines the matching pattern, and the route under it is used for requests whose user-group header matches; all other requests fall through to the second route, which targets ResNet50.
Reconfigure the virtual service:
$ kubectl apply -f virtualservice-focused-routing.yaml
virtualservice.networking.istio.io/image-classifier configured
Send a few requests without the user-group header:
>>> import requests, json, numpy
>>> pred = requests.post(url="http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict",
json=json.load(open("request-body.json"))).json()['predictions'][0]
>>> print((-numpy.array(pred)).argsort()[:10])
[667 457 906 610 834 796 841 652 620 465]
Notice that all of the responses are coming from the ResNet50 model.
Now, send a few requests with the user-group header set to canary:
>>> import requests, json, numpy
>>> pred = requests.post(url="http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict",
json=json.load(open("request-body.json")),
headers={"user-group": "canary"}).json()['predictions'][0]
>>> print((-numpy.array(pred)).argsort()[:10])
[906 667 834 400 457 652 451 465 841 917]
All of the responses are now sent by the ResNet101 model.
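The curl equivalent simply adds the header:

$ curl -H "user-group: canary" -d @request-body.json -X POST http://10.110.24.130:80/v1/models/image_classifier/versions/1:predict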
This deployment configuration can be used for A/B testing as you can easily monitor the performance of the canary model on a specific user group.