Deploy a model as a web service on an FPGA with Azure Machine Learning service
You can deploy a model as a web service on field programmable gate arrays (FPGAs). FPGAs provide ultra-low latency inferencing, even with a batch size of one. These models are currently available:
- ResNet 50
- ResNet 152
If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning service today.
You need an Azure Machine Learning service workspace and the Azure Machine Learning SDK for Python installed. To get these prerequisites, see the How to configure a development environment document.
Your workspace needs to be in the East US 2 region.
Install the contrib extras:
pip install --upgrade azureml-sdk[contrib]
Currently only TensorFlow versions 1.10 and earlier are supported, so install TensorFlow after all other installations are complete:
pip install "tensorflow==1.10"
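Version strings do not compare reliably as text (as strings, "1.9" sorts after "1.10"), so a check of the installed version should compare numeric components. The following is a minimal sketch: the helper name `is_supported_tf_version` is hypothetical and not part of the SDK, and it assumes version strings of the form `major.minor[.patch]`.

```python
def is_supported_tf_version(version, max_supported=(1, 10)):
    """Return True if `version` (for example "1.10.0") is at or below max_supported.

    Hypothetical helper, not part of the Azure Machine Learning SDK.
    Only the major and minor components are compared, as integers.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) <= max_supported


if __name__ == "__main__":
    # Sample version strings chosen for illustration.
    for candidate in ("1.9.0", "1.10.0", "1.12.0"):
        print(candidate, is_supported_tf_version(candidate))
```

After TensorFlow is installed, the same check could be applied to `tf.__version__`.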
Get the notebook
For your convenience, this tutorial is available as a Jupyter notebook. Follow the code here or run the quickstart notebook.
Create and deploy your model
Create a pipeline to preprocess the input image, featurize it using ResNet 50 on an FPGA, and then run the features through a classifier trained on the ImageNet data set.
Follow the instructions to:
- Define the model pipeline
- Deploy the model
- Consume the deployed model
- Delete deployed services
To optimize latency and throughput, your client should be in the same Azure region as the endpoint. Currently the APIs are created in the East US Azure region.
The first stage of the pipeline is to preprocess the images.
import os
import tensorflow as tf
# Input images as a two-dimensional tensor containing an arbitrary number of images represented as strings
import azureml.contrib.brainwave.models.utils as utils
in_images = tf.placeholder(tf.string)
image_tensors = utils.preprocess_array(in_images)
print(image_tensors.shape)
Initialize the model and download a TensorFlow checkpoint of the quantized version of ResNet50 to be used as a featurizer.
from azureml.contrib.brainwave.models import QuantizedResnet50
model_path = os.path.expanduser('~/models')
model = QuantizedResnet50(model_path, is_frozen = True)
feature_tensor = model.import_graph_def(image_tensors)
print(model.version)
print(feature_tensor.name)
print(feature_tensor.shape)
This classifier has been trained on the ImageNet data set.
classifier_output = model.get_default_classifier(feature_tensor)
Create service definition
Now that you have defined the image preprocessing, featurizer, and classifier that run on the service, you can create a service definition. The service definition is a set of files, generated from the model, that is deployed to the FPGA service. It consists of a pipeline: a series of stages that run in order on the service, with the output of each stage becoming the input to the next. TensorFlow stages, Keras stages, and BrainWave stages are supported.
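The way stages chain together can be illustrated without the SDK. The following is a minimal sketch in plain Python, not the Brainwave pipeline API: `run_pipeline` and the three stand-in stage functions are hypothetical names, and each stage does trivial string or number work in place of real image processing.

```python
def run_pipeline(stages, data):
    """Run each stage in order, feeding each stage's output into the next."""
    for stage in stages:
        data = stage(data)
    return data


def preprocess(images):
    # Stand-in for image preprocessing: trim and lower-case each string.
    return [image.strip().lower() for image in images]


def featurize(images):
    # Stand-in for the featurizer: one numeric feature per input.
    return [len(image) for image in images]


def classify(features):
    # Stand-in for the classifier: label each feature by a threshold.
    return ["long" if feature > 3 else "short" for feature in features]


if __name__ == "__main__":
    print(run_pipeline([preprocess, featurize, classify], [" Cat ", "Zebra"]))
```

The order of the list is the order of execution, which mirrors how stages are appended to a pipeline one after another.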
To create a TensorFlow stage, specify a session containing the graph (in this case, the default graph) and the input and output tensors for the stage. This information is used to save the graph so that it can run on the service.
from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage
save_path = os.path.expanduser('~/models/save')
model_def_path = os.path.join(save_path, 'model_def.zip')
model_def = ModelDefinition()
with tf.Session() as sess:
    # Stage 1: image preprocessing (TensorFlow)
    model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))
    # Stage 2: featurizer (BrainWave)
    model_def.pipeline.append(BrainWaveStage(sess, model))
    # Stage 3: classifier (TensorFlow)
    model_def.pipeline.append(TensorflowStage(sess, feature_tensor, classifier_output))
    model_def.save(model_def_path)
    print(model_def_path)
Create a service from the service definition. Your workspace needs to be in the East US 2 location.
from azureml.core import Workspace
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')
# Register the saved service definition as a model
from azureml.core.model import Model
model_name = "resnet-50-rtai"
registered_model = Model.register(ws, model_def_path, model_name)
from azureml.core.webservice import Webservice
from azureml.exceptions import WebserviceException
from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage
service_name = "imagenet-infer"
service = None
try:
    # Reuse the service if one with this name already exists
    service = Webservice(ws, service_name)
except WebserviceException:
    # Otherwise, deploy a new service from the registered model
    image_config = BrainwaveImage.image_configuration()
    deployment_config = BrainwaveWebservice.deploy_configuration()
    service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)
    service.wait_for_deployment(True)
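Because the workspace must be in East US 2, the `ws.location` value printed by the code above can be checked before deploying. The following is a minimal sketch under an assumption: that the location may be reported either as a display name ("East US 2") or as a short name ("eastus2"). The helper names `normalize_region` and `in_required_region` are hypothetical and not part of the SDK.

```python
REQUIRED_REGION = "eastus2"  # short form of "East US 2"


def normalize_region(name):
    """Lower-case a region name and drop spaces, so "East US 2" becomes "eastus2"."""
    return name.replace(" ", "").lower()


def in_required_region(location, required=REQUIRED_REGION):
    """Return True if `location` names the required region in either form."""
    return normalize_region(location) == normalize_region(required)


if __name__ == "__main__":
    # Sample location strings chosen for illustration.
    for location in ("East US 2", "eastus2", "East US"):
        print(location, in_required_region(location))
```

With a real workspace, the call might look like `in_required_region(ws.location)`.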
Test the service
To send an image to the API and test the response, add a mapping from the output class ID to the ImageNet class name.
import requests
classes_entries = requests.get("https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt").text.splitlines()
Call your service and replace the "your-image.jpg" file name below with an image from your machine.
with open('your-image.jpg') as f:
    results = service.run(f)
# map results [class_id] => [confidence]
results = enumerate(results)
# sort results by confidence
sorted_results = sorted(results, key=lambda x: x[1], reverse=True)
# print top 5 results
for top in sorted_results[:5]:
    print(classes_entries[top[0]], 'confidence:', top[1])
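The ranking step can be tried without a deployed service. The following is a minimal sketch that assumes, as the code above implies, that the results are a flat sequence of confidences indexed by class ID. The function name `top_k` is hypothetical, and the class names and scores are made-up sample data.

```python
def top_k(confidences, class_names, k=5):
    """Return the k (class_name, confidence) pairs with the highest confidence."""
    # enumerate pairs each confidence with its class ID (its position).
    ranked = sorted(enumerate(confidences), key=lambda pair: pair[1], reverse=True)
    return [(class_names[class_id], score) for class_id, score in ranked[:k]]


if __name__ == "__main__":
    # Made-up sample data standing in for service output and class names.
    names = ["tench", "goldfish", "great white shark", "tiger shark"]
    scores = [0.05, 0.70, 0.20, 0.05]
    for name, score in top_k(scores, names, k=2):
        print(name, "confidence:", score)
```

Sorting on `pair[1]` orders by confidence, while `pair[0]` keeps the class ID needed to look up the name.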
Clean up service
Delete the service.
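A cleanup step might look like the following minimal sketch. It assumes that the deployed service and the registered model each expose a `delete()` method, as `Webservice` and `Model` do in the `azureml.core` SDK. The helper `delete_resources` and the `FakeResource` stand-in class are hypothetical and exist only so the sketch runs without an Azure subscription.

```python
def delete_resources(*resources):
    """Call delete() on each non-None resource in order; return the count deleted."""
    deleted = 0
    for resource in resources:
        if resource is not None:
            resource.delete()
            deleted += 1
    return deleted


class FakeResource:
    """Stand-in for a service or model object; records whether delete() ran."""

    def __init__(self, name):
        self.name = name
        self.deleted = False

    def delete(self):
        self.deleted = True


if __name__ == "__main__":
    service = FakeResource("imagenet-infer")
    model = FakeResource("resnet-50-rtai")
    print(delete_resources(service, model), "resources deleted")
```

With the objects created earlier, the call might be `delete_resources(service, registered_model)`. The service is passed first so that it is deleted before the model it was deployed from.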
Secure FPGA web services
For information on securing FPGA web services, see the Secure web services document.
Learn how to Consume a ML Model deployed as a web service.