Tutorial: Deploy a machine learning model with the designer
You can deploy the predictive model developed in part one of the tutorial to give others a chance to use it. In part one, you trained your model. Now, it's time to generate predictions based on user input. In this part of the tutorial, you will:
- Create a real-time inference pipeline.
- Create an inferencing cluster.
- Deploy the real-time endpoint.
- Test the real-time endpoint.
Complete part one of the tutorial to learn how to train and score a machine learning model in the designer.
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that you have been granted the correct level of access. For more information, see Manage users and roles.
Create a real-time inference pipeline
To deploy your pipeline, you must first convert the training pipeline into a real-time inference pipeline. This process removes training modules and adds web service inputs and outputs to handle requests.
Above the pipeline canvas, select Create inference pipeline > Real-time inference pipeline.
Your pipeline should now look like this:
When you select Create inference pipeline, several things happen:
- The trained model is stored as a Dataset module in the module palette. You can find it under My Datasets.
- Training modules like Train Model and Split Data are removed.
- The saved trained model is added back into the pipeline.
- Web Service Input and Web Service Output modules are added. These modules show where user data enters the pipeline and where data is returned.
By default, the Web Service Input will expect the same data schema as the training data used to create the predictive pipeline. In this scenario, price is included in the schema. However, price isn't used as a factor during prediction.
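As a rough illustration of what that schema means for a caller, the sketch below builds a JSON request body in Python. The envelope keys (`Inputs`, `WebServiceInput0`, `GlobalParameters`) are an assumption based on the shape commonly seen in generated consumption code, and the column names are hypothetical placeholders. Check the Swagger definition of your own endpoint for the actual names.

```python
import json

# Assumed input name; confirm it against your endpoint's Swagger definition.
INPUT_NAME = "WebServiceInput0"


def build_payload(rows, input_name=INPUT_NAME):
    """Wrap a list of row dicts in the (assumed) request envelope."""
    if not rows:
        raise ValueError("at least one input row is required")
    # Every row should carry the same columns as the training schema.
    columns = set(rows[0])
    for row in rows:
        if set(row) != columns:
            raise ValueError("all rows must share the same columns")
    return {"Inputs": {input_name: list(rows)}, "GlobalParameters": {}}


# Hypothetical columns. price is part of the schema, so it is sent here as a
# placeholder even though it isn't used as a factor during prediction.
sample_rows = [{"make": "toyota", "horsepower": 92, "price": 0}]
payload = build_payload(sample_rows)
body = json.dumps(payload)
```

The helper rejects rows with mismatched columns early, which is usually easier to debug than a schema error returned by the endpoint.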
Select Submit, and use the same compute target and experiment that you used in part one.
If this is the first run, it may take up to 20 minutes for your pipeline to finish running. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline runs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each module to further improve efficiency.
Create an inferencing cluster
In the dialog box that appears, you can select from any existing Azure Kubernetes Service (AKS) clusters to deploy your model to. If you don't have an AKS cluster, use the following steps to create one.
Select Compute in the dialog box that appears to go to the Compute page.
On the navigation ribbon, select Inference Clusters > + New.
In the inference cluster pane, configure a new Kubernetes Service.
Enter aks-compute for the Compute name.
Select a nearby region that's available for the Region.
It takes approximately 15 minutes to create a new AKS service. You can check the provisioning state on the Inference Clusters page.
Deploy the real-time endpoint
After your AKS service has finished provisioning, return to the real-time inferencing pipeline to complete deployment.
Select Deploy above the canvas.
Select Deploy new real-time endpoint.
Select the AKS cluster you created.
You can also change the advanced settings for your real-time endpoint.
Advanced settings and their descriptions:
- Enable Application Insights diagnostics and data collection: Whether to enable Azure Application Insights to collect data from the deployed endpoints. Default: false.
- Scoring timeout: A timeout in milliseconds to enforce for scoring calls to the web service. Default: 60000.
- Auto scale enabled: Whether to enable autoscaling for the web service. Default: true.
- Min replicas: The minimum number of containers to use when autoscaling this web service. Default: 1.
- Max replicas: The maximum number of containers to use when autoscaling this web service. Default: 10.
- Target utilization: The target utilization (in percent out of 100) that the autoscaler should attempt to maintain for this web service. Default: 70.
- Refresh period: How often (in seconds) the autoscaler attempts to scale this web service. Default: 1.
- CPU reserve capacity: The number of CPU cores to allocate for this web service. Default: 0.1.
- Memory reserve capacity: The amount of memory (in GB) to allocate for this web service. Default: 0.5.
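If you track these settings outside the studio, for example in a deployment checklist, the defaults listed above can be captured in a small Python structure. This is a minimal sketch: `EndpointSettings` and its field names are hypothetical and not part of any Azure SDK, and the consistency checks are assumptions rather than rules enforced by the service.

```python
from dataclasses import dataclass


@dataclass
class EndpointSettings:
    """Hypothetical container for the advanced settings and their defaults."""
    enable_app_insights: bool = False
    scoring_timeout_ms: int = 60000
    autoscale_enabled: bool = True
    min_replicas: int = 1
    max_replicas: int = 10
    target_utilization: int = 70  # percent out of 100
    refresh_period_s: int = 1
    cpu_cores: float = 0.1
    memory_gb: float = 0.5

    def validate(self):
        """Return a list of problems; an empty list means no issue was found."""
        problems = []
        if self.scoring_timeout_ms <= 0:
            problems.append("scoring timeout must be positive")
        if self.min_replicas > self.max_replicas:
            problems.append("min replicas exceeds max replicas")
        if not 0 < self.target_utilization <= 100:
            problems.append("target utilization must be in (0, 100]")
        if self.refresh_period_s <= 0:
            problems.append("refresh period must be positive")
        if self.cpu_cores <= 0 or self.memory_gb <= 0:
            problems.append("CPU and memory reservations must be positive")
        return problems


defaults = EndpointSettings()
misconfigured = EndpointSettings(min_replicas=5, max_replicas=2)
```

Calling `validate()` on `defaults` returns an empty list, while `misconfigured` reports that the minimum replica count exceeds the maximum.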
A success notification above the canvas appears after deployment finishes. It might take a few minutes.
You can also deploy to Azure Container Instance (ACI) if you select Azure Container Instance for Compute type in the real-time endpoint setting box. Azure Container Instance is used for testing or development. Use ACI for low-scale CPU-based workloads that require less than 48 GB of RAM.
Test the real-time endpoint
After deployment finishes, you can view your real-time endpoint by going to the Endpoints page.
On the Endpoints page, select the endpoint you deployed.
In the Details tab, you can see more information such as the REST URI, Swagger definition, status, and tags.
In the Consume tab, you can find sample consumption code and security keys, and set authentication methods.
In the Deployment logs tab, you can find the detailed deployment logs of your real-time endpoint.
To test your endpoint, go to the Test tab. From here, you can enter test data and select Test to verify the output of your endpoint.
For more information on consuming your web service, see Consume a model deployed as a web service.
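Outside the studio, you can call the endpoint from any HTTP client. The sketch below uses only the Python standard library and assumes key-based authentication passed as a bearer token and a JSON request body. The URI, key, input name, and column name are placeholders, so treat the sample code in the Consume tab as the authoritative version for your endpoint.

```python
import json
import urllib.request


def build_scoring_request(rest_uri, api_key, payload):
    """Build (but don't send) a POST request for a key-authenticated endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumes the security key is accepted as a bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(rest_uri, data=body, headers=headers, method="POST")


def score(request, timeout_s=60):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values; nothing is sent until score() is called.
request = build_scoring_request(
    "http://example.invalid/score",
    "<your-key>",
    {"Inputs": {"WebServiceInput0": [{"make": "toyota"}]}, "GlobalParameters": {}},
)
```

To send the request, replace the placeholders with the REST URI from the Details tab and a key from the Consume tab, then call `score(request)`. The 60-second client timeout mirrors the default scoring timeout of 60000 milliseconds.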
If you modify your training pipeline, resubmit the training pipeline, update the inference pipeline, and then run the inference pipeline again.
Note that only the trained models are updated in the inference pipeline. Data transformations are not updated.
To use an updated transformation in the inference pipeline, register the output of the transformation module as a dataset.
Then manually replace the TD- module in the inference pipeline with the registered dataset.
You can then submit the inference pipeline with the updated model and transformation, and deploy it.
Clean up resources
You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to articles.
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges.
In the Azure portal, select Resource groups on the left side of the window.
In the list, select the resource group that you created.
Select Delete resource group.
Deleting the resource group also deletes all resources that you created in the designer.
Delete individual assets
In the designer where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.
The compute target that you created here automatically autoscales to zero nodes when it's not being used. This action is taken to minimize charges. If you want to delete the compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and selecting Unregister.
To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually delete those assets.
In this tutorial, you learned the key steps to create, deploy, and consume a machine learning model in the designer. To learn more about how you can use the designer, see the following links:
- Designer samples: Learn how to use the designer to solve other types of problems.
- Use Azure Machine Learning studio in an Azure virtual network.