Tutorial: Forecast demand with automated machine learning
APPLIES TO: Basic edition Enterprise edition (Upgrade to Enterprise)
In this tutorial, you use automated machine learning, or automated ML, in the Azure Machine Learning studio to create a time-series forecasting model to predict rental demand for a bike sharing service.
For a classification model example, see Tutorial: Create a classification model with automated ML in Azure Machine Learning.
In this tutorial, you learn how to do the following tasks:
- Create and load a dataset.
- Configure and run an automated ML experiment.
- Specify forecasting settings.
- Explore the experiment results.
- Deploy the best model.
- An Enterprise edition Azure Machine Learning workspace. If you don't have a workspace, create an Enterprise edition workspace.
- Automated machine learning in the Azure Machine Learning studio is only available for Enterprise edition workspaces.
- Download the bike-no.csv data file
Get started in Azure Machine Learning studio
For this tutorial, you create your automated ML experiment run in Azure Machine Learning studio, a consolidated interface that includes machine learning tools to perform data science scenarios for data science practitioners of all skill levels. The studio is not supported on Internet Explorer browsers.
Sign in to Azure Machine Learning studio.
Select your subscription and the workspace you created.
Select Get started.
In the left pane, select Automated ML under the Author section.
Select +New automated ML run.
Create and load dataset
Before you configure your experiment, upload your data file to your workspace in the form of an Azure Machine Learning dataset. Doing so, allows you to ensure that your data is formatted appropriately for your experiment.
On the Select dataset form, select From local files from the +Create dataset drop-down.
On the Basic info form, give your dataset a name and provide an optional description. The dataset type should default to Tabular, since automated ML in Azure Machine Learning studio currently only supports tabular datasets.
Select Next on the bottom left
On the Datastore and file selection form, select the default datastore that was automatically set up during your workspace creation, workspaceblobstore (Azure Blob Storage). This is the storage location where you'll upload your data file.
Choose the bike-no.csv file on your local computer. This is the file you downloaded as a prerequisite.
When the upload is complete, the Settings and preview form is pre-populated based on the file type.
Verify that the Settings and preview form is populated as follows and select Next.
Field Description Value for tutorial File format Defines the layout and type of data stored in a file. Delimited Delimiter One or more characters for specifying the boundary between separate, independent regions in plain text or other data streams. Comma Encoding Identifies what bit to character schema table to use to read your dataset. UTF-8 Column headers Indicates how the headers of the dataset, if any, will be treated. Use headers from the first file Skip rows Indicates how many, if any, rows are skipped in the dataset. None
The Schema form allows for further configuration of your data for this experiment.
For this example, choose to ignore the casual and registered columns. These columns are a breakdown of the cnt column so, therefore we don't include them.
Also for this example, leave the defaults for the Properties and Type.
On the Confirm details form, verify the information matches what was previously populated on the Basic info and Settings and preview forms.
Select Create to complete the creation of your dataset.
Select your dataset once it appears in the list.
Configure experiment run
After you load and configure your data, set up your remote compute target and select which column in your data you want to predict.
- Populate the Configure run form as follows:
Enter an experiment name:
Select cnt as the target column, what you want to predict. This column indicates the number of total bike share rentals.
Select Create a new compute and configure your compute target. Automated ML only supports Azure Machine Learning compute.
Field Description Value for tutorial Compute name A unique name that identifies your compute context. bike-compute Virtual machine size Select the virtual machine size for your compute. Standard_DS12_V2 Min / Max nodes (in Advanced Settings) To profile data, you must specify 1 or more nodes. Min nodes: 1
Max nodes: 6
Select Create to get the compute target.
This takes a couple minutes to complete.
After creation, select your new compute target from the drop-down list.
Select forecast settings
Complete the setup for your automated ML experiment by specifying the machine learning task type and configuration settings.
On the Task type and settings form, select Time series forecasting as the machine learning task type.
Select date as your Time column and leave Group by column(s) blank.
Select View additional configuration settings and populate the fields as follows. These settings are to better control the training job and specify settings for your forecast. Otherwise, defaults are applied based on experiment selection and data.
Additional configurations Description Value for tutorial Primary metric Evaluation metric that the machine learning algorithm will be measured by. Normalized root mean squared error Automatic featurization Enables preprocessing. This includes automatic data cleansing, preparing, and transformation to generate synthetic features. Enable Explain best model (preview) Automatically shows explainability on the best model created by automated ML. Enable Blocked algorithms Algorithms you want to exclude from the training job Extreme Random Trees Additional forecasting settings These settings help improve the accuracy of your model
Forecast horizon: length of time into the future you want to predict
Forecast target lags: how far back you want to construct the lags of a the target variable
Target rolling window: specifies the size of the rolling window over which features, such as the max, min and sum, will be generated.
Forecast horizon: 14
Forecast target lags: None
Target rolling window size: None
Exit criterion If a criteria is met, the training job is stopped. Training job time (hours): 3
Metric score threshold: None
Validation Choose a cross-validation type and number of tests. Validation type:
Number of validations: 5
Concurrency The maximum number of parallel iterations executed per iteration Max concurrent iterations: 6
To run your experiment, select Finish. The Run details screen opens with the Run status at the top next to the run number. This status updates as the experiment progresses.
Preparation takes 10-15 minutes to prepare the experiment run.
Once running, it takes 2-3 minutes more for each iteration.
In production, you'd likely walk away for a bit as this process takes time. While you wait, we suggest you start exploring the tested algorithms on the Models tab as they complete.
Navigate to the Models tab to see the algorithms (models) tested. By default, the models are ordered by metric score as they complete. For this tutorial, the model that scores the highest based on the chosen Normalized root mean squared error metric is at the top of the list.
While you wait for all of the experiment models to finish, select the Algorithm name of a completed model to explore its performance details.
The following example navigates through the Model details and the Visualizations tabs to view the selected model's properties, metrics and performance charts.
Deploy the model
Automated machine learning in Azure Machine Learning studio allows you to deploy the best model as a web service in a few steps. Deployment is the integration of the model so it can predict on new data and identify potential areas of opportunity.
For this experiment, deployment to a web service means that the bike share company now has an iterative and scalable web solution for forecasting bike share rental demand.
Once the run is complete, navigate back to the Run detail page and select the Models tab.
In this experiment context, StackEnsemble is considered the best model, based on the Normalized root mean squared error metric. We deploy this model, but be advised, deployment takes about 20 minutes to complete. The deployment process entails several steps including registering the model, generating resources, and configuring them for the web service.
Select the Deploy best model button in the bottom-left corner.
Populate the Deploy a model pane as follows:
Field Value Deployment name bikeshare-deploy Deployment description bike share demand deployment Compute type Select Azure Compute Instance (ACI) Enable authentication Disable. Use custom deployment assets Disable. Disabling allows for the default driver file (scoring script) and environment file to be autogenerated.
For this example, we use the defaults provided in the Advanced menu.
A green success message appears at the top of the Run screen stated that the deployment was started successfully. The progress of the deployment can be found
in the Recommended model pane under Deploy status.
Once deployment succeeds, you have an operational web service to generate predictions.
Proceed to the Next steps to learn more about how to consume your new web service, and test your predictions using Power BI's built in Azure Machine Learning support.
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store. Delete only the deployment files to minimize costs to your account, or if you want to keep your workspace and experiment files. Otherwise, delete the entire resource group, if you don't plan to use any of the files.
Delete the deployment instance
Delete just the deployment instance from the Azure Machine Learning studio, if you want to keep the resource group and workspace for other tutorials and exploration.
Go to the Azure Machine Learning studio. Navigate to your workspace and on the left under the Assets pane, select Endpoints.
Select the deployment you want to delete and select Delete.
Delete the resource group
The resources you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to articles.
If you don't plan to use the resources you created, delete them, so you don't incur any charges:
In the Azure portal, select Resource groups on the far left.
From the list, select the resource group you created.
Select Delete resource group.
Enter the resource group name. Then select Delete.
In this tutorial, you used automated ML in the Azure Machine Learning studio to create and deploy a time series forecasting model that predicts bike share rental demand.
See this article for steps on how to create a Power BI supported schema to facilitate consumption of your newly deployed web service:
- Learn more about automated machine learning.
- For more information on classification metrics and charts, see the Understand automated machine learning results article.
- Learn more about featurization.
- Learn more about data profiling.
This bike share dataset has been modified for this tutorial. This dataset was made available as part of a Kaggle competition and was originally available via Capital Bikeshare. It can also be found within the UCI Machine Learning Database.
Source: Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.