Analyze data with Azure Machine Learning

This tutorial uses Azure Machine Learning to build a predictive machine learning model based on data stored in Azure SQL Data Warehouse. Specifically, it builds a targeted marketing campaign for Adventure Works, the bike shop, by predicting whether a customer is likely to buy a bike.

Prerequisites

To step through this tutorial, you need a SQL Data Warehouse database loaded with the AdventureWorksDW sample data.

1. Get the data

The data is in the dbo.vTargetMail view in the AdventureWorksDW database. To read this data:

  1. Sign in to Azure Machine Learning studio and click My Experiments.
  2. Click +NEW and select Blank Experiment.
  3. Enter a name for your experiment: Targeted Marketing.
  4. Drag the Reader module from the modules pane into the canvas.
  5. Specify the details of your SQL Data Warehouse database in the Properties pane.
  6. Specify the database query to read the data of interest:
SELECT [CustomerKey]
  ,[GeographyKey]
  ,[CustomerAlternateKey]
  ,[MaritalStatus]
  ,[Gender]
  ,CAST([YearlyIncome] AS int) AS SalaryYear
  ,[TotalChildren]
  ,[NumberChildrenAtHome]
  ,[EnglishEducation]
  ,[EnglishOccupation]
  ,[HouseOwnerFlag]
  ,[NumberCarsOwned]
  ,[CommuteDistance]
  ,[Region]
  ,[Age]
  ,[BikeBuyer]
FROM [dbo].[vTargetMail]

Run the experiment by clicking Run under the experiment canvas.

After the experiment finishes running successfully, click the output port at the bottom of the Reader module and select Visualize to see the imported data.

2. Clean the data

To clean the data, drop some columns that are not relevant for the model. To do this:

  1. Drag the Project Columns module into the canvas.
  2. Click Launch column selector in the Properties pane to specify which columns you wish to drop.
  3. Exclude two columns: CustomerAlternateKey and GeographyKey.
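
Outside the studio canvas, the same column exclusion can be sketched in plain Python. This is only an illustration of what the Project Columns step does, not the module's implementation; the sample rows and the helper name drop_columns are hypothetical, and only the column names come from the query above.

```python
# Columns excluded in the step above.
COLUMNS_TO_DROP = {"CustomerAlternateKey", "GeographyKey"}

# Hypothetical sample rows; the values are made up for illustration.
rows = [
    {"CustomerKey": 1, "GeographyKey": 10, "CustomerAlternateKey": "A-1",
     "Age": 45, "BikeBuyer": 1},
    {"CustomerKey": 2, "GeographyKey": 20, "CustomerAlternateKey": "A-2",
     "Age": 31, "BikeBuyer": 0},
]


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [
        {name: value for name, value in row.items() if name not in columns}
        for row in rows
    ]


cleaned = drop_columns(rows, COLUMNS_TO_DROP)
print(sorted(cleaned[0]))
```

The original rows are left unchanged, so the full dataset stays available if a dropped column turns out to be useful later.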

3. Build the model

We will split the data 80-20: 80% to train a machine learning model and 20% to test it. Because this is a binary classification problem, we will use the “Two-Class” algorithms.

  1. Drag the Split module into the canvas.
  2. Enter 0.8 for Fraction of rows in the first output dataset in the Properties pane.
  3. Drag the Two-Class Boosted Decision Tree module into the canvas.
  4. Drag the Train Model module into the canvas and connect its inputs. Then click Launch column selector in the Properties pane.
    • First input: the ML algorithm.
    • Second input: the data to train the algorithm on.
  5. Select the BikeBuyer column as the column to predict.
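
As a rough illustration of the 80-20 split configured above, the following Python sketch shuffles the rows and cuts them at the chosen fraction. It assumes a randomized split; the function name split_rows, the fixed seed, and the stand-in data are hypothetical and are not part of the Split module.

```python
import random


def split_rows(rows, fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it at the given fraction.

    The fixed seed is an assumption made here so the split is repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the cleaned dataset: 100 row identifiers.
rows = list(range(100))
train, test = split_rows(rows)
print(len(train), len(test))
```

Every row lands in exactly one of the two outputs, so the model is never tested on a row it was trained on.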

4. Score the model

Now, we will test how the model performs on test data. We will compare the algorithm of our choice with a different algorithm to see which performs better.

  1. Drag the Score Model module into the canvas and connect its inputs.
    • First input: Trained model.
    • Second input: Test data.
  2. Drag the Two-Class Bayes Point Machine into the experiment canvas. We will compare this algorithm's performance with that of the Two-Class Boosted Decision Tree.
  3. Copy and paste the Train Model and Score Model modules in the canvas.
  4. Drag the Evaluate Model module into the canvas to compare the two algorithms.
  5. Run the experiment.
  6. Click the output port at the bottom of the Evaluate Model module and click Visualize.

The metrics provided are the ROC curve, the precision-recall diagram, and the lift curve. Looking at these metrics, we can see that the first model performed better than the second one. To look at what the first model predicted, click the output port of the Score Model module and click Visualize.

You will see two more columns added to your test dataset.

  • Scored Probabilities: the likelihood that a customer is a bike buyer.
  • Scored Labels: the classification done by the model, either bike buyer (1) or not (0). The probability threshold for labeling is set to 50% and can be adjusted.

Comparing the column BikeBuyer (actual) with the Scored Labels (prediction), you can see how well the model has performed. As next steps, you can use this model to make predictions for new customers and publish this model as a web service or write results back to SQL Data Warehouse.
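
The relationship between Scored Probabilities, Scored Labels, and the actual BikeBuyer column can be sketched in a few lines of Python. This is a minimal illustration, not the studio's scoring logic: the sample values are made up, the helper names are hypothetical, and treating a probability exactly at the threshold as a buyer is an assumption.

```python
def label_scores(probabilities, threshold=0.5):
    """Turn scored probabilities into 0/1 labels.

    Assumption: a probability equal to the threshold counts as a buyer (1).
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(actual, predicted):
    """Fraction of rows where the prediction matches the actual value."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# Hypothetical test rows: actual BikeBuyer values and model probabilities.
bike_buyer = [1, 0, 1, 0, 1]
scored_probabilities = [0.91, 0.20, 0.45, 0.60, 0.75]

scored_labels = label_scores(scored_probabilities)
print(scored_labels, accuracy(bike_buyer, scored_labels))
```

Lowering the threshold labels more customers as likely buyers, which tends to catch more real buyers at the cost of more false positives.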

Next steps

To learn more about building predictive machine learning models, refer to Introduction to Machine Learning on Azure.