Understanding AzureML – Part 2: Binary Classification

This is a pull from my blog website: www.IndieDevSpot.com. Feel free to visit that site for more blog articles/posts or to join the monthly newsletter.

To see the article on that website, please visit: http://indiedevspot.azurewebsites.net/2014/10/29/understanding-azureml-part-2-binary-classification/


Hello World!

Welcome to Part 2!  We will be discussing Binary Classification.  So I hope many of you have started using AzureML.  If not, you should definitely check it out.  Here is the link to the dev center for it.  This article will focus on a few key points.

  • Understanding the Evaluation of each Model Type.
  • Understanding the published Web Service of each Model

If you are looking for a simple guide on how to get started, check out this article.

The series will be broken down into three parts.

Part 1: Regression

Part 2: Binary Classification

Part 3: Multi-Class Classification

So let's get started!



About the Data Set Used

The data set used is the new car data from the sample Azure ML data sets.  We are predicting a car’s Engine Location!  I let the Feature Selection Module pick the best 5 features to use, which it determined are: make, number of cylinders, body-style, engine-type, and peak-rpm.  I chose engine location for Binary Classification because it only comes in two flavors: in the front or in the back!

Important Values

Note, when doing Binary Classification, you utilize the classification modules prefixed with “Two Class”.  There are a few important values to understand.


  • True Positive:   The number of true positive predictions.  Predicted true and is in fact true.
  • False Positive:   The number of false positive predictions.  Predicted true and is in fact false.
  • True Negative:   The number of true negative predictions.  Predicted false and is in fact false.
  • False Negative:   The number of false negative predictions.  Predicted false and is in fact true.
  • Accuracy:   Total correct predictions divided by total predictions.  The value ranges between 0 and 1 (multiply by 100 for % accuracy).
  • Precision: The fraction of positive predictions that are actually positive.  Correctly Predicted Positives / Total Predicted Positives.
  • Recall: The fraction of actual positives that were correctly identified.  Correctly Predicted Positives / Actual Positives.
  • F1 Score: Accuracy in a way.  The harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall).  Ranges from 0 to 1, best at 1 and worst at 0.
  • Threshold: The value above which a prediction belongs to the first class, otherwise the second class.
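To make the first four values and the threshold concrete, here is a minimal Python sketch that counts them from a list of actual labels and scored probabilities. The function name, the sample data, and the 0.5 threshold are all made up for illustration; this is not how AzureML computes them internally, just the same bookkeeping done by hand.

```python
def confusion_counts(actual, probabilities, threshold=0.5):
    """Count TP, FP, TN and FN for a binary classifier.

    actual: list of booleans (True = really positive).
    probabilities: scored probability of the positive class for each row.
    A row is predicted positive when its probability is above the threshold.
    """
    tp = fp = tn = fn = 0
    for is_positive, probability in zip(actual, probabilities):
        predicted_positive = probability > threshold
        if predicted_positive and is_positive:
            tp += 1  # predicted true, is in fact true
        elif predicted_positive and not is_positive:
            fp += 1  # predicted true, is in fact false
        elif not predicted_positive and not is_positive:
            tn += 1  # predicted false, is in fact false
        else:
            fn += 1  # predicted false, is in fact true
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical sample data, not taken from the car data set.
actual = [True, False, False, True, False]
probabilities = [0.9, 0.7, 0.2, 0.3, 0.1]
counts = confusion_counts(actual, probabilities, threshold=0.5)
print(counts)
```

Moving the threshold up or down changes which rows count as predicted positives, which is why all of the other values depend on it.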

Understanding the Values

In my opinion, for binary classification each and every one of these values is VERY important.  I specifically chose engine location to demonstrate this point.  At first glance, this looks like the best model since sliced bread.  HOLY CRAP, we have 98% accuracy!  We only have a single false anything!  So why in the world are our Precision and F1 score so low?  Well, if you look at it a little bit closer, you will notice that there is only a single true positive and in fact a single false positive.  We have ZERO false negatives.  This means that there is only a single vehicle which actually has a rear engine in the testing set, and that half of our rear engine predictions are wrong.

So in situations in which you are attempting to correctly identify something rare, the Precision, Recall and F1 Scores become your best friend.  When the classes are more evenly balanced, Accuracy tells you much more.
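Here is a rough Python sketch of the arithmetic behind that. The single true positive, single false positive and zero false negatives come from the evaluation above; the true negative count of 48 is my assumption, picked only because it gives the 98% accuracy. The function name is made up for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when there are no predicted/actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# tn=48 is an assumed value chosen to match the 98% accuracy.
metrics = binary_metrics(tp=1, fp=1, tn=48, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts accuracy is 0.98, but precision is only 0.50 and F1 is about 0.67, because one of the two rear engine predictions was wrong.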

Let's assume that you are attempting to identify axe murderers.  False negatives could literally be fatal.  But perhaps the prediction feeds a credit rating instead, where the government can put you in jail if you give a bad reputation to somebody who doesn’t have one, even though they are likely an axe murderer.  In that case, false positives could be detrimental.  I digress, as this is not about ethics.  In any case, each of these values is VERY important for tuning your algorithm.
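One way to think about that tuning is to put a cost on each kind of mistake and pick the threshold that makes the total cost smallest. The sketch below is only an illustration under assumptions: the costs, the candidate thresholds, the sample data and both function names are invented, and AzureML does not do this for you.

```python
def count_errors(actual, probabilities, threshold):
    """Return (false positives, false negatives) at the given threshold."""
    fp = sum(1 for a, p in zip(actual, probabilities) if p > threshold and not a)
    fn = sum(1 for a, p in zip(actual, probabilities) if p <= threshold and a)
    return fp, fn


def pick_threshold(actual, probabilities, fp_cost, fn_cost, candidates):
    """Pick the candidate threshold with the lowest total error cost."""
    def total_cost(threshold):
        fp, fn = count_errors(actual, probabilities, threshold)
        return fp * fp_cost + fn * fn_cost
    return min(candidates, key=total_cost)


# Hypothetical labels and scored probabilities.
actual = [True, True, False, False, False]
probabilities = [0.9, 0.4, 0.6, 0.2, 0.1]
candidates = [0.3, 0.5, 0.7]

# Missing a positive is expensive: prefer a low threshold.
fn_is_costly = pick_threshold(actual, probabilities, fp_cost=1, fn_cost=10, candidates=candidates)
# A false alarm is expensive: prefer a high threshold.
fp_is_costly = pick_threshold(actual, probabilities, fp_cost=10, fn_cost=1, candidates=candidates)
print(fn_is_costly, fp_is_costly)
```

When false negatives are the expensive mistake the low threshold wins, and when false positives are the expensive mistake the high threshold wins.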

Understanding Published Web Services

Alright, I’ve seen a ton of articles about this that are just completely wrong.  So let's set the record straight on how this actually works.  This is broken down into a few parts.

  1. Save your trained Model
  2. Create your inputs/outputs
  3. Publish the web service
  4. Understanding your request/response.

Save your trained model

In this instance, both models are roughly similar, and neither is particularly good (there is only 1 rear engine car in the entire data set).  So let's just pick a model to save.  To save the trained model, you simply click the output node of the Train Model module and select “Save Trained Model”.



Building the Production System

Take note of your current inputs to the trained model. Write them down on a piece of paper. Then create a new experiment.  Add your initial data set again, along with a Project Columns module and a Score Model module.  Instead of a new classification module and Train Model module, add your saved trained model and pipe that into your Score Model.  Your experiment should look similar to below.


For project columns, start with no columns and include everything you used to train your model EXCEPT the values you are attempting to predict.  Note that you should have included your prediction value when training your model, but definitely not for your production system.  If somebody provides the engine-location, what is the point in predicting it?

Create your inputs/outputs

Right click the right input node of the Score Model Module and select “Set as publish Input”.  Right click the output node of the Score Model Module and select “Set as publish Output”.  Run your experiment.  It should look similar to below.



Creating and Understanding the Published Service

The publish web service button should now be available.  Click it and name your service.  If it is not available, you may have had an error in your run.  Fix the error and run again.

The Request

This should be simple enough to understand if you get JSON (I hope you do if you are working with web requests).

{
  "Id": "score00001",
  "Instance": {
    "FeatureVector": {
      "num-of-cylinders": "0",
      "make": "0",
      "body-style": "0",
      "engine-type": "0",
      "peak-rpm": "0"
    },
    "GlobalParameters": {}
  }
}
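If you want to build that request from code, a minimal Python sketch using only the standard library could look like the following. The JSON shape mirrors the request above. The endpoint URL and API key are whatever your published service page gives you, the "Bearer" authorization header is my assumption about how the service authenticates, and the function names and feature values are invented for illustration.

```python
import json
import urllib.request

# Feature names as they appear in the request's FeatureVector.
FEATURE_NAMES = ["num-of-cylinders", "make", "body-style", "engine-type", "peak-rpm"]


def build_request_body(features, request_id="score00001"):
    """Build the JSON request body; every feature value is sent as a string."""
    missing = [name for name in FEATURE_NAMES if name not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    payload = {
        "Id": request_id,
        "Instance": {
            "FeatureVector": {name: str(features[name]) for name in FEATURE_NAMES},
            "GlobalParameters": {},
        },
    }
    return json.dumps(payload)


def score(url, api_key, features):
    """POST the features to the published service (url and api_key are yours)."""
    request = urllib.request.Request(
        url,
        data=build_request_body(features).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the service expects the API key as a Bearer token.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Illustrative feature values only; no network call is made here.
body = build_request_body({
    "num-of-cylinders": "six",
    "make": "porsche",
    "body-style": "hardtop",
    "engine-type": "ohcf",
    "peak-rpm": 5900,
})
print(body)
```

Only build_request_body runs in this snippet. Calling score requires the real URL and key from your own service.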

The Response

For Binary Classification, the response is fairly straightforward to understand as well.  It comes back as a string array in the order num-of-cylinders, make, body-style, engine-type, peak-rpm, Scored Labels, Scored Probabilities.  Scored Labels in this case is whether the car is front engine or rear engine.  Scored Probabilities is the % confidence in the prediction.

["0","0","0","0","0","0","0"]
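Since the response is just a positional string array, it helps to attach the column names yourself. Here is a small Python sketch, assuming the column order described above; the function name is made up, and converting Scored Probabilities to a float is my own choice.

```python
# Column order of the response array, as described above.
RESPONSE_COLUMNS = [
    "num-of-cylinders",
    "make",
    "body-style",
    "engine-type",
    "peak-rpm",
    "Scored Labels",
    "Scored Probabilities",
]


def parse_response(values):
    """Turn the positional string array into a dict keyed by column name."""
    if len(values) != len(RESPONSE_COLUMNS):
        raise ValueError(f"expected {len(RESPONSE_COLUMNS)} values, got {len(values)}")
    row = dict(zip(RESPONSE_COLUMNS, values))
    # Everything arrives as a string; the probability is more useful as a number.
    row["Scored Probabilities"] = float(row["Scored Probabilities"])
    return row


result = parse_response(["0", "0", "0", "0", "0", "0", "0"])
print(result["Scored Labels"], result["Scored Probabilities"])
```

If you change which columns the service returns, the RESPONSE_COLUMNS list has to change with it.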

Note that if you don’t want num-of-cylinders, make, body-style, engine-type, peak-rpm returned, you can add a Project Columns module after your Score Model module and exclude all columns except Scored Labels and Scored Probabilities.  The only values returned will then be the prediction and its probability.


I hope you all enjoyed this article and found it helpful.  Azure Machine Learning certainly lowered the bar to Machine Learning significantly, and I am extremely excited that I only need to understand the gist of these metrics to produce powerful tools that can predict whatever I want.  Stay tuned for part 3.  If you want to know more about regression, go here.