Two-Class Bayes Point Machine
Creates a Bayes point machine binary classification model
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag and drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.
This article describes how to use the Two-Class Bayes Point Machine module in Azure Machine Learning Studio to create an untrained binary classification model.
The algorithm in this module uses a Bayesian approach to linear classification called the "Bayes Point Machine". This algorithm efficiently approximates the theoretically optimal Bayesian average of linear classifiers (in terms of generalization performance) by choosing one "average" classifier, the Bayes Point. Because the Bayes Point Machine is a Bayesian classification model, it is not prone to overfitting to the training data.
For more information, see Chris Bishop's post on the Microsoft Machine Learning blog: Embracing Uncertainty - Probabilistic Inference.
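To make the idea of a single "average" classifier more concrete, here is a minimal Python sketch. It trains several perceptrons on random orderings of a small, linearly separable toy dataset and averages their normalized weight vectors. This is only one simple way to approximate a Bayes point and is not the algorithm that the module itself runs. The toy data, the function names (`train_perceptron`, `approximate_bayes_point`), and all parameter values are assumptions made for this illustration.

```python
import numpy as np


def train_perceptron(X, y, rng, max_epochs=100):
    """Train one perceptron on random orderings of the data (labels are -1/+1)."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for i in rng.permutation(len(y)):
            if y[i] * (X[i] @ w) <= 0:  # misclassified or on the boundary
                w += y[i] * X[i]
                mistakes += 1
        if mistakes == 0:  # consistent with every training example
            break
    return w / np.linalg.norm(w)


def approximate_bayes_point(X, y, n_classifiers=50, seed=0):
    """Average the unit-length weight vectors of several consistent classifiers."""
    rng = np.random.default_rng(seed)
    ws = np.array([train_perceptron(X, y, rng) for _ in range(n_classifiers)])
    w_mean = ws.mean(axis=0)
    return w_mean / np.linalg.norm(w_mean)


# Toy data (an assumption for this sketch): two classes separated by a
# hyperplane through the origin, pushed apart to guarantee a margin of 0.5.
rng_data = np.random.default_rng(1)
w_true = np.array([1.0, -0.5])
X = rng_data.normal(size=(40, 2))
y = np.where(X @ w_true >= 0, 1.0, -1.0)
X = X + 0.5 * y[:, None] * w_true / np.linalg.norm(w_true)

w_bp = approximate_bayes_point(X, y)
predictions = np.sign(X @ w_bp)
```

Because every classifier that separates the training data lies in the same convex region of weight space, their average also separates the training data. Intuitively, the average sits nearer the middle of that region than a typical individual solution.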
How to configure Two-Class Bayes Point Machine
In Azure Machine Learning Studio, add the Two-Class Bayes Point Machine module to your experiment. You can find the module under Machine Learning, Initialize Model, Classification.
For Number of training iterations, type a number to specify how many times the message-passing algorithm iterates over the training data. Typically, the number of iterations should be set to a value in the range 5 to 100.
More training iterations generally produce more accurate predictions, but training takes longer.
For most datasets, the default setting of 30 training iterations is sufficient for the algorithm to make accurate predictions. Sometimes accurate predictions can be made by using fewer iterations. For datasets with highly correlated features, you might benefit from more training iterations.
Select the Include bias option if you want a constant feature, or bias, to be added to each instance in training and prediction.
Including a bias is necessary when the data does not already contain a constant feature.
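Conceptually, adding a bias amounts to appending a constant column to the feature matrix, so that a linear classifier can learn an intercept. The following Python sketch illustrates the idea with NumPy. The helper name `add_bias_feature`, the example data, and the weights are hypothetical, and the sketch does not show how the module implements the option internally.

```python
import numpy as np


def add_bias_feature(X):
    """Append a constant feature equal to 1.0 to every instance (row)."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([X, ones])


# Two instances with two features each (illustrative values).
X = np.array([[2.0, 3.0],
              [4.0, 5.0]])
X_b = add_bias_feature(X)

# With the constant feature in place, the last weight acts as the intercept.
w = np.array([1.0, -1.0, 0.5])
scores = X_b @ w
```

The same transformation has to be applied at prediction time, which is why the option affects both training and scoring.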
Select the Allow unknown values in categorical features option to create a group for unknown values.
If you deselect this option, the model can accept only the values that are contained in the training data.
If you select this option and allow unknown values, the model might be less precise for known values, but it can provide better predictions for new (unknown) values.
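The effect of an extra "unknown" level can be sketched in plain Python. In this illustration, category levels seen during training are mapped to integer indexes, and any unseen value is mapped to one additional level. The names `UNKNOWN`, `fit_levels`, and `encode` are hypothetical, and raising an error for unseen values when the option is off is an assumption of this sketch, not documented module behavior.

```python
UNKNOWN = "<unknown>"  # hypothetical placeholder for the additional level


def fit_levels(values, allow_unknown=True):
    """Map each level seen in training to an index, optionally adding UNKNOWN."""
    levels = sorted(set(values))
    if allow_unknown:
        levels.append(UNKNOWN)
    return {level: idx for idx, level in enumerate(levels)}


def encode(value, level_index):
    """Return the index for a value, falling back to UNKNOWN when available."""
    if value in level_index:
        return level_index[value]
    if UNKNOWN in level_index:
        return level_index[UNKNOWN]
    # Assumption for this sketch: without an unknown level, reject the value.
    raise ValueError(f"unseen categorical value: {value!r}")


training_colors = ["red", "green", "red", "blue"]

with_unknown = fit_levels(training_colors, allow_unknown=True)
known_only = fit_levels(training_colors, allow_unknown=False)

purple_index = encode("purple", with_unknown)  # falls into the extra level
red_index = encode("red", with_unknown)        # known level keeps its own index
```

The additional level gives new values somewhere to go at scoring time, at the cost of one more parameter per categorical column.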
Add an instance of the Train Model module, and your training data.
Connect the training data and the output of the Two-Class Bayes Point Machine module to the Train Model module, and choose the label column.
Run the experiment.
After training is complete, right-click the output of the Train Model module to view the results:
To see a summary of the model's parameters, together with the feature weights learned from training, select Visualize.
To save the model for later use, right-click the output of Train Model, and select Save as Trained Model.
To make predictions, use the trained model as an input to the Score Model module.
The untrained model can also be passed to Cross-Validate Model for cross-validation against a labeled dataset.
To see how the Two-Class Bayes Point Machine is used in machine learning, see these sample experiments in the Azure AI Gallery:
- Compare Binary Classifiers: This sample demonstrates the use of multiple two-class classifiers.
This section contains implementation details and frequently asked questions about this algorithm.
Details from the original research and underlying theory are available in this paper (PDF): Bayes Point Machines, by Herbrich, Graepel, and Campbell.
However, this implementation improves on the original algorithm in several ways:
It uses the expectation propagation message-passing algorithm. For more information, see A family of algorithms for approximate Bayesian inference.
A parameter sweep is not required.
This method does not require data to be normalized.
These improvements make the Bayes Point Machine classification model more robust and easier to use, and you can bypass the time-consuming step of parameter tuning.
|Name||Range||Type||Default||Description|
|Number of training iterations||>=1||Integer||30||Specify the number of iterations to use when training|
|Include bias||Any||Boolean||True||Indicate whether a constant feature or bias should be added to each instance|
|Allow unknown values in categorical features||Any||Boolean||True||If True, creates an additional level for each categorical column. Any levels in the test dataset that are not available in the training dataset are mapped to this additional level.|
|Name||Type||Description|
|Untrained model||ILearner interface||An untrained binary classification model|