Two-Class Support Vector Machine

Creates a binary classification model using the Support Vector Machine algorithm

Category: Machine Learning / Initialize Model / Classification

Module overview

This article describes how to use the Two-Class Support Vector Machine module in Azure Machine Learning Studio to create a model that is based on the support vector machine algorithm.

Support vector machines (SVMs) are a well-researched class of supervised learning methods. This particular implementation is suited to prediction of two possible outcomes, based on either continuous or categorical variables.

After defining the model parameters, train the model by using one of the training modules, and providing a tagged dataset that includes a label or outcome column.

More about support vector machines

Support vector machines are among the earliest machine learning algorithms, and SVM models have been used in many applications, from information retrieval to text and image classification. SVMs can be used for both classification and regression tasks.

This SVM model is a supervised learning model that requires labeled data. In the training process, the algorithm analyzes the input data and looks for patterns in a multi-dimensional feature space. All input examples are represented as points in this space, and the algorithm finds a separating boundary, called a hyperplane, that divides the output categories by as wide and clear a gap as possible.

For prediction, the SVM algorithm assigns new examples into one category or the other, mapping them into that same space.
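The prediction step can be illustrated with a minimal sketch of a linear SVM decision rule. It assumes the classes are encoded as +1 and -1 and that a weight vector and bias have already been learned; the function names and the example values are hypothetical and are not part of the module.

```python
import math

def decision_value(weights, bias, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    """Assign +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1

def margin_width(weights):
    """Width of the gap between the classes for a linear SVM: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))

# Hypothetical weights and bias, as if already learned from labeled data.
w, b = [3.0, 4.0], -5.0

print(predict(w, b, [2.0, 1.0]))  # 3*2 + 4*1 - 5 = 5, so prints 1
print(predict(w, b, [0.0, 0.0]))  # 0 + 0 - 5 = -5, so prints -1
print(margin_width(w))            # 2 / 5, so prints 0.4
```

A smaller weight norm corresponds to a wider gap, which is why training favors small weights among all boundaries that separate the data.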

How to configure Two-Class Support Vector Machine

For this model type, it is recommended that you normalize the dataset before using it to train the classifier.
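As a rough illustration of what normalizing a dataset can look like, the following sketch applies z-score scaling: each column is centered at its mean and scaled to unit standard deviation. The helper names are hypothetical, and the use of the population standard deviation is an assumption; Studio's own normalization may differ in detail.

```python
import statistics

def fit_zscore(rows):
    """Compute the per-column mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Population standard deviation (an assumption). Constant columns
    # fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_zscore(rows, means, stds):
    """Center each value at the column mean and scale by the column deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Two features on very different scales.
train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
means, stds = fit_zscore(train)
scaled = apply_zscore(train, means, stds)

for row in scaled:
    print(row)
```

The statistics are computed from the training rows only and then reused for any new rows, so that training and scoring data are scaled the same way.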

  1. Add the Two-Class Support Vector Machine module to your experiment in Studio.

  2. Specify how you want the model to be trained, by setting the Create trainer mode option.

    • Single Parameter: If you know how you want to configure the model, you can provide a specific set of values as arguments.

    • Parameter Range: If you are not sure of the best parameters, you can find the optimal parameters by specifying multiple values and using the Tune Model Hyperparameters module to find the optimal configuration. The trainer iterates over multiple combinations of the settings and determines the combination of values that produces the best model.

  3. For Number of iterations, type a number that denotes the number of iterations used when building the model.

    This parameter can be used to control the trade-off between training speed and accuracy.

  4. For Lambda, type a value to use as the weight for L1 regularization.

    This regularization coefficient can be used to tune the model. Larger values penalize more complex models.

  5. Select the option, Normalize features, if you want to normalize features before training.

    If you apply normalization, before training, data points are centered at the mean and scaled to have one unit of standard deviation.

  6. Select the option, Project to the unit sphere, to normalize coefficients.

    Projecting values to unit space means that before training, data points are centered at 0 and scaled to have one unit of standard deviation.

  7. In Random number seed, type an integer value to use as a seed if you want to ensure reproducibility across runs. Otherwise, a system clock value is used as a seed, which can result in slightly different results across runs.

  8. Select the option, Allow unknown category, to create a group for unknown values in the training or validation sets. In this case, the model might be less precise for known values, but it can provide better predictions for new (unknown) values.

    If you deselect it, the model can accept only the values that are contained in the training data.

  9. Connect a labeled dataset, and one of the training modules:

    • If you set Create trainer mode to Single Parameter, use the Train Model module.

    • If you set Create trainer mode to Parameter Range, use the Tune Model Hyperparameters module.

      Note

      If you pass a parameter range to Train Model, it will use only the first value in the parameter range list.

      If you pass a single set of parameter values to the Tune Model Hyperparameters module when it expects a range of settings for each parameter, it ignores the values and uses the default values for the learner.

      If you select the Parameter Range option and enter a single value for any parameter, that single value you specified will be used throughout the sweep, even if other parameters change across a range of values.

  10. Run the experiment.
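To show how the settings above interact, here is a minimal sketch of a linear SVM trained by stochastic subgradient descent on hinge loss with an L1 penalty. This is an assumption made for illustration: the text does not describe the optimizer the module actually uses. The function names, the step size, and the +1/-1 label encoding are all hypothetical. The last lines mimic a parameter-range sweep over Lambda.

```python
import random

def train_linear_svm(rows, labels, iterations=1, lam=0.001, seed=None, step=0.1):
    """Illustrative trainer: hinge loss plus an L1 penalty weighted by lam."""
    rng = random.Random(seed)      # a fixed seed gives the same shuffles every run
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(iterations):    # here, one iteration is one pass over the data
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]          # y is +1 or -1
            score = sum(w * v for w, v in zip(weights, x)) + bias
            active = y * score < 1             # example is inside the margin
            for j, v in enumerate(x):
                sign = (weights[j] > 0) - (weights[j] < 0)
                # L1 subgradient (lam * sign) minus the hinge term when active
                grad = lam * sign - (y * v if active else 0.0)
                weights[j] -= step * grad
            if active:
                bias += step * y
    return weights, bias

def accuracy(model, rows, labels):
    """Fraction of rows whose predicted sign matches the label."""
    weights, bias = model
    hits = 0
    for x, y in zip(rows, labels):
        score = sum(w * v for w, v in zip(weights, x)) + bias
        hits += (1 if score >= 0 else -1) == y
    return hits / len(rows)

# Toy, linearly separable data (hypothetical).
rows = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
labels = [-1, -1, 1, 1]

# The same seed and settings reproduce the same model.
model_a = train_linear_svm(rows, labels, iterations=20, lam=0.001, seed=42)
model_b = train_linear_svm(rows, labels, iterations=20, lam=0.001, seed=42)
print(model_a == model_b)

# Parameter-range style sweep: try several Lambda values and compare.
results = {
    lam: accuracy(train_linear_svm(rows, labels, iterations=20, lam=lam, seed=42),
                  rows, labels)
    for lam in (0.0001, 0.001, 0.01)
}
print(results)
```

In this sketch, more iterations give the optimizer more passes to settle, and a larger Lambda shrinks the weights more aggressively on every step.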

Results

After training is complete:

  • To see a summary of the model's parameters, together with the feature weights learned from training, right-click the output of Train Model or Tune Model Hyperparameters, and select Visualize.

  • To use the trained models to make predictions, connect the trained model to the Score Model module.

  • To perform cross-validation against a labeled data set, connect the untrained model and the dataset to Cross-Validate Model.
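The idea behind cross-validation can be sketched as splitting the row indices into k folds, where each fold is held out once for testing while the rest are used for training. The function name and the fold count below are hypothetical; how Cross-Validate Model partitions the data internally is not described here.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) once for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_rows) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

Because an untrained model is connected, a fresh model can be trained on each training split and evaluated on the held-out rows.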

Examples

For examples of how this learning algorithm is used, see the Azure AI Gallery.

Technical notes

This section contains implementation details, tips, and answers to frequently asked questions.

Usage tips

For this model type, it is recommended that you normalize the dataset before using it to train the classifier.

Although recent research has developed algorithms that have higher accuracy, this algorithm can work well on simple data sets when your goal is speed over accuracy. If you do not get the desired results by using Two-Class Support Vector Machine, try another classification method.

Module parameters

Name | Range | Type | Default | Description
Number of iterations | >=1 | Integer | 1 | The number of iterations
Lambda | >=double.Epsilon | Float | 0.001 | Weight for L1 regularization. Using a non-zero value avoids overfitting the model to the training dataset.
Normalize features | Any | Boolean | True | If True, normalize the features.
Project to the unit-sphere | Any | Boolean | False | If True, project the features to a unit circle.
Random number seed | Any | Integer | | The seed for the random number generator used by the model. Leave it blank for the default.
Allow unknown categorical levels | Any | Boolean | True | If True, creates an additional level for each categorical column. Any levels in the test dataset that are not available in the training dataset are mapped to this additional level.
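The behavior described for Allow unknown categorical levels can be sketched as follows: every level seen in training gets an index, one extra index is reserved, and any unseen value is mapped to that extra level. The names and the placeholder label are hypothetical and only illustrate the idea.

```python
UNKNOWN = "<unknown>"  # hypothetical label for the additional level

def fit_levels(training_values):
    """Index each level seen in training and reserve one more for unknowns."""
    levels = {value: i for i, value in enumerate(sorted(set(training_values)))}
    levels[UNKNOWN] = len(levels)
    return levels

def encode(value, levels):
    """Return the level index; values not seen in training map to the unknown level."""
    return levels.get(value, levels[UNKNOWN])

levels = fit_levels(["red", "green", "blue", "green"])

print(encode("green", levels))   # seen in training, prints 1
print(encode("purple", levels))  # not seen in training, prints 3
```

Without the reserved level, a value such as "purple" would have no valid encoding at scoring time.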

Output

Name | Type | Description
Untrained model | Data Table | An untrained binary classification model.

See also

Classification
A-Z Module List