What is MicrosoftML?

MicrosoftML adds state-of-the-art data transforms, machine learning algorithms, and pre-trained models to R and Python. Its data transforms let you compose a custom pipeline of transforms that is applied to your data before training or testing. The primary purpose of these transforms is to let you format your data.
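
The idea of composing transforms into a pipeline can be sketched in plain Python. This is only an illustration of the concept, not the MicrosoftML API: the transform functions `lowercase_text` and `add_length`, the helper `apply_pipeline`, and the column names are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[Row], Row]


def lowercase_text(row: Row) -> Row:
    """Hypothetical transform: normalize the 'text' column to lowercase."""
    out = dict(row)
    out["text"] = str(out["text"]).lower()
    return out


def add_length(row: Row) -> Row:
    """Hypothetical transform: derive a numeric feature from 'text'."""
    out = dict(row)
    out["length"] = len(str(out["text"]))
    return out


def apply_pipeline(rows: List[Row], transforms: List[Transform]) -> List[Row]:
    """Apply each transform, in order, to every row."""
    result = []
    for row in rows:
        for transform in transforms:
            row = transform(row)
        result.append(row)
    return result


# The same pipeline is applied to training data and to test data,
# so both are formatted identically before reaching the learner.
pipeline = [lowercase_text, add_length]
train_rows = apply_pipeline([{"text": "Great Product"}], pipeline)
test_rows = apply_pipeline([{"text": "BAD"}], pipeline)
print(train_rows)
print(test_rows)
```

Keeping the pipeline as a single ordered list makes it easy to reuse the exact same preparation steps at training time and at scoring time.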

The MicrosoftML functions are provided through the MicrosoftML package installed with Machine Learning Server, Microsoft R Client, and SQL Server Machine Learning Services.

The functions provide fast and scalable machine learning algorithms for common tasks such as classification, regression, and anomaly detection. These high-performance algorithms are multi-threaded, and some of them execute off disk, so they can scale up to hundreds of gigabytes on a single node. They are especially suitable for handling a large corpus of text data or high-dimensional categorical data. You can run these functions locally on Windows or Linux machines or on Azure HDInsight (Hadoop/Spark) clusters.

Pre-trained models for sentiment analysis and image featurization can also be installed and deployed with MicrosoftML. For more information on the pre-trained models and samples, see R samples for MicrosoftML and Python samples for MicrosoftML.

Match algorithms to machine learning tasks

Matching data transforms and machine learning algorithms to appropriate data science tasks is key to designing successful intelligent applications.

Machine learning tasks

The MicrosoftML package implements algorithms that can perform a variety of machine learning tasks:

  • binary classification: algorithms that learn to predict which of two classes an instance of data belongs to. These provide supervised learning in which the input of a classification algorithm is a set of labeled examples. Each example is represented as a feature vector, and each label is an integer with a value of 0 or 1. The output of a binary classification algorithm is a classifier, which can be used to predict the label of new unlabeled instances.
  • multi-class classification: algorithms that learn to predict the category of an instance of data. These provide supervised learning in which the input of a classification algorithm is a set of labeled examples. Each example is represented as a feature vector, and each label is an integer between 0 and k-1, where k is the number of classes. The output of a classification algorithm is a classifier, which can be used to predict the label of a new unlabeled instance.
  • regression: algorithms that learn to predict the value of a dependent variable from a set of related independent variables. Regression algorithms model this relationship to determine how the typical values of dependent variables change as the values of the independent variables are varied. These provide supervised learning in which the input of a regression algorithm is a set of examples with dependent variables of known values. The output of a regression algorithm is a function, which can be used to predict the value of a new data instance whose dependent variables are not known.
  • anomaly detection: algorithms that identify outliers that do not belong to some target class or conform to an expected pattern. One-class anomaly detection is a type of unsupervised learning as the input data only contains data that is from the target class and does not contain instances of anomalies to learn from.
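
The label conventions described above (0 or 1 for binary classification, 0 to k-1 for multi-class classification) can be illustrated with a small helper. This is a minimal sketch in plain Python; `encode_labels` is a hypothetical function, not part of MicrosoftML, and assigning codes in sorted order is an arbitrary choice made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def encode_labels(labels: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct class name to an integer in 0..k-1.

    Codes are assigned in sorted order of the class names, so the
    mapping is deterministic for a given set of labels.
    """
    mapping = {name: index for index, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


# Binary classification: two classes, so labels become 0 or 1.
binary_codes, binary_map = encode_labels(["spam", "ham", "spam"])

# Multi-class classification: k = 3 classes, so labels fall in 0..2.
multi_codes, multi_map = encode_labels(["cat", "dog", "bird", "dog"])

print(binary_codes, binary_map)
print(multi_codes, multi_map)
```

Regression targets need no such encoding, because the dependent variable is already numeric.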

Machine learning algorithms

The following table summarizes the MicrosoftML algorithms, the tasks they support, their scalability, and some example applications.

| Algorithm (R/Python) | ML task supported | Scalability | Application examples |
|---|---|---|---|
| rxFastLinear() / rx_fast_linear(): Fast Linear model (SDCA) | binary classification, linear regression | #cols: ~1B; #rows: ~1B; CPU: multi-proc | Mortgage default prediction, email spam filtering |
| rxOneClassSvm() / rx_oneclass_svm(): OneClass SVM | anomaly detection | #cols: ~1K; #rows: RAM-bound; CPU: single-proc | Credit card fraud detection |
| rxFastTrees() / rx_fast_trees(): Fast Tree | binary classification, regression | #cols: ~50K; #rows: RAM-bound; CPU: multi-proc | Bankruptcy prediction |
| rxFastForest() / rx_fast_forest(): Fast Forest | binary classification, regression | #cols: ~50K; #rows: RAM-bound; CPU: multi-proc | Churn prediction |
| rxNeuralNet() / rx_neural_network(): Neural Network | binary and multiclass classification, regression | #cols: ~10M; #rows: Inf; CPU: multi-proc; CUDA GPU | Check signature recognition, OCR, click prediction |
| rxLogisticRegression() / rx_logistic_regression(): Logistic regression | binary and multiclass classification | #cols: ~100M; #rows: Inf for single-proc CPU; #rows: RAM-bound for multi-proc CPU | Classifying sentiments from feedback |
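
The table can also be read as a lookup from task to candidate algorithms. The sketch below encodes that reading in plain Python, using the Python function names from the microsoftml package as keys. The dictionary `ALGORITHM_TASKS` and the helper `algorithms_for` are hypothetical names introduced for illustration, and "linear regression" is folded into the generic "regression" task as a simplifying assumption.

```python
from typing import Dict, List, Set

# Task support per algorithm, transcribed from the table above.
# Assumption: "linear regression" (Fast Linear) is treated as "regression".
ALGORITHM_TASKS: Dict[str, Set[str]] = {
    "rx_fast_linear": {"binary classification", "regression"},
    "rx_oneclass_svm": {"anomaly detection"},
    "rx_fast_trees": {"binary classification", "regression"},
    "rx_fast_forest": {"binary classification", "regression"},
    "rx_neural_network": {
        "binary classification",
        "multi-class classification",
        "regression",
    },
    "rx_logistic_regression": {
        "binary classification",
        "multi-class classification",
    },
}


def algorithms_for(task: str) -> List[str]:
    """Return the algorithms that support the given task, sorted by name."""
    return sorted(
        name for name, tasks in ALGORITHM_TASKS.items() if task in tasks
    )


print(algorithms_for("anomaly detection"))
print(algorithms_for("multi-class classification"))
print(algorithms_for("regression"))
```

A lookup like this only narrows the candidates by task. The scalability column (number of columns, rows, and CPU or GPU support) still has to be weighed against the size of your data.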

Data transforms

MicrosoftML also provides transforms to help tailor your data for machine learning. They are used to clean and wrangle your data before training and scoring. For a description of the transforms, see Machine learning R transforms and Machine learning Python transforms reference documentation.

Next steps

For reference documentation on the individual R transforms and functions in the product help, see MicrosoftML: machine learning algorithms.

For reference documentation on the individual Python transforms and functions in the product help, see MicrosoftML: machine learning algorithms.

For guidance when choosing the appropriate machine learning algorithm from the MicrosoftML package, see the Cheat Sheet: How to choose a MicrosoftML algorithm.

See also

Machine Learning Server

R samples for MicrosoftML

Python samples for MicrosoftML