MicrosoftML (R library in SQL Server)
MicrosoftML is an R function library from Microsoft providing high-performance machine learning algorithms. It includes functions for training and transformations, scoring, text and image analysis, and feature extraction for deriving values from existing data.
The machine learning APIs were developed by Microsoft for internal machine learning applications, and have been refined over the years to support high performance on big data, using multicore processing and fast data streaming. MicrosoftML also includes numerous transformations for text and image processing.
Full reference documentation
The MicrosoftML library is distributed in multiple Microsoft products, but usage is the same whether you get the library in SQL Server or another product. Because the functions are the same, documentation for individual MicrosoftML functions is published to just one location under the R reference for Microsoft Machine Learning Server. If any product-specific behaviors exist, the discrepancies are noted in the function help page.
Versions and platforms
The MicrosoftML library is based on R 3.4.3 and available only when you install one of the following Microsoft products or downloads:
- SQL Server 2016 R Services
- SQL Server 2017 Machine Learning Services
- Microsoft Machine Learning Server 9.2.0 or later
- Microsoft R Client
Full product release versions are Windows-only, starting with SQL Server 2017. Linux support for MicrosoftML is new in SQL Server 2019 Preview.
Algorithms in MicrosoftML depend on RevoScaleR for:
- Data source objects. The data sources that MicrosoftML functions consume are created using RevoScaleR functions.
- Remote computing (shifting function execution to a remote SQL Server instance). The RevoScaleR library provides functions for creating and activating a remote compute context for SQL Server.
In most cases, you will load the packages together whenever you are using MicrosoftML.
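As a rough sketch of how the two packages fit together, the following R code loads both libraries, defines a SQL Server data source, and switches to a remote compute context. The connection string, server, database, and table name are placeholders chosen for illustration, not values from this article.

```r
library(MicrosoftML)
library(RevoScaleR)

# Hypothetical connection string; replace the server and database with your own.
connStr <- "Driver=SQL Server;Server=myServer;Database=myDatabase;Trusted_Connection=True"

# RevoScaleR data source object that MicrosoftML functions can consume.
# "dbo.myTrainingData" is a placeholder table name.
trainData <- RxSqlServerData(table = "dbo.myTrainingData",
                             connectionString = connStr)

# RevoScaleR compute context that shifts execution to the SQL Server instance.
sqlCompute <- RxInSqlServer(connectionString = connStr)
rxSetComputeContext(sqlCompute)

# ... call MicrosoftML training functions against trainData here ...

# Return to local execution when finished.
rxSetComputeContext("local")
```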
Functions by category
This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.
1-Machine learning algorithms
|rxFastTrees||An implementation of FastRank, an efficient implementation of the MART gradient boosting algorithm.|
|rxFastForest||A random forest and quantile regression forest implementation using rxFastTrees.|
|rxLogisticRegression||Logistic regression using L-BFGS.|
|rxOneClassSvm||One class support vector machines.|
|rxNeuralNet||Binary, multi-class, and regression neural net.|
|rxFastLinear||Stochastic dual coordinate ascent optimization for linear binary classification and regression.|
|rxEnsemble||Trains a number of models of various kinds to obtain better predictive performance than could be obtained from a single model.|
2-Transformation functions
|concat||Transformation to create a single vector-valued column from multiple columns.|
|categorical||Create indicator vector using categorical transform with dictionary.|
|categoricalHash||Converts the categorical value into an indicator array by hashing.|
|featurizeText||Produces a bag of counts of sequences of consecutive words, called n-grams, from a given corpus of text. It offers language detection, tokenization, stopword removal, text normalization, and feature generation.|
|getSentiment||Scores natural language text and creates a column that contains probabilities that the sentiments in the text are positive.|
|ngram||Allows defining arguments for count-based and hash-based feature extraction.|
|selectColumns||Selects a set of columns to retain, dropping all others.|
|selectFeatures||Selects features from the specified variables using a specified mode.|
|loadImage||Loads image data.|
|resizeImage||Resizes an image to a specified dimension using a specified resizing method.|
|extractPixels||Extracts the pixel values from an image.|
|featurizeImage||Featurizes an image using a pre-trained deep neural network model.|
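To make the transformation functions more concrete, here is a minimal sketch that applies the categorical transform through rxFeaturize and returns the transformed data. The data frame, its column names, and the output column name `placeInd` are made up for illustration.

```r
library(MicrosoftML)

# Small illustrative data set (hypothetical values).
categoricalData <- data.frame(
    placesVisited = c("London", "Brunei", "London", "Paris", "Seria"),
    like = c(TRUE, FALSE, FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE)

# Apply the categorical transform: placesVisited is converted to an
# indicator vector and written to a new column named placeInd.
categorized <- rxFeaturize(
    data = categoricalData,
    mlTransforms = list(categorical(vars = c(placeInd = "placesVisited"))))

# Inspect the transformed output.
categorized
```

The same `mlTransforms` list can usually be passed directly to a training function, so that the transform is applied as part of model training.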
3-Scoring and training functions
|rxPredict.mlModel||Runs the scoring library either from SQL Server, using the stored procedure, or from R code, enabling real-time scoring for much faster prediction performance.|
|rxFeaturize||Transforms data from an input data set to an output data set.|
|mlModel||Provides a summary of a Microsoft R Machine Learning model.|
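A typical train-then-score sequence might look like the sketch below. It uses the `infert` data set that ships with R, and the choice of predictor columns is only an example.

```r
library(MicrosoftML)
library(RevoScaleR)

# Train a binary logistic regression model. The label isCase is derived
# from the case column of the built-in infert data set.
logitModel <- rxLogisticRegression(
    isCase ~ age + parity + spontaneous + induced,
    transforms = list(isCase = case == 1),
    data = infert)

# Summarize the trained model.
summary(logitModel)

# Score the same data and keep the label next to the predictions.
scores <- rxPredict(logitModel, data = infert, extraVarsToWrite = "isCase")
head(scores)
```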
4-Loss functions for classification and regression
|expLoss||Specifications for exponential classification loss function.|
|logLoss||Specifications for log classification loss function.|
|hingeLoss||Specifications for hinge classification loss function.|
|smoothHingeLoss||Specifications for smooth hinge classification loss function.|
|poissonLoss||Specifications for Poisson regression loss function.|
|squaredLoss||Specifications for squared regression loss function.|
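A loss function specification is normally passed to a learner rather than called on its own. The sketch below, which assumes the built-in `infert` data set and an arbitrary set of predictors, trains a binary rxFastLinear model with hinge loss.

```r
library(MicrosoftML)

# Train a linear binary classifier that optimizes hinge loss
# instead of the learner's default loss function.
hingeModel <- rxFastLinear(
    isCase ~ age + parity + spontaneous + induced,
    transforms = list(isCase = case == 1),
    data = infert,
    type = "binary",
    lossFunction = hingeLoss())

summary(hingeModel)
```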
5-Feature selection functions
|minCount||Specification for feature selection in count mode.|
|mutualInformation||Specification for feature selection in mutual information mode.|
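These specifications are used as the `mode` argument of selectFeatures. The following sketch hashes a text column into indicator features and then keeps only features that pass the default count threshold. The review data is invented for illustration.

```r
library(MicrosoftML)

# Tiny illustrative data set (hypothetical reviews).
trainReviews <- data.frame(
    review = c("great product", "terrible service", "great service",
               "terrible product", "really great", "not great"),
    like = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE)

# Hash the review text into 2^7 indicator features, then apply
# count-based feature selection before training.
countModel <- rxLogisticRegression(
    like ~ reviewCatHash,
    data = trainReviews,
    mlTransforms = list(
        categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7),
        selectFeatures("reviewCatHash", mode = minCount())))

summary(countModel)
```

Swapping `minCount()` for `mutualInformation()` switches the selection mode. In that case selectFeatures is given a formula such as `like ~ reviewCatHash`, because mutual information is computed against the label.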
6-Ensemble modeling functions
|fastTrees||Creates a list containing the function name and arguments to train a Fast Tree model with rxEnsemble.|
|fastForest||Creates a list containing the function name and arguments to train a Fast Forest model with rxEnsemble.|
|fastLinear||Creates a list containing the function name and arguments to train a Fast Linear model with rxEnsemble.|
|logisticRegression||Creates a list containing the function name and arguments to train a Logistic Regression model with rxEnsemble.|
|oneClassSvm||Creates a list containing the function name and arguments to train a OneClassSvm model with rxEnsemble.|
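The helper functions above are intended to be combined in the `trainers` argument of rxEnsemble. The sketch below is an illustration only: it assumes the built-in `infert` data set, precomputes the label in a copy of the data, and uses default settings for every trainer.

```r
library(MicrosoftML)

# Add a logical label column to a copy of the built-in infert data set.
infertData <- transform(infert, isCase = case == 1)

# Each helper returns a trainer specification for rxEnsemble.
trainers <- list(fastTrees(), fastLinear(), logisticRegression())

# Train one model per trainer and combine them into an ensemble.
ensembleModel <- rxEnsemble(
    formula = isCase ~ age + parity + spontaneous + induced,
    data = infertData,
    type = "binary",
    trainers = trainers)

# Score with the combined model.
ensembleScores <- rxPredict(ensembleModel, data = infertData)
head(ensembleScores)
```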
7-Neural networking functions
|optimizer||Specifies optimization algorithms for the rxNeuralNet machine learning algorithm.|
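As an example of how an optimizer specification is supplied, the sketch below trains a small binary neural net with the adaptive `adaDeltaSgd()` optimizer. The data set, predictors, and iteration count are arbitrary choices for illustration.

```r
library(MicrosoftML)

# Train a binary neural net, replacing the default optimizer
# with the adaptive adaDeltaSgd() specification.
nnModel <- rxNeuralNet(
    isCase ~ age + parity + spontaneous + induced,
    transforms = list(isCase = case == 1),
    data = infert,
    type = "binary",
    optimizer = adaDeltaSgd(),
    numIterations = 50)

summary(nnModel)
```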
8-Package state functions
|rxHashEnv||An environment object used to store package-wide state.|
How to use MicrosoftML
Functions in MicrosoftML are callable in R code encapsulated in stored procedures. Most developers build MicrosoftML solutions locally, and then migrate finished R code to stored procedures as a deployment exercise.
The MicrosoftML package for R is installed "out-of-the-box" in SQL Server 2017. It is also available for use with SQL Server 2016 if you upgrade the R components for the instance. For details, see "Upgrade an instance of SQL Server using binding".
The package is not loaded by default. As a first step, load the MicrosoftML package, and then load RevoScaleR if you need to use remote compute contexts or related connectivity or data source objects. Then, reference the individual functions you need.
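# R package names are case-sensitive: the R package is MicrosoftML.
library(MicrosoftML);
library(RevoScaleR);
# "args" is a placeholder for the arguments of the function you call.
logisticRegression(args);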