MicrosoftML (R package in SQL Server Machine Learning Services)

Applies to: SQL Server 2016 (13.x) and later

MicrosoftML is an R package from Microsoft that provides high-performance machine learning algorithms. It includes functions for training and transformations, scoring, text and image analysis, and feature extraction for deriving values from existing data. The package is included in SQL Server Machine Learning Services and SQL Server 2016 R Services, and it supports high performance on big data through multicore processing and fast data streaming. MicrosoftML also includes numerous transformations for text and image processing.

Full reference documentation

The MicrosoftML package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or in another product. Because the functions are the same, documentation for individual MicrosoftML functions is published to just one location, under the R reference for Microsoft Machine Learning Server. If a function behaves differently in a particular product, the difference is noted on that function's help page.

Versions and platforms

The MicrosoftML package is based on R 3.4.3 and is available only when you install one of the following Microsoft products or downloads:

Note

Full product release versions of MicrosoftML are Windows-only in SQL Server 2017. In SQL Server 2019, MicrosoftML supports both Windows and Linux.

Package dependencies

Algorithms in MicrosoftML depend on RevoScaleR for:

  • Data source objects. Data consumed by MicrosoftML functions is created using RevoScaleR functions.
  • Remote computing (shifting function execution to a remote SQL Server instance). The RevoScaleR package provides functions for creating and activating a remote compute context for SQL Server.

In most cases, you will load both packages whenever you use MicrosoftML.

Functions by category

This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.

1 - Machine learning algorithms

Function name | Description
rxFastTrees | An implementation of FastRank, an efficient variant of the MART gradient boosting algorithm.
rxFastForest | A random forest and quantile regression forest implementation using rxFastTrees.
rxLogisticRegression | Logistic regression using L-BFGS.
rxOneClassSvm | One-class support vector machines.
rxNeuralNet | Binary, multiclass, and regression neural networks.
rxFastLinear | Stochastic dual coordinate ascent optimization for linear binary classification and regression.
rxEnsemble | Trains several models of various kinds and combines them to obtain better predictive performance than a single model could achieve.
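The rxLogisticRegression row describes logistic regression fitted with L-BFGS. As a rough illustration of that technique only, the base-R sketch below minimizes the logistic negative log-likelihood with `stats::optim` and its `L-BFGS-B` method (a bound-constrained L-BFGS variant, used here without bounds). `fit_logistic_lbfgs` is a hypothetical helper, not part of MicrosoftML; the sketch is unregularized and assumes a 0/1 response, such as `case` in the `infert` data set that ships with R.

```r
# Negative log-likelihood of a logistic regression model with labels y in {0, 1}.
neg_log_lik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  # Numerically stable form of log(1 + exp(eta)) - y * eta
  sum(pmax(eta, 0) + log1p(exp(-abs(eta))) - y * eta)
}

# Gradient of the negative log-likelihood: X' (p - y), where p = sigmoid(eta).
neg_log_lik_grad <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  drop(crossprod(X, plogis(eta) - y))
}

# Hypothetical helper: fit an unregularized logistic regression with L-BFGS.
fit_logistic_lbfgs <- function(formula, data) {
  X <- model.matrix(formula, data)
  y <- model.response(model.frame(formula, data))
  fit <- optim(par = rep(0, ncol(X)), fn = neg_log_lik, gr = neg_log_lik_grad,
               X = X, y = y, method = "L-BFGS-B",
               control = list(factr = 1e3, maxit = 1000))
  setNames(fit$par, colnames(X))
}

coefs <- fit_logistic_lbfgs(case ~ age + parity, data = infert)
print(coefs)
```

Because this unregularized objective is the same one `glm(family = binomial)` maximizes, the coefficients should agree closely with `glm`, which makes a convenient sanity check.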

2 - Transformation functions

Function name | Description
concat | Transformation to create a single vector-valued column from multiple columns.
categorical | Creates an indicator vector using a categorical transform with a dictionary.
categoricalHash | Converts categorical values into an indicator array by hashing.
featurizeText | Produces a bag of counts of sequences of consecutive words, called n-grams, from a given corpus of text. It offers language detection, tokenization, stopword removal, text normalization, and feature generation.
getSentiment | Scores natural language text and creates a column that contains probabilities that the sentiments in the text are positive.
ngram | Allows defining arguments for count-based and hash-based feature extraction.
selectColumns | Selects a set of columns to retain, dropping all others.
selectFeatures | Selects features from the specified variables using a specified mode.
loadImage | Loads image data.
resizeImage | Resizes an image to a specified dimension using a specified resizing method.
extractPixels | Extracts the pixel values from an image.
featurizeImage | Featurizes an image using a pre-trained deep neural network model.
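featurizeText produces a bag of n-gram counts. To make that idea concrete, the following base-R sketch counts word n-grams across a small corpus. It illustrates the technique only and is not MicrosoftML code: `count_ngrams` is a hypothetical helper, and its tokenizer (lowercase, then split on anything other than letters and apostrophes) is a deliberate simplification that omits the language detection, stopword removal, and normalization options featurizeText offers.

```r
# Hypothetical helper: count word n-grams over a character vector of documents.
count_ngrams <- function(texts, n = 2) {
  counts <- integer(0)
  for (doc in texts) {
    # Simple tokenization: lowercase, split on runs of non-letter characters
    # (apostrophes are kept inside words), then drop empty tokens.
    tokens <- strsplit(tolower(doc), "[^a-z']+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    if (length(tokens) < n) next
    for (i in seq_len(length(tokens) - n + 1)) {
      gram <- paste(tokens[i:(i + n - 1)], collapse = " ")
      # Start a new count at 1, or increment an existing one.
      counts[gram] <- if (is.na(counts[gram])) 1L else counts[gram] + 1L
    }
  }
  counts
}

bigrams <- count_ngrams(c("the cat sat", "The cat ran"), n = 2)
print(bigrams)
```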

3 - Scoring and training functions

Function name | Description
rxPredict.mlModel | Runs the scoring library either from SQL Server, using the stored procedure, or from R code, enabling real-time scoring for much faster prediction performance.
rxFeaturize | Transforms data from an input data set to an output data set.
mlModel | Provides a summary of a Microsoft R Machine Learning model.

4 - Loss functions for classification and regression

Function name | Description
expLoss | Specification for the exponential classification loss function.
logLoss | Specification for the log classification loss function.
hingeLoss | Specification for the hinge classification loss function.
smoothHingeLoss | Specification for the smooth hinge classification loss function.
poissonLoss | Specification for the Poisson regression loss function.
squaredLoss | Specification for the squared regression loss function.
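The losses in this table have standard textbook forms. The base-R sketch below writes them out for a label `y` and a raw model score `s`, assuming labels in {-1, +1} for the classification losses. The function and parameter names (`beta`, `margin`, `gamma`) are illustrative, and MicrosoftML's own definitions may differ in scaling or parameterization (for example, a factor of 1/2 in the squared loss), so check each function's help page for the exact form.

```r
# Classification losses: label y in {-1, +1}, raw score s.
exp_loss <- function(y, s, beta = 1) exp(-beta * y * s)
log_loss <- function(y, s) log1p(exp(-y * s))
hinge_loss <- function(y, s, margin = 1) pmax(0, margin - y * s)

# Smooth hinge: zero for y*s >= 1, linear for y*s <= 1 - gamma,
# and quadratic in between, so the loss is differentiable everywhere.
smooth_hinge_loss <- function(y, s, gamma = 1) {
  z <- y * s
  ifelse(z >= 1, 0,
         ifelse(z <= 1 - gamma, 1 - z - gamma / 2, (1 - z)^2 / (2 * gamma)))
}

# Regression losses: real-valued label y, prediction s.
# Poisson: negative log-likelihood with s as the log of the predicted rate,
# dropping the constant log(y!) term.
poisson_loss <- function(y, s) exp(s) - y * s
squared_loss <- function(y, s) (y - s)^2

print(c(hinge = hinge_loss(1, 0.5), smooth = smooth_hinge_loss(1, 0.5)))
```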

5 - Feature selection functions

Function name | Description
minCount | Specification for feature selection in count mode.
mutualInformation | Specification for feature selection in mutual information mode.
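The two modes rank features differently. Assuming count mode means counting a feature's non-zero values, it keeps features that are non-zero often enough; mutual information mode scores each feature by how much it reveals about the label. The base-R sketch below shows both ideas for discrete data. `mutual_information` and `select_by_count` are hypothetical helpers, and the exact thresholds and ranking rules that minCount and mutualInformation apply are not reproduced here.

```r
# Hypothetical helper: mutual information (in nats) between discrete x and y.
mutual_information <- function(x, y) {
  joint <- table(x, y) / length(x)  # empirical joint distribution
  px <- rowSums(joint)              # marginal of x
  py <- colSums(joint)              # marginal of y
  mi <- 0
  for (i in seq_along(px)) {
    for (j in seq_along(py)) {
      if (joint[i, j] > 0) {
        mi <- mi + joint[i, j] * log(joint[i, j] / (px[i] * py[j]))
      }
    }
  }
  unname(mi)
}

# Hypothetical helper: keep columns with at least min_count non-zero values.
select_by_count <- function(df, min_count = 1) {
  keep <- vapply(df, function(col) sum(col != 0) >= min_count, logical(1))
  names(df)[keep]
}

print(mutual_information(c(0, 0, 1, 1), c(0, 0, 1, 1)))
```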

6 - Ensemble modeling functions

Function name | Description
fastTrees | Creates a list containing the function name and arguments to train a Fast Tree model with rxEnsemble.
fastForest | Creates a list containing the function name and arguments to train a Fast Forest model with rxEnsemble.
fastLinear | Creates a list containing the function name and arguments to train a Fast Linear model with rxEnsemble.
logisticRegression | Creates a list containing the function name and arguments to train a logistic regression model with rxEnsemble.
oneClassSvm | Creates a list containing the function name and arguments to train a one-class SVM model with rxEnsemble.
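Each of these helpers returns a list that pairs a training function's name with its arguments, which rxEnsemble then uses to train the member models. The base-R sketch below illustrates that spec-list pattern with `do.call`, using `lm` and `glm` as stand-in learners and averaging their predictions. `make_spec` and `train_from_spec` are hypothetical, and the list layout is an assumption, not the structure MicrosoftML actually produces.

```r
# Hypothetical spec: a training function's name plus its extra arguments.
make_spec <- function(fn, ...) list(fn = fn, args = list(...))

# Train one member model by calling the named function with its arguments.
train_from_spec <- function(spec, formula, data) {
  do.call(spec$fn, c(list(formula = formula, data = data), spec$args))
}

specs <- list(make_spec("lm"), make_spec("glm", family = gaussian()))
models <- lapply(specs, train_from_spec, formula = mpg ~ wt, data = mtcars)

# Combine the members with a simple unweighted average of their predictions.
ensemble_pred <- rowMeans(sapply(models, predict, newdata = mtcars))
print(head(ensemble_pred))
```

Because both stand-in members fit the same linear model here, the averaged prediction equals a single `lm` fit; with genuinely different learners, averaging is what can let an ensemble outperform its individual members.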

7 - Neural network functions

Function name | Description
optimizer | Specifies optimization algorithms for the rxNeuralNet machine learning algorithm.
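Optimizers for neural networks usually expose a learning rate and often a momentum term. As a generic illustration, not MicrosoftML's implementation, the base-R sketch below applies the classical momentum (heavy-ball) update with full-batch gradients to a simple quadratic; a stochastic optimizer would use minibatch gradient estimates instead. `momentum_descent` and its default settings are hypothetical.

```r
# Hypothetical helper: gradient descent with classical (heavy-ball) momentum.
momentum_descent <- function(grad, x0, learning_rate = 0.1, momentum = 0.9,
                             steps = 500) {
  x <- x0
  velocity <- numeric(length(x0))
  for (t in seq_len(steps)) {
    # The velocity accumulates a decaying sum of past gradient steps.
    velocity <- momentum * velocity - learning_rate * grad(x)
    x <- x + velocity
  }
  x
}

# Example objective: f(x) = (x1 - 3)^2 + 2 * (x2 + 1)^2, minimized at (3, -1).
grad_f <- function(x) c(2 * (x[1] - 3), 4 * (x[2] + 1))
x_min <- momentum_descent(grad_f, c(0, 0))
print(x_min)
```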

8 - Package state functions

Function name | Description
rxHashEnv | An environment object used to store package-wide state.
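rxHashEnv is described as an R environment used for package-wide state. The sketch below shows the general base-R mechanism that makes this work: environments are mutable and passed by reference, so functions can read and update shared state without global variables. The names here (`pkg_state`, `set_state`, `get_state`) are hypothetical, and the sketch says nothing about what rxHashEnv actually stores.

```r
# A package-level environment holding shared state, created once.
pkg_state <- new.env(parent = emptyenv())

# Store a value; the environment is modified in place, not copied.
set_state <- function(key, value) assign(key, value, envir = pkg_state)

# Read a value, falling back to a default when the key is absent.
get_state <- function(key, default = NULL) {
  if (exists(key, envir = pkg_state, inherits = FALSE)) {
    get(key, envir = pkg_state)
  } else {
    default
  }
}

set_state("cache_size", 128L)
print(get_state("cache_size"))
```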

How to use MicrosoftML

Functions in MicrosoftML can be called from R code encapsulated in stored procedures. Most developers build MicrosoftML solutions locally and then migrate the finished R code to stored procedures as a deployment exercise.

The MicrosoftML package for R is installed "out of the box" in SQL Server 2017. It is also available for use with SQL Server 2016 if you upgrade the R components for the instance; see Upgrade an instance of SQL Server using binding.

The package is not loaded by default. As a first step, load the MicrosoftML package, and then load RevoScaleR if you need to use remote compute contexts or related connectivity or data source objects. Then, reference the individual functions you need.

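library(MicrosoftML)
library(RevoScaleR)
# Reference the functions you need; for example, train a logistic regression model
# on the infert data set that ships with base R.
model <- rxLogisticRegression(isCase ~ age + parity + spontaneous + induced,
                              transforms = list(isCase = case == 1), data = infert)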

See also