microsoftml (Python package in SQL Server Machine Learning Services)

Applies to: SQL Server 2017 (14.x) and later

microsoftml is a Python package from Microsoft that provides high-performance machine learning algorithms. It includes functions for training and transformations, scoring, text and image analysis, and feature extraction for deriving values from existing data. The package is included in SQL Server Machine Learning Services and supports high performance on big data through multicore processing and fast data streaming.

Full reference documentation

The microsoftml package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or another product. Because the functions are the same, documentation for individual microsoftml functions is published to just one location under the Python reference for Microsoft Machine Learning Server. If any product-specific behaviors exist, the differences are noted on the function help page.

Versions and platforms

The microsoftml module is based on Python 3.5 and is available only when you install one of the following Microsoft products or downloads:


In SQL Server 2017, full product release versions are Windows-only. In SQL Server 2019, microsoftml is supported on both Windows and Linux.

Package dependencies

Algorithms in microsoftml depend on revoscalepy for:

  • Data source objects. Data consumed by microsoftml functions is created using revoscalepy functions.
  • Remote computing (shifting function execution to a remote SQL Server instance). The revoscalepy package provides functions for creating and activating a remote compute context for SQL Server.

In most cases, you load the packages together whenever you use microsoftml.
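
As a rough illustration of both dependencies, the sketch below creates a SQL Server data source and activates a remote compute context with revoscalepy. The server name, database name, and query are hypothetical placeholders, and the connection string format is an assumption (Windows authentication over ODBC). The revoscalepy import is deferred until the function is called, because the package is present only in the Microsoft installations described earlier.

```python
def sql_connection_string(server, database):
    """Build an ODBC-style connection string (assumes Windows authentication)."""
    return ("Driver=SQL Server;Server={0};Database={1};"
            "Trusted_Connection=True;").format(server, database)


def use_sql_compute_context(connection_string, query):
    """Create a SQL Server data source and switch to a remote compute context."""
    # Imported lazily: revoscalepy is not available from public package indexes.
    from revoscalepy import RxInSqlServer, RxSqlServerData, rx_set_compute_context

    # Data source object that microsoftml functions can consume.
    data_source = RxSqlServerData(sql_query=query,
                                  connection_string=connection_string)

    # Subsequent rx_* calls are executed on the SQL Server instance.
    compute_context = RxInSqlServer(connection_string=connection_string)
    rx_set_compute_context(compute_context)
    return data_source


# Hypothetical server and database names, for illustration only.
conn = sql_connection_string("localhost", "MyDatabase")
```

Calling `use_sql_compute_context(conn, "SELECT * FROM dbo.MyTable")` (a hypothetical table) would return a data source that can be passed as the `data` argument of a microsoftml training function.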

Functions by category

This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.

1-Training functions

Function                              Description
microsoftml.rx_ensemble               Train an ensemble of models.
microsoftml.rx_fast_forest            Random Forest.
microsoftml.rx_fast_linear            Linear model with Stochastic Dual Coordinate Ascent.
microsoftml.rx_fast_trees             Boosted Trees.
microsoftml.rx_logistic_regression    Logistic Regression.
microsoftml.rx_neural_network         Neural Network.
microsoftml.rx_oneclass_svm           Anomaly Detection.

2-Transform functions

Categorical variable handling

Function                        Description
microsoftml.categorical         Converts a text column into categories.
microsoftml.categorical_hash    Hashes and converts a text column into categories.

Schema manipulation

Function                      Description
microsoftml.concat            Concatenates multiple columns into a single vector.
microsoftml.drop_columns      Drops columns from a dataset.
microsoftml.select_columns    Retains columns of a dataset.

Variable selection

Function                                Description
microsoftml.count_select                Feature selection based on counts.
microsoftml.mutualinformation_select    Feature selection based on mutual information.

Text analytics

Function                      Description
microsoftml.featurize_text    Converts text columns into numerical features.
microsoftml.get_sentiment     Sentiment analysis.

Image analytics

Function                       Description
microsoftml.load_image         Loads an image.
microsoftml.resize_image       Resizes an image.
microsoftml.extract_pixels     Extracts pixels from an image.
microsoftml.featurize_image    Converts an image into features.

Featurization functions

Function                    Description
microsoftml.rx_featurize    Data transformation for data sources.

3-Scoring functions

Function                  Description
microsoftml.rx_predict    Scores using a Microsoft machine learning model.

How to call microsoftml

Functions in microsoftml are callable in Python code encapsulated in stored procedures. Most developers build microsoftml solutions locally, and then migrate finished Python code to stored procedures as a deployment exercise.

The microsoftml package for Python is installed by default, but unlike revoscalepy, it is not loaded by default when you start a Python session using the Python executables installed with SQL Server.
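
Before importing anything, it can be useful to confirm that the interpreter in use is the one that ships with SQL Server. The following sketch uses only the standard library to check whether the two packages can be found. The helper name `is_installed` is illustrative and not part of either package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


# Check both packages; neither is imported by this lookup.
availability = {name: is_installed(name)
                for name in ("microsoftml", "revoscalepy")}

for name, found in sorted(availability.items()):
    print("{0}: {1}".format(name, "installed" if found else "not found"))
```

If either package is reported as not found, the session is probably running under a different Python installation than the one set up by SQL Server.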

As a first step, import the microsoftml package, and import revoscalepy if you need to use remote compute contexts or related connectivity or data source objects. Then, reference the individual functions you need.

from microsoftml.modules.logistic_regression.rx_logistic_regression import rx_logistic_regression
from revoscalepy.functions.RxSummary import rx_summary
from revoscalepy.etl.RxImport import rx_import_datasource
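
Once the functions are imported, a typical workflow trains a model and then scores new data. The sketch below is a minimal illustration rather than a reference implementation: the `build_formula` and `train_and_score` helpers and the column names are hypothetical, and it assumes the input is a pandas DataFrame (or a revoscalepy data source) with a boolean label column. The microsoftml import is deferred so that the helper definitions load even where the package is not installed.

```python
def build_formula(label, features):
    """Return a formula string such as 'label ~ f1 + f2'."""
    return "{0} ~ {1}".format(label, " + ".join(features))


def train_and_score(train_data, test_data, label, features):
    """Fit a binary logistic regression and score new rows."""
    # Imported lazily: microsoftml ships only with specific Microsoft products.
    from microsoftml import rx_logistic_regression, rx_predict

    model = rx_logistic_regression(formula=build_formula(label, features),
                                   data=train_data,
                                   method="binary")

    # rx_predict applies the trained model to the rows of test_data.
    return rx_predict(model, data=test_data)


# Hypothetical column names, for illustration only.
formula = build_formula("isCase", ["age", "parity"])
```

The same pattern applies to the other training functions listed earlier, such as rx_fast_trees or rx_fast_forest, although their tuning parameters differ.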

See also