revoscalepy (Python package in SQL Server Machine Learning Services)

Applies to: SQL Server 2017 (14.x) and later

revoscalepy is a Python package from Microsoft that supports distributed computing, remote compute contexts, and high-performance data science algorithms. The package is included in SQL Server Machine Learning Services.

The package offers the following functionality:

  • Local and remote compute contexts on systems that have the same version of revoscalepy
  • Data transformation and visualization functions
  • Data science functions, scalable through distributed or parallel processing
  • Improved performance, including use of the Intel math libraries

Data sources and compute contexts that you create in revoscalepy can also be used in machine learning algorithms. For an introduction to these algorithms, see microsoftml Python module in SQL Server.

Full reference documentation

The revoscalepy package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or another product. Because the functions are the same, documentation for individual revoscalepy functions is published in just one location, under the Python reference for Microsoft Machine Learning Server. Any product-specific behavior is noted on the help page for the affected function.

Versions and platforms

The revoscalepy module is based on Python 3.5 and is available only when you install one of the following Microsoft products or downloads:

Note

In SQL Server 2017, full product release versions are Windows-only. In SQL Server 2019 and later, revoscalepy is supported on both Windows and Linux.

Functions by category

This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.

1-Data source and compute

revoscalepy includes functions for creating data sources and for setting the compute context, that is, the location where computations are performed. Functions relevant to SQL Server scenarios are listed in the table below.

SQL Server and Python use different data types in some cases. For a list of mappings between SQL and Python data types, see Python-to-SQL data types.

Function    Description
RxInSqlServer    Create a SQL Server compute context object to push computations to a remote instance. Several revoscalepy functions take a compute context as an argument. For a context-switch example, see Create a model using revoscalepy.
RxSqlServerData    Create a data object based on a SQL Server query or table.
RxOdbcData    Create a data source based on an ODBC connection.
RxXdfData    Create a data source based on a local XDF file. XDF files are often used to offload in-memory data to disk. An XDF file can be useful when working with more data than can be transferred from the database in one batch, or more data than can fit in memory. For example, if you regularly move large amounts of data from a database to a local workstation, you can use the XDF file as a kind of cache: save the data locally once, then work with it in your Python session instead of querying the database again for each operation.

Tip

If you are new to the idea of data sources or compute contexts, we recommend that you start with distributed computing in the Microsoft Machine Learning Server documentation.

2-Data manipulation (ETL)

Function    Description
rx_import    Import data into a .xdf file or data frame.
rx_data_step    Transform data from an input data set to an output data set.

3-Training and summarization

Function    Description
rx_btrees    Fit stochastic gradient boosted decision trees.
rx_dforest    Fit classification and regression decision forests.
rx_dtree    Fit classification and regression trees.
rx_lin_mod    Create a linear regression model.
rx_logit    Create a logistic regression model.
rx_summary    Produce univariate summaries of objects in revoscalepy.

You should also review the functions in microsoftml for additional approaches.

4-Scoring functions

Function    Description
rx_predict    Generate predictions from a trained model. Can also be used for real-time scoring.
rx_predict_default    Compute predicted values and residuals using rx_lin_mod and rx_logit objects.
rx_predict_rx_dforest    Calculate predicted or fitted values for a data set from an rx_dforest or rx_btrees object.
rx_predict_rx_dtree    Calculate predicted or fitted values for a data set from an rx_dtree object.

How to work with revoscalepy

Functions in revoscalepy are callable in Python code encapsulated in stored procedures. Most developers build revoscalepy solutions locally, and then migrate finished Python code to stored procedures as a deployment exercise.

When running locally, you typically run a Python script from the command line, or from a Python development environment, and specify a SQL Server compute context using one of the revoscalepy functions. You can use the remote compute context for the entire code, or for individual functions. For example, you might want to offload model training to the server to use the latest data and avoid data movement.
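As a rough illustration of offloading training to the server, the sketch below builds a connection string locally and passes a SQL Server compute context to a single training call. The server name, database name, query, and formula are hypothetical placeholders, and the helper names build_connection_string and train_on_server are invented for this example. The revoscalepy import is deferred until the function is called, because the package is present only on machines where one of the Microsoft products described earlier is installed.

```python
def build_connection_string(server, database):
    """Build an ODBC-style connection string that uses Windows authentication.

    The exact driver name is an assumption; adjust it to match your environment.
    """
    return (
        "Driver=SQL Server;"
        f"Server={server};"
        f"Database={database};"
        "Trusted_Connection=True;"
    )


def train_on_server(connection_string, query, formula):
    """Fit a linear model with the computation pushed to SQL Server.

    Hypothetical helper: requires revoscalepy, so the import is deferred
    until the function is actually called.
    """
    from revoscalepy import RxInSqlServer, RxSqlServerData, rx_lin_mod

    # Data source: the query runs on the server, so rows are not copied locally.
    data_source = RxSqlServerData(
        sql_query=query,
        connection_string=connection_string,
    )

    # Compute context: tells revoscalepy to execute on the remote instance.
    compute_context = RxInSqlServer(
        connection_string=connection_string,
        num_tasks=1,
        auto_cleanup=False,
    )

    # Only this call uses the remote context; other code keeps running locally.
    return rx_lin_mod(formula, data=data_source, compute_context=compute_context)


# Placeholder server and database names, for illustration only.
conn_str = build_connection_string("myserver", "mydatabase")
print(conn_str)
```

On a machine with revoscalepy installed, a call such as train_on_server(conn_str, "SELECT ArrDelay, DayOfWeek FROM airline", "ArrDelay ~ DayOfWeek") would then run the model fit on the server; the table and column names here are assumed, not taken from this article.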

When you are ready to encapsulate the Python script in a stored procedure by using sp_execute_external_script, we recommend rewriting the code as a single function that has clearly defined inputs and outputs.

Inputs and outputs must be pandas data frames. When this is done, you can call the stored procedure from any client that supports T-SQL, easily pass SQL queries as inputs, and save the results to SQL tables. For an example, see Learn in-database Python analytics for SQL developers.
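A minimal sketch of that shape follows: one function that takes a pandas data frame and returns a pandas data frame. The function name flag_late_arrivals, the column names, and the 15-minute threshold are hypothetical and chosen only for illustration. InputDataSet and OutputDataSet are the default variable names that sp_execute_external_script uses for the input query result and the returned rows; here they are created by hand so that the script can be tried outside SQL Server.

```python
import pandas as pd


def flag_late_arrivals(input_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical example: mark rows whose arrival delay exceeds 15 minutes.

    Takes a data frame and returns a new data frame, leaving the input unchanged.
    """
    output = input_data.copy()
    output["IsLate"] = output["ArrDelay"] > 15
    return output[["DayOfWeek", "ArrDelay", "IsLate"]]


# Inside a stored procedure, SQL Server supplies InputDataSet from the input query.
# It is built manually here as stand-in data for local testing.
InputDataSet = pd.DataFrame(
    {"DayOfWeek": ["Monday", "Tuesday"], "ArrDelay": [5, 42]}
)

# Assigning to OutputDataSet is how the script hands rows back to SQL Server.
OutputDataSet = flag_late_arrivals(InputDataSet)
print(OutputDataSet)
```

Keeping the logic in one function with data frame input and output means the same code can be exercised locally and then pasted into the stored procedure with only the last two assignments changed.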

Using revoscalepy with microsoftml

The Python functions for microsoftml are integrated with the compute contexts and data sources that are provided in revoscalepy. When calling functions from microsoftml, for example when defining and training a model, use the revoscalepy functions to execute the Python code either locally or in a SQL Server remote compute context.

The following example shows the syntax for importing modules in your Python code. You can then reference the individual functions you need.

from microsoftml.modules.logistic_regression.rx_logistic_regression import rx_logistic_regression
from revoscalepy.functions.RxSummary import rx_summary
from revoscalepy.etl.RxImport import rx_import_datasource

See also