SQL Server 机器学习服务中的 Python 语言扩展Python language extension in SQL Server Machine Learning Services

适用于:Applies to: 是SQL Server 2017 (14.x)SQL Server 2017 (14.x)yesSQL Server 2017 (14.x)SQL Server 2017 (14.x) 及更高版本适用于:Applies to: 是SQL Server 2017 (14.x)SQL Server 2017 (14.x)yesSQL Server 2017 (14.x)SQL Server 2017 (14.x) and later

本文介绍用于运行外部 Python 脚本(使用 SQL Server 机器学习服务)的 Python 扩展。This article describes the Python extension for running external Python scripts with SQL Server Machine Learning Services. 扩展添加:The extension adds:

  • Python 执行环境A Python execution environment
  • 包含 Python 3.5 运行时和解释器的 Anaconda 分发Anaconda distribution with the Python 3.5 runtime and interpreter
  • 标准库和工具Standard libraries and tools
  • Microsoft Python 包:Microsoft Python packages:

安装 Python 3.5 运行时和解释器可确保与标准 Python 解决方案近乎完全兼容。Installation of the Python 3.5 runtime and interpreter ensures near-complete compatibility with standard Python solutions. Python 与 SQL Server 在不同的进程中运行,以保证数据库操作不受到影响。Python runs in a separate process from SQL Server, to guarantee that database operations are not compromised.

Python 组件Python components

SQL Server 包含开源包和专有包。SQL Server includes both open-source and proprietary packages. 安装程序安装的 Python 运行时为带 Python 3.5 的 Anaconda 4.2。The Python runtime installed by Setup is Anaconda 4.2 with Python 3.5. Python 运行时独立于 SQL 工具安装,并在扩展性框架中的核心引擎进程外部执行。The Python runtime is installed independently of SQL tools, and is executed outside of core engine processes, in the extensibility framework. 在使用 Python 安装机器学习服务的过程中,必须同意 GNU 公共许可证的条款。As part of the installation of Machine Learning Services with Python, you must consent to the terms of the GNU Public License.

SQL Server 不会修改基本 Python 可执行文件,但必须使用安装程序安装的 Python 版本,因为该版本是用于生成和测试专有包的版本。SQL Server does not modify the Python executables, but you must use the version of Python installed by Setup because that version is the one that the proprietary packages are built and tested on. 有关 Anaconda 发行版支持的包的列表,请参阅 Continuum 分析网站:Anaconda 包列表For a list of packages supported by the Anaconda distribution, see the Continuum analytics site: Anaconda package list.

与特定数据库引擎实例关联的 Anaconda 发行版可在与实例关联的文件夹中找到。The Anaconda distribution associated with a specific database engine instance can be found in the folder associated with the instance. 例如,如果在默认实例上安装了带机器学习服务和 Python 的 SQL Server 2017 数据库引擎,请查看 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICESFor example, if you installed SQL Server 2017 database engine with Machine Learning Services and Python on the default instance, look under C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES.

Microsoft 为并行和分布式工作负荷添加的 Python 包包含以下库。Python packages added by Microsoft for parallel and distributed workloads include the following libraries.

Library 说明Description
revoscalepyrevoscalepy 支持数据源对象以及数据浏览、操作、转换和可视化。Supports data source objects and data exploration, manipulation, transformation, and visualization. 它支持创建远程计算上下文以及各种可缩放的机器学习模型,例如 rxLinModIt supports creation of remote compute contexts, as well as a various scalable machine learning models, such as rxLinMod. 有关详细信息,请参阅 revoscalepy 模块与 SQL ServerFor more information, see revoscalepy module with SQL Server.
microsoftmlmicrosoftml 包含针对速度和准确性进行了优化的机器学习算法,以及用于处理文本和图像的内联转换。Contains machine learning algorithms that have been optimized for speed and accuracy, as well as in-line transformations for working with text and images. 有关详细信息,请参阅 microsoftml 模块与 SQL ServerFor more information, see microsoftml module with SQL Server.

Microsoftml 和 revoscalepy 紧密耦合;microsoftml 中使用的数据源定义为 revoscalepy 对象。Microsoftml and revoscalepy are tightly coupled; data sources used in microsoftml are defined as revoscalepy objects. Revoscalepy 中的计算上下文限制传递到 microsoftml。Compute context limitations in revoscalepy transfer to microsoftml. 即,所有功能都可用于本地操作,但切换到远程计算上下文需要 RxInSqlServer。Namely, all functionality is available for local operations, but switching to a remote compute context requires RxInSqlServer.

在 SQL Server 中使用 PythonUsing Python in SQL Server

revoscalepy 模块导入 Python 代码中,然后从该模块调用函数,就像调用任何其他 Python 函数一样。You import the revoscalepy module into your Python code, and then call functions from the module, like any other Python functions.

支持的数据源包括 ODBC 数据库、SQL Server 和用于与其他源或通过 R 解决方案交换数据的 XDF 文件格式。Supported data sources include ODBC databases, SQL Server, and XDF file format to exchange data with other sources, or with R solutions. Python 的输入数据必须为表格格式。Input data for Python must be tabular. 所有 Python 结果都必须以 pandas 数据帧的形式返回。All Python results must be returned in the form of a pandas data frame.

支持的计算上下文包括本地或远程 SQL Server 计算上下文。Supported compute contexts include local, or remote SQL Server compute context. 远程计算上下文指在一台计算机(例如工作站)上开始,但随后将脚本执行切换到远程计算机的代码执行。A remote compute context refers to code execution that starts on one computer such as a workstation, but then switches script execution to a remote computer. 切换计算上下文要求两个系统具有同一 revoscalepy 库。Switching the compute context requires that both systems have the same revoscalepy library.

正如用户所预期的一样,本地计算上下文包括在与数据库引擎实例相同的服务器上执行 Python 代码,以及 T-SQL 内部或嵌入存储过程的代码。Local compute context, as you might expect, includes execution of Python code on the same server as the database engine instance, with code inside T-SQL or embedded in a stored procedure. 还可以通过定义远程计算上下文,从本地 Python IDE 运行代码并使脚本在 SQL Server 计算机上执行。You can also run the code from a local Python IDE and have the script execute on the SQL Server computer, by defining a remote compute context.

执行体系结构Execution architecture

以下关系图说明 SQL Server 组件与 Python 运行时在每种支持的方案中的交互:在数据库内运行脚本,以及使用 SQL Server 计算上下文从 Python 终端远程执行。The following diagrams depict the interaction of SQL Server components with the Python runtime in each of the supported scenarios: running script in-database, and remote execution from a Python terminal, using a SQL Server compute context.

在数据库内执行 Python 脚本Python scripts executed in-database

在 SQL Server“内部”运行 Python 时,必须将 Python 脚本封装到特殊的存储过程 sp_execute_external_script 中。When you run Python "inside" SQL Server, you must encapsulate the Python script inside a special stored procedure, sp_execute_external_script.

在将脚本嵌入存储过程后,可以进行存储过程调用的任何应用程序都可以启动 Python 代码的执行。After the script has been embedded in the stored procedure, any application that can make a stored procedure call can initiate execution of the Python code. 然后,SQL Server 按下图中概括的方式管理代码执行。Thereafter SQL Server manages code execution as summarized in the following diagram.

script-in-db-python

  1. 对 Python 运行时发出的请求由传递给存储过程的参数 @language='Python' 指示。A request for the Python runtime is indicated by the parameter @language='Python' passed to the stored procedure. SQL Server 将此请求发送到 Launchpad 服务。SQL Server sends this request to the launchpad service. 在 Linux 中,SQL 使用 Launchpad 服务与每个用户的独立 Launchpad 进程进行通信。In Linux, SQL uses a launchpadd service to communicate with a separate launchpad process for each user. 有关详细信息,请参阅扩展性体系结构关系图See the Extensibility architecture diagram for details.
  2. Launchpad 服务启动相应的启动器(在本例中为 PythonLauncher)。The launchpad service starts the appropriate launcher; in this case, PythonLauncher.
  3. PythonLauncher 启动外部 Python35 进程。PythonLauncher starts the external Python35 process.
  4. BxlServer 与 Python 运行时协同工作,管理数据交换并存储工作结果。BxlServer coordinates with the Python runtime to manage exchanges of data, and storage of working results.
  5. SQL Satellite 管理相关任务和进程与 SQL Server 之间的通信。SQL Satellite manages communications about related tasks and processes with SQL Server.
  6. BxlServer 使用 SQL Satellite 向 SQL Server 传递状态和结果。BxlServer uses SQL Satellite to communicate status and results to SQL Server.
  7. SQL Server 获取结果并关闭相关任务和进程。SQL Server gets results and closes related tasks and processes.

从远程客户端执行 Python 脚本Python scripts executed from a remote client

如果满足以下条件,则可以从远程计算机(例如笔记本电脑)运行 Python 脚本,并使它们在 SQl Server 计算机的上下文中执行:You can run Python scripts from a remote computer, such as a laptop, and have them execute in the context of the SQl Server computer, if these conditions are met:

  • 可以适当地设计脚本You design the scripts appropriately
  • 远程计算机已安装了机器学习服务使用的扩展性库。The remote computer has installed the extensibility libraries that are used by Machine Learning Services. 使用远程计算上下文需要 revoscalepy 包。The revoscalepy package is required to use remote compute contexts.

以下关系图概括了从远程计算机发送脚本时的总体工作流。The following diagram summarizes the overall workflow when scripts are sent from a remote computer.

remote-sqlcc-from-python

  1. 对于 revoscalepy 支持的函数,Python 运行时调用链接函数,后者接着调用 BxlServer。For functions that are supported in revoscalepy, the Python runtime calls a linking function, which in turn calls BxlServer.
  2. BxlServer 包含在机器学习服务(数据库内)中,与 Python 运行时在不同的进程中运行。BxlServer is included with Machine Learning Services (In-Database) and runs in a separate process from the Python runtime.
  3. BxlServer 确定连接目标并使用 ODBC 发起连接,传递 Python 脚本的连接字符串中提供的凭据。BxlServer determines the connection target and initiates a connection using ODBC, passing credentials supplied as part of the connection string in the Python script.
  4. BxlServer 打开与 SQL Server 实例的连接。BxlServer opens a connection to the SQL Server instance.
  5. 如果调用外部脚本运行时,将调用 Launchpad 服务,这将启动相应的启动器:在本例中为 PythonLauncher.dll。When an external script runtime is called, the launchpad service is invoked, which in turn starts the appropriate launcher: in this case, PythonLauncher.dll. 此后,处理 Python 代码的工作流与在 T-SQL 中从存储过程调用 Python 代码时工作流类似。Thereafter, processing of Python code is handled in a workflow similar to that when Python code is invoked from a stored procedure in T-SQL.
  6. PythonLauncher 调用 SQL Server 计算机上安装的 Python 实例。PythonLauncher makes a call to the instance of the Python that is installed on the SQL Server computer.
  7. 将结果返回到 BxlServer。Results are returned to BxlServer.
  8. SQL Satellite 管理与 SQL Server 之间的通信并清理相关作业对象。SQL Satellite manages communication with SQL Server and cleanup of related job objects.
  9. SQL Server 将结果传回客户端。SQL Server passes results back to the client.

后续步骤Next steps