快速入门:通过 SQL 机器学习运行简单的 Python 脚本Quickstart: Run simple Python scripts with SQL machine learning

适用于:Applies to: 是SQL Server 2017 (14.x)SQL Server 2017 (14.x)yesSQL Server 2017 (14.x)SQL Server 2017 (14.x) 及更高版本 是Azure SQL 托管实例Azure SQL Managed InstanceYesAzure SQL 托管实例Azure SQL Managed Instance适用于:Applies to: 是SQL Server 2017 (14.x)SQL Server 2017 (14.x)yesSQL Server 2017 (14.x)SQL Server 2017 (14.x) and later 是Azure SQL 托管实例Azure SQL Managed InstanceYesAzure SQL 托管实例Azure SQL Managed Instance

在本快速入门中,你将使用 SQL Server 机器学习服务Azure SQL 托管实例机器学习服务SQL Server 大数据群集运行一组简单的 Python 脚本。In this quickstart, you'll run a set of simple Python scripts using SQL Server Machine Learning Services, Azure SQL Managed Instance Machine Learning Services, or SQL Server Big Data Clusters. 你将了解如何在 SQL Server 实例中使用存储过程 sp_execute_external_script 执行该脚本。You'll learn how to use the stored procedure sp_execute_external_script to execute the script in a SQL Server instance.

先决条件Prerequisites

若要运行本快速入门,需要具备以下先决条件。You need the following prerequisites to run this quickstart.

运行简单脚本Run a simple script

若要运行 Python 脚本,请将它作为参数传递给系统存储过程 sp_execute_external_scriptTo run a Python script, you'll pass it as an argument to the system stored procedure, sp_execute_external_script. 此系统存储过程在 SQL 机器学习的上下文中启动 Python 运行时,将数据传递到 Python,安全地管理 Python 用户会话,并将所有结果返回到客户端。This system stored procedure starts the Python runtime in the context of SQL machine learning, passes data to Python, manages Python user sessions securely, and returns any results to the client.

在下面的步骤中,你将在数据库中运行此示例 Python 脚本:In the following steps, you'll run this example Python script in your database:

a = 1
b = 2
c = a/b
d = a*b
print(c, d)
  1. 在连接到 SQL 实例的“Azure Data Studio”中打开一个新的查询窗口。Open a new query window in Azure Data Studio connected to your SQL instance.

  2. 将完整的 Python 脚本传递到 sp_execute_external_script 存储过程。Pass the complete Python script to the sp_execute_external_script stored procedure.

    通过 @script 参数传递脚本。The script is passed through the @script argument. @script 参数内的所有内容都必须是有效的 Python 代码。Everything inside the @script argument must be valid Python code.

    EXECUTE sp_execute_external_script @language = N'Python'
        , @script = N'
    a = 1
    b = 2
    c = a/b
    d = a*b
    print(c, d)
    '
    
  3. 计算出正确的结果,Python print 函数将结果返回到“消息”窗口。The correct result is calculated and the Python print function returns the result to the Messages window.

    结果应该如下所示。It should look something like this.

    结果Results

    STDOUT message(s) from external script:
    0.5 2
    

运行 Hello World 脚本Run a Hello World script

典型的示例脚本只输出字符串“Hello World”。A typical example script is one that just outputs the string "Hello World". 运行以下命令。Run the following command.

EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'OutputDataSet = InputDataSet'
    , @input_data_1 = N'SELECT 1 AS hello'
WITH RESULT SETS(([Hello World] INT));
GO

sp_execute_external_script 存储过程的输入包括:Inputs to the sp_execute_external_script stored procedure include:

输入Input 说明Description
@language 定义本例中要调用 Python 的语言扩展defines the language extension to call, in this case Python
@script 定义传递给 Python 运行时的命令。defines the commands passed to the Python runtime. 必须以 Unicode 文本形式将整个 Python 脚本封装在此参数中。Your entire Python script must be enclosed in this argument, as Unicode text. 还可将文本添加到 nvarchar 类型的变量并调用该变量You could also add the text to a variable of type nvarchar and then call the variable
@input_data_1 查询返回的数据将传递给 Python 运行时,后者将数据以数据帧的形式返回data returned by the query, passed to the Python runtime, which returns the data as a data frame
WITH RESULT SETSWITH RESULT SETS 子句为 SQL 机器学习定义返回的数据表的架构,将“Hello World”添加为列名称,并为数据类型添加“int”clause defines the schema of the returned data table for SQL machine learning, adding "Hello World" as the column name, int for the data type

命令输出以下文本:The command outputs the following text:

Hello WorldHello World
11

使用输入和输出Use inputs and outputs

默认情况下,sp_execute_external_script 接受单个数据集作为输入,通常以有效的 SQL 查询的形式提供。By default, sp_execute_external_script accepts a single dataset as input, which typically you supply in the form of a valid SQL query. 然后,它返回单个 Python 数据帧作为输出。It then returns a single Python data frame as output.

现在,使用 sp_execute_external_script 的默认输入和输出变量:InputDataSet 和 OutputDataSet 。For now, let's use the default input and output variables of sp_execute_external_script: InputDataSet and OutputDataSet.

  1. 创建一个小型测试数据表。Create a small table of test data.

    CREATE TABLE PythonTestData (col1 INT NOT NULL)
    
    INSERT INTO PythonTestData
    VALUES (1);
    
    INSERT INTO PythonTestData
    VALUES (10);
    
    INSERT INTO PythonTestData
    VALUES (100);
    GO
    
  2. 使用 SELECT 语句来查询表。Use the SELECT statement to query the table.

    SELECT *
    FROM PythonTestData
    

    结果Results

    PythonTestData 表的内容

  3. 运行以下 Python 脚本。Run the following Python script. 它使用 SELECT 语句从表中检索数据,通过 Python 运行时传递数据,并以数据帧的形式返回数据。It retrieves the data from the table using the SELECT statement, passes it through the Python runtime, and returns the data as a data frame. WITH RESULT SETS 子句为 SQL 定义返回的数据表的架构,并添加了列名称“NewColName”。The WITH RESULT SETS clause defines the schema of the returned data table for SQL, adding the column name NewColName.

    EXECUTE sp_execute_external_script @language = N'Python'
        , @script = N'OutputDataSet = InputDataSet;'
        , @input_data_1 = N'SELECT * FROM PythonTestData;'
    WITH RESULT SETS(([NewColName] INT NOT NULL));
    

    结果Results

    从表返回数据的 Python 脚本的输出

  4. 现在,更改输入变量和输出变量的名称。Now change the names of the input and output variables. 默认的输入和输出变量名称是 InputDataSet 和 OutputDataSet,而以下脚本会将名称更改为 SQL_in 和 SQL_out :The default input and output variable names are InputDataSet and OutputDataSet, the following script changes the names to SQL_in and SQL_out:

    EXECUTE sp_execute_external_script @language = N'Python'
        , @script = N'SQL_out = SQL_in;'
        , @input_data_1 = N'SELECT 12 as Col;'
        , @input_data_1_name  = N'SQL_in'
        , @output_data_1_name = N'SQL_out'
    WITH RESULT SETS(([NewColName] INT NOT NULL));
    

    请注意 Python 区分大小写。Note that Python is case-sensitive. Python 脚本中使用的输入和输出变量(SQL_out、SQL_in)需要匹配使用 @input_data_1_name@output_data_1_name 定义的名称,包括大小写 。The input and output variables used in the Python script (SQL_out, SQL_in) need to match the names defined with @input_data_1_name and @output_data_1_name, including case.

    提示

    只能将一个输入数据集作为参数传递,并且只能返回一个数据集。Only one input dataset can be passed as a parameter, and you can return only one dataset. 但是,可以从 Python 代码内调用其他数据集,并且除数据集以外,还可以返回其他类型的输出。However, you can call other datasets from inside your Python code and you can return outputs of other types in addition to the dataset. 也可向任何参数添加 OUTPUT 关键字,让该参数随结果一起返回。You can also add the OUTPUT keyword to any parameter to have it returned with the results.

  5. 还可以仅使用没有输入数据的 Python 脚本(@input_data_1 设置为空白)生成值。You can also generate values just using the Python script with no input data (@input_data_1 is set to blank).

    以下脚本输出文本“hello”和“world”。The following script outputs the text "hello" and "world".

    EXECUTE sp_execute_external_script @language = N'Python'
        , @script = N'
    import pandas as pd
    mytextvariable = pandas.Series(["hello", " ", "world"]);
    OutputDataSet = pd.DataFrame(mytextvariable);
    '
        , @input_data_1 = N''
    WITH RESULT SETS(([Col1] CHAR(20) NOT NULL));
    

    结果Results

    使用 @script 作为输入的查询结果

备注

Python 使用前导空格对语句进行分组。Python uses leading spaces to group statements. 因此,当嵌入的 Python 脚本跨越多行时(如前面的脚本中所述),请勿尝试将 Python 命令缩进到与 SQL 命令保持一致。So when the imbedded Python script spans multiple lines, as in the preceding script, don't try to indent the Python commands to be in line with the SQL commands. 例如,此脚本将生成错误:For example, this script will produce an error:

EXECUTE sp_execute_external_script @language = N'Python'
      , @script = N'
      import pandas as pd
      mytextvariable = pandas.Series(["hello", " ", "world"]);
      OutputDataSet = pd.DataFrame(mytextvariable);
      '
      , @input_data_1 = N''
WITH RESULT SETS(([Col1] CHAR(20) NOT NULL));

检查 Python 版本Check Python version

若要查看服务器中安装的 Python 版本,请运行以下脚本。If you would like to see which version of Python is installed in your server, run the following script.

EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'
import sys
print(sys.version)
'
GO

Python print 函数将该版本返回到“消息”窗口。The Python print function returns the version to the Messages window. 在下面的示例输出中,可以看到本例中安装的是 Python 版本 3.5.2。In the example output below, you can see that in this case, Python version 3.5.2 is installed.

结果Results

STDOUT message(s) from external script:
3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

列出 Python 包List Python packages

Microsoft 提供了许多随机器学习服务预安装的 Python 包。Microsoft provides a number of Python packages pre-installed with Machine Learning Services.

若要查看安装的 Python 包(包括版本)列表,请运行以下脚本。To see a list of which Python packages are installed, including version, run the following script.

EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'
import pkg_resources
import pandas
dists = [str(d) for d in pkg_resources.working_set]
OutputDataSet = pandas.DataFrame(dists)
'
WITH RESULT SETS(([Package] NVARCHAR(max)))
GO

此列表来自 Python 中的 pkg_resources.working_set,并作为数据帧返回到 SQL。The list is from pkg_resources.working_set in Python and returned to SQL as a data frame.

结果Results

已安装的 Python 包的列表

后续步骤Next steps

若要了解在 SQL 机器学习中使用 Python 时如何使用数据结构,请按照此快速入门操作:To learn how to use data structures when using Python in SQL machine learning, follow this quickstart: