在 SQL Server 中使用 sp_rxPredict 进行实时评分Real-time scoring with sp_rxPredict in SQL Server

适用于:Applies to: 是SQL Server 2016 (13.x)SQL Server 2016 (13.x)yesSQL Server 2016 (13.x)SQL Server 2016 (13.x) 及更高版本适用于:Applies to: 是SQL Server 2016 (13.x)SQL Server 2016 (13.x)yesSQL Server 2016 (13.x)SQL Server 2016 (13.x) and later

了解如何在 SQL Server 中使用 sp_rxPredict 系统存储过程执行实时评分,从而在预测工作负载时实现高性能的预测或评分。Learn how to perform real-time scoring with the sp_rxPredict system stored procedure in SQL Server for high-performance predictions or scores in forecasting workloads.

sp_rxPredict 的实时评分与语言无关,执行时不依赖于机器学习服务机器学习服务器中的 R 或 Python 运行时。Real-time scoring with sp_rxPredict is language-agnostic and executes with no dependencies on the R or Python runtimes in Machine Learning Services or Machine Learning Server. 在 SQL Server 中使用通过 Microsoft 函数创建和定型的一个模型,将它序列化为二进制格式,则可以使用实时评分在没有安装 R 或 Python 加载项的 SQL Server 实例上生成新数据输入的预测结果。Using a model created and trained using Microsoft functions and serialized to a binary format in SQL Server, you can use real-time scoring to generate predicted outcomes on new data inputs on SQL Server instances that do not have the R or Python add-on installed.

实时评分的工作原理How real-time scoring works

支持对基于 R 中的 RevoScaleRMicrosoftML,或 Python 中的 revoscalepymicrosoftml 中的函数的特定模型类型进行实时评分。Real-time scoring is supported on specific model types based on functions in RevoScaleR or MicrosoftML in R, or revoscalepy or microsoftml in Python. 它根据提供给存储为特殊二进制格式的机器学习模型的用户输入来使用本机 C++ 库生成评分。It uses native C++ libraries to generate scores based on user input provided to a machine learning model stored in a special binary format.

因为无需调用机器学习服务机器学习服务器中的外部语言运行时即可将已训练的模型用于评分,所以减少了多个进程的开销。Because a trained model can be used for scoring without having to call an external language runtime in Machine Learning Services or Machine Learning Server, the overhead of multiple processes is reduced.

实时评分是一个多步骤流程:Real-time scoring is a multi-step process:

  1. 基于每个数据库启用执行评分的存储过程。You enable the stored procedure that does scoring on a per-database basis.
  2. 以二进制格式加载已预定型的模型。You load the pre-trained model in binary format.
  3. 提供要评分的新输入数据(表格或各行)作为模型的输入。You provide new input data to be scored, either tabular or single rows, as input to the model.
  4. 若要生成分数,请调用 sp_rxPredict 存储过程。To generate scores, call the sp_rxPredict stored procedure.

先决条件Prerequisites

  • 启用 SQL Server CLR 集成Enable SQL Server CLR integration.

  • 启用实时评分Enable real-time scoring.

  • 必须使用下面列出的某个受支持的 rx 算法预定型模型 。The model must be trained in advance using one of the supported rx algorithms. 有关详细信息,请参阅 sp_rxPredict支持的算法For details, see Supported algorithms for sp_rxPredict.

  • 使用适用于 R 的 rxSerialize 或适用于 Python 的 rx_serialize_model 对模型进行序列化。Serialize the model using rxSerialize for R or rx_serialize_model for Python. 已对这些序列化函数进行了优化,以支持快速评分。These serialization functions have been optimized to support fast scoring.

  • 将模型保存到要从中调用该模型的数据库引擎实例。Save the model to the database engine instance from which you want to call it. 此实例不需要具有 R 或 Python 运行时扩展。This instance is not required to have the R or Python runtime extension.

备注

实时评分目前已经过优化,可快速预测较小的数据集,从几行到数十万行的数据不等。Real-time scoring is currently optimized for fast predictions on smaller data sets, ranging from a few rows to hundreds of thousands of rows. 对于大数据集,使用 rxPredict 进行预测的速度可能会更快。On big datasets, using rxPredict might be faster.

启用实时评分Enable real-time scoring

对要用于评分的每个数据库启用此功能。Enable this feature for each database that you want to use for scoring. 服务器管理员应运行包含在 RevoScaleR 包中的命令行实用工具 RegisterRExt.exe。The server administrator should run the command-line utility, RegisterRExt.exe, which is included with the RevoScaleR package.

注意

若要进行实时评分,需要在实例中启用 SQL CLR 功能,并且需要将数据库标记为可信。In order for real-time scoring to work, SQL CLR functionality needs to be enabled in the instance and the database needs to be marked trustworthy. 运行脚本时,系统将为你执行这些操作。When you run the script, these actions are performed for you. 但是,在此之前,请仔细考虑其他的安全隐患!However, consider carefully the additional security implications before doing this.

  1. 打开提升的命令提示符,并导航到 RegisterRExt.exe 所在的文件夹。Open an elevated command prompt, and navigate to the folder where RegisterRExt.exe is located. 在默认安装中可以使用以下路径:The following path can be used in a default installation:

    <SQLInstancePath>\R_SERVICES\library\RevoScaleR\rxLibs\x64\

  2. 运行以下命令,替换要从中启用扩展存储过程的实例和目标数据库的名称:Run the following command, substituting the name of your instance and the target database where you want to enable the extended stored procedures:

    RegisterRExt.exe /installRts [/instance:name] /database:databasename

    例如,若要将扩展存储过程添加到默认实例上的 CLRPredict 数据库中,请键入:For example, to add the extended stored procedure to the CLRPredict database on the default instance, type:

    RegisterRExt.exe /installRts /database:CLRPRedict

    如果数据库在默认实例上,实例名称则是可选的。The instance name is optional if the database is on the default instance. 如果使用的是命名实例,请指定实例名称。If you're using a named instance, specify the instance name.

  3. RegisterRExt.exe 创建以下对象:RegisterRExt.exe creates the following objects:

    • 受信任的程序集Trusted assemblies
    • 存储过程 sp_rxPredictThe stored procedure sp_rxPredict
    • 新数据库角色 rxpredict_usersA new database role, rxpredict_users. 数据库管理员可以使用此角色向使用实时评分功能的用户授予权限。The database administrator can use this role to grant permission to users who use the real-time scoring functionality.
  4. 将需要运行 sp_rxPredict 的任何用户添加到新角色中。Add any users who need to run sp_rxPredict to the new role.

备注

SQL Server 2017 及更高版本中提供了其他安全措施来防止出现 CLR 集成问题。In SQL Server 2017 and later, additional security measures are in place to prevent problems with CLR integration. 这些措施还对此存储过程的使用施加了其他限制。These measures impose additional restrictions on the use of this stored procedure as well.

禁用实时评分Disable real-time scoring

若要禁用实时评分功能,请打开提升的命令提示符,并运行以下命令:RegisterRExt.exe /uninstallrts /database:<database_name> [/instance:name]To disable real-time scoring functionality, open an elevated command prompt, and run the following command: RegisterRExt.exe /uninstallrts /database:<database_name> [/instance:name]

示例Example

本示例介绍准备和保存实时预测模型所需的步骤,并提供有关如何使用 R 从 T-SQL 调用函数的示例。This example describes the steps required to prepare and save a model for real-time prediction, and provides an example in R of how to call the function from T-SQL.

步骤 1。Step 1. 准备并保存模型Prepare and save the model

sp_rxPredict 所需的二进制格式与使用 PREDICT 函数所需的格式相同。The binary format required by sp_rxPredict is the same as the format required to use the PREDICT function. 因此,将对 rxSerializeModel 的一个调用包含在你的 R 代码中,并确保指定 realtimeScoringOnly = TRUE,如本示例所示:Therefore, in your R code, include a call to rxSerializeModel, and be sure to specify realtimeScoringOnly = TRUE, as in this example:

model <- rxSerializeModel(model.name, realtimeScoringOnly = TRUE)

步骤 2.Step 2. 调用 sp_rxPredictCall sp_rxPredict

像调用任何其他存储过程一样调用 sp_rxPredictYou call sp_rxPredict as you would any other stored procedure. 在当前版本中,此存储过程仅采用两个参数: @model,用于二进制格式的模型;以及 @inputData,用于评分中要使用的数据,定义为有效的 SQL 查询。In the current release, the stored procedure takes only two parameters: @model for the model in binary format, and @inputData for the data to use in scoring, defined as a valid SQL query.

由于二进制格式与 PREDICT 函数使用的格式相同,因此可以使用先前示例中的模型和数据表。Because the binary format is the same as that used by the PREDICT function, you can use the models and data table from the preceding example.

DECLARE @irismodel varbinary(max)
SELECT @irismodel = [native_model_object] from [ml_models]
WHERE model_name = 'iris.dtree' 
AND model_version = 'v1''

EXEC sp_rxPredict
@model = @irismodel,
@inputData = N'SELECT * FROM iris_rx_data'

备注

如果用于评分的输入数据不包含符合模型要求的列,对 sp_rxPredict 的调用则会失败。The call to sp_rxPredict fails if the input data for scoring does not include columns that match the requirements of the model. 目前仅支持以下 .NET 数据类型:double、float、short、ushort、long、ulong 和 string。Currently, only the following .NET data types are supported: double, float, short, ushort, long, ulong and string.

因此,你可能需要在输入数据中筛选掉不受支持的类型,然后再将它用于实时评分。Therefore, you might need to filter out unsupported types in your input data before using it for real-time scoring.

有关相应的 SQL 类型的信息,请参阅 SQL-CLR 类型映射映射 CLR 参数数据For information about corresponding SQL types, see SQL-CLR Type Mapping or Mapping CLR Parameter Data.

后续步骤Next steps