Native scoring using the PREDICT T-SQL function with SQL machine learning

Applies to: yesSQL Server 2017 (14.x) and later YesAzure SQL Database YesAzure SQL Managed Instance yesAzure Synapse Analytics

Learn how to use native scoring with the PREDICT T-SQL function to generate prediction values for new data inputs in near-real-time. Native scoring requires that you have an already-trained model.

The PREDICT function uses the native C++ extension capabilities in SQL machine learning. This methodology offers the fastest possible processing speed of forecasting and prediction workloads and support models in Open Neural Network Exchange (ONNX) format or models trained using the RevoScaleR and revoscalepy packages.

How native scoring works

Native scoring uses libraries that can read models in ONNX or a predefined binary format, and generate scores for new data inputs that you provide. Because the model is trained, deployed, and stored, it can be used for scoring without having to call the R or Python interpreter. This means that the overhead of multiple process interactions is reduced, resulting in faster prediction performance.

To use native scoring, call the PREDICT T-SQL function and pass the following required inputs:

  • A compatible model based on a supported model and algorithm.
  • Input data, typically defined as a T-SQL query.

The function returns predictions for the input data, together with any columns of source data that you want to pass through.

Prerequisites

PREDICT is available on:

  • All editions of SQL Server 2017 and later on Windows and Linux
  • Azure SQL Managed Instance
  • Azure SQL Database
  • Azure SQL Edge
  • Azure Synapse Analytics

The function is enabled by default. You do not need to install R or Python, or enable additional features.

Supported models

The model formats supported by the PREDICT function depends on the SQL platform on which you perform native scoring. See the table below to see which model formats are supported on which platform.

Platform ONNX model format RevoScale model format
SQL Server No Yes
Azure SQL Managed Instance Yes Yes
Azure SQL Database No Yes
Azure SQL Edge Yes No
Azure Synapse Analytics Yes No

ONNX models

The model must be in an Open Neural Network Exchange (ONNX) model format.

RevoScale models

The model must be trained in advance using one of the supported rx algorithms listed below using the RevoScaleR or revoscalepy package.

Serialize the model using rxSerialize for R, and rx_serialize_model for Python. These serialization functions have been optimized to support fast scoring.

Supported RevoScale algorithms

The following algorithms are supported in revoscalepy and RevoScaleR.

If you need to use an algorithms from MicrosoftML or microsoftml, use real-time scoring with sp_rxPredict.

Unsupported model types include the following types:

  • Models containing other transformations
  • Models using the rxGlm or rxNaiveBayes algorithms in RevoScaleR or revoscalepy equivalents
  • PMML models
  • Models created using other open-source or third-party libraries

Examples

PREDICT with an ONNX model

This example shows how to use an ONNX model stored in the dbo.models table for native scoring.

DECLARE @model VARBINARY(max) = (
        SELECT DATA
        FROM dbo.models
        WHERE id = 1
        );

WITH predict_input
AS (
    SELECT TOP (1000) [id]
        , CRIM
        , ZN
        , INDUS
        , CHAS
        , NOX
        , RM
        , AGE
        , DIS
        , RAD
        , TAX
        , PTRATIO
        , B
        , LSTAT
    FROM [dbo].[features]
    )
SELECT predict_input.id
    , p.variable1 AS MEDV
FROM PREDICT(MODEL = @model, DATA = predict_input, RUNTIME=ONNX) WITH (variable1 FLOAT) AS p;

Note

Because the columns and values returned by PREDICT can vary by model type, you must define the schema of the returned data by using a WITH clause.

PREDICT with RevoScale model

In this example, you create a model using RevoScaleR in R, and then call the real-time prediction function from T-SQL.

Step 1. Prepare and save the model

Run the following code to create the sample database and required tables.

CREATE DATABASE NativeScoringTest;
GO
USE NativeScoringTest;
GO
DROP TABLE IF EXISTS iris_rx_data;
GO
CREATE TABLE iris_rx_data (
    "Sepal.Length" float not null, "Sepal.Width" float not null
  , "Petal.Length" float not null, "Petal.Width" float not null
  , "Species" varchar(100) null
);
GO

Use the following statement to populate the data table with data from the iris dataset.

INSERT INTO iris_rx_data ("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width" , "Species")
EXECUTE sp_execute_external_script
  @language = N'R'
  , @script = N'iris_data <- iris;'
  , @input_data_1 = N''
  , @output_data_1_name = N'iris_data';
GO

Now, create a table for storing models.

DROP TABLE IF EXISTS ml_models;
GO
CREATE TABLE ml_models ( model_name nvarchar(100) not null primary key
  , model_version nvarchar(100) not null
  , native_model_object varbinary(max) not null);
GO

The following code creates a model based on the iris dataset and saves it to the table named models.

DECLARE @model varbinary(max);
EXECUTE sp_execute_external_script
  @language = N'R'
  , @script = N'
    iris.sub <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
    iris.dtree <- rxDTree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris[iris.sub, ])
    model <- rxSerializeModel(iris.dtree, realtimeScoringOnly = TRUE)
    '
  , @params = N'@model varbinary(max) OUTPUT'
  , @model = @model OUTPUT
  INSERT [dbo].[ml_models]([model_name], [model_version], [native_model_object])
  VALUES('iris.dtree','v1', @model) ;

Note

Be sure to use the rxSerializeModel function from RevoScaleR to save the model. The standard R serialize function cannot generate the required format.

You can run a statement such as the following to view the stored model in binary format:

SELECT *, datalength(native_model_object)/1024. as model_size_kb
FROM ml_models;

Step 2. Run PREDICT on the model

The following simple PREDICT statement gets a classification from the decision tree model using the native scoring function. It predicts the iris species based on attributes you provide, petal length and width.

DECLARE @model varbinary(max) = (
  SELECT native_model_object
  FROM ml_models
  WHERE model_name = 'iris.dtree'
  AND model_version = 'v1');
SELECT d.*, p.*
  FROM PREDICT(MODEL = @model, DATA = dbo.iris_rx_data as d)
  WITH(setosa_Pred float, versicolor_Pred float, virginica_Pred float) as p;
go

If you get the error, "Error occurred during execution of the function PREDICT. Model is corrupt or invalid", it usually means that your query didn't return a model. Check whether you typed the model name correctly, or if the models table is empty.

Note

Because the columns and values returned by PREDICT can vary by model type, you must define the schema of the returned data by using a WITH clause.

Next steps