rxPredict.mlModel: Score using a Microsoft R machine learning model

Article
05/04/2023

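Scores each instance in a data frame or RevoScaleR data source, using a Microsoft R machine learning model trained with a RevoScaleR data source, and reports the per-instance results.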

Usage

 ## S3 method for class 'mlModel':
rxPredict(modelObject, data, outData = NULL,
    writeModelVars = FALSE, extraVarsToWrite = NULL, suffix = NULL,
    overwrite = FALSE, dataThreads = NULL,
    blocksPerRead = rxGetOption("blocksPerRead"),
    reportProgress = rxGetOption("reportProgress"), verbose = 1,
    computeContext = rxGetOption("computeContext"), ...)

Arguments

`modelObject`

A model information object returned from a MicrosoftML model. For example, an object returned from rxFastTrees or rxLogisticRegression.

`data`

A RevoScaleR data source object, a data frame, or the path to an .xdf file.

`outData`

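The name of an output text or xdf file, or an RxDataSource with write capabilities, in which to store the predictions. If NULL, a data frame is returned. The default value is NULL.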

`writeModelVars`

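If TRUE, the variables in the model are written to the output data set in addition to the scoring variables. If variables from the input data set are transformed in the model, the transformed variables are also included. The default value is FALSE.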

`extraVarsToWrite`

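NULL, or a character vector of additional variable names from the input data to include in outData. If writeModelVars is TRUE, the model variables are included as well. The default value is NULL.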

`suffix`

A character string specifying the suffix to append to the created scoring variables, or NULL if there is no suffix. The default value is NULL.

`overwrite`

If TRUE, an existing outData is overwritten; if FALSE, an existing outData is not overwritten. The default value is FALSE.

`dataThreads`

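An integer specifying the desired degree of parallelism in the data pipeline. If NULL, the number of threads used is determined internally. The default value is NULL.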

`blocksPerRead`

Specifies the number of blocks to read for each chunk of data read from the data source.

`reportProgress`

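An integer value that specifies the level of reporting on the row processing progress:

0: No progress is reported.
1: The number of processed rows is printed and updated.
2: Rows processed and timings are reported.
3: Rows processed and all timings are reported.
The default value is 1.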

`verbose`

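An integer value that specifies the amount of output wanted. If 0, no verbose output is printed during calculations. Integer values from 1 to 4 provide increasing amounts of information. The default value is 1.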

`computeContext`

Sets the context in which computations are executed, specified with a valid RxComputeContext. Currently, local and RxInSqlServer compute contexts are supported.

`...`

Additional arguments to be passed directly to the Microsoft Compute Engine.
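The `outData` argument accepts a writable RevoScaleR data source as well as a file name. The following is a minimal sketch of that usage, assuming the RevoScaleR and MicrosoftML packages are installed (for example, with Microsoft Machine Learning Server). The object names `outDS`, `scoredDS`, and `scoresBack` are illustrative and not part of the API.

```r
library(RevoScaleR)
library(MicrosoftML)

# Train a small binary classifier on the built-in 'infert' data set
infert1 <- infert
infert1$isCase <- (infert1$case == 1)
model <- rxLogisticRegression(isCase ~ age + parity + spontaneous + induced,
                              data = infert1)

# Wrap a temporary .xdf file in a writable RevoScaleR data source
xdfPath <- tempfile(pattern = "scoreOut", fileext = ".xdf")
outDS <- RxXdfData(xdfPath)

# Because outData is not NULL, rxPredict writes the scores to outDS
# and returns a data source instead of a data frame
scoredDS <- rxPredict(model, data = infert1, outData = outDS, overwrite = TRUE)

# Read the scores back into a data frame for inspection
scoresBack <- rxDataStep(inData = scoredDS)
head(scoresBack)

# Remove the temporary file
file.remove(xdfPath)
```

Leaving `outData = NULL` instead skips the file entirely and returns the scores as a data frame, which is usually simpler for small data sets.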

Details

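The following variables are reported in the output by default: for binary classifiers, the three variables PredictedLabel, Score, and Probability; for oneClassSvm and regression models, Score only; and for multi-class classifiers, PredictedLabel plus one variable for each category, prefixed with Score.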
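To see the multi-class case in practice, the sketch below trains a multi-class logistic regression on the built-in `iris` data set and lists the variables that `rxPredict` creates. It assumes the MicrosoftML package is installed; the names `irisModel` and `irisScores` are illustrative.

```r
library(MicrosoftML)

# Train a multi-class logistic regression on the built-in 'iris' data set
irisModel <- rxLogisticRegression(
    Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
    type = "multiClass", data = iris)

# Score in memory: with outData = NULL, a data frame is returned
irisScores <- rxPredict(irisModel, data = iris)

# List the scoring variables: PredictedLabel plus one score
# variable per category of Species
names(irisScores)
```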

Value

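A data frame or an RxDataSource object representing the created output data. By default, the output from scoring a binary classifier includes three variables: PredictedLabel, Score, and Probability; the output for rxOneClassSvm and regression includes one variable: Score; and the output for a multi-class classifier includes PredictedLabel plus one variable for each category, prefixed with Score. If a suffix is provided, it is added to the end of these output variable names.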
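The effect of `suffix` and `extraVarsToWrite` on the returned data frame can be checked directly by comparing variable names. This is a minimal sketch, assuming the MicrosoftML package is installed; `plainScores` and `suffixedScores` are illustrative names.

```r
library(MicrosoftML)

# Train a binary classifier on the built-in 'infert' data set
infert1 <- infert
infert1$isCase <- (infert1$case == 1)
model <- rxLogisticRegression(isCase ~ age + parity + spontaneous + induced,
                              data = infert1)

# Score in memory and carry the true label along with the scores
plainScores <- rxPredict(model, data = infert1,
                         extraVarsToWrite = "isCase")

# Score again, this time appending a suffix to the scoring variables
suffixedScores <- rxPredict(model, data = infert1,
                            extraVarsToWrite = "isCase", suffix = "Pred")

# Compare the variable names produced by the two calls
names(plainScores)
names(suffixedScores)
```

Comparing the two name vectors shows exactly how the suffix is attached, which is worth confirming before referring to the scoring variables by name in later steps.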

Author(s)

Microsoft Corporation Microsoft Technical Support

See also

rxFastTrees, rxFastForest, rxLogisticRegression, rxNeuralNet, rxOneClassSvm.

Examples



 # Estimate a logistic regression model
 infert1 <- infert
 infert1$isCase <- (infert1$case == 1)
 myModelInfo <- rxLogisticRegression(formula = isCase ~ age + parity + education + spontaneous + induced,
                        data = infert1)

 # Create an xdf file with per-instance results using rxPredict
 xdfOut <- tempfile(pattern = "scoreOut", fileext = ".xdf")
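 # extraVarsToWrite names input variables only; Probability is a scoring
 # variable and is written to the output by default
 scoreDS <- rxPredict(myModelInfo, data = infert1,
     outData = xdfOut, overwrite = TRUE,
     extraVarsToWrite = "isCase")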

 # Summarize results with an ROC curve
 rxRocCurve(actualVarName = "isCase", predVarNames = "Probability", data = scoreDS)

 # Use the built-in data set 'airquality' to create test and train data
 DF <- airquality[!is.na(airquality$Ozone), ]  
 DF$Ozone <- as.numeric(DF$Ozone)
 set.seed(12)
 randomSplit <- rnorm(nrow(DF))
 trainAir <- DF[randomSplit >= 0,]
 testAir <- DF[randomSplit < 0,]
 airFormula <- Ozone ~ Solar.R + Wind + Temp

 # Regression Fast Tree for train data
 fastTreeReg <- rxFastTrees(airFormula, type = "regression", 
     data = trainAir)  

 # Put the score and the model variables in a data frame,
 # adding the suffix "Pred" to the new scoring variable
 fastTreeScoreDF <- rxPredict(fastTreeReg, data = testAir, 
     writeModelVars = TRUE, suffix = "Pred")

 rxGetVarInfo(fastTreeScoreDF)

 # Clean-up
 file.remove(xdfOut)