RevoScaleR (SQL Server 機器學習服務中的 R 套件)RevoScaleR (R package in SQL Server Machine Learning Services)

適用範圍:Applies to: 是SQL Server 2016 (13.x)SQL Server 2016 (13.x)yesSQL Server 2016 (13.x)SQL Server 2016 (13.x) 及更新版本適用範圍:Applies to: 是SQL Server 2016 (13.x)SQL Server 2016 (13.x)yesSQL Server 2016 (13.x)SQL Server 2016 (13.x) and later

RevoScaleR 是來自 Microsoft 的 R 套件,其支援分散式計算、遠端計算內容,以及高效能資料科學演算法。RevoScaleR is an R package from Microsoft that supports distributed computing, remote compute contexts, and high-performance data science algorithms. 其也支援資料匯入、資料轉換、摘要、視覺效果和分析。It also supports data import, data transformation, summarization, visualization, and analysis. 該套件包含在 SQL Server 機器學習服務SQL Server 2016 R Services 中。The package is included in SQL Server Machine Learning Services and SQL Server 2016 R Services.

相較於基底 R 函式,RevoScaleR 作業可針對大型的資料集、平行處理,以及在分散式檔案系統上執行。In contrast with base R functions, RevoScaleR operations can be performed against large datasets, in parallel, and on distributed file systems. 函式可以使用區塊化,以及在作業完成時重組結果,來處理無法放入記憶體中的資料集。Functions can operate over datasets that do not fit in memory by using chunking and by reassembling results when operations are complete.

RevoScaleR 函式會以 rx** 或 Rx 前置詞表示,使其易於識別。RevoScaleR functions are denoted with a rx** or Rx prefix to make them easy to identify.

RevoScaleR 可作為分散式資料科學的平台。RevoScaleR serves as a platform for distributed data science. 例如,您可以在 MicrosoftML 中,使用 RevoScaleR 計算內容和轉換搭配最先進的演算法。For example, you can use the RevoScaleR compute contexts and transformations with the state-of-the-art algorithms in MicrosoftML. 您也可以使用 rxExec 平行執行基底 R 函式。You can also use rxExec to run base R functions in parallel.

完整參考文件Full reference documentation

RevoScaleR 套件分散在多個 Microsoft 產品中,但不論是在 SQL Server 或其他產品中取得該套件,其使用方式都相同。The RevoScaleR package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or another product. 由於函式相同,因此個別 RevoScaleR 函式的文件只發佈至 Microsoft Machine Learning Server 之 R 參考底下的一個位置。Because the functions are the same, documentation for individual RevoScaleR functions is published to just one location under the R reference for Microsoft Machine Learning Server. 若有任何產品特定行為存在,函式說明頁面中將會註明不一致之處。Should any product-specific behaviors exist, discrepancies will be noted in the function help page.

版本與平台Versions and platforms

RevoScaleR 套件以 R 3.4.3 為基礎,且只有當安裝下列其中一個 Microsoft 產品或下載項目時才會提供:The RevoScaleR package is based on R 3.4.3 and available only when you install one of the following Microsoft products or downloads:

注意

在 SQL Server 2017 中,完整產品發行版本僅適用於 Windows。Full product release versions are Windows-only in SQL Server 2017. SQL Server 2019 中,RevoScaleR 則同時支援 Windows 和 Linux。Both Windows and Linux are supported for RevoScaleR in SQL Server 2019.

依類別區分的函式Functions by category

本節依類別列出函式,讓您了解每個函式的使用方式。This section lists the functions by category to give you an idea of how each one is used. 您也可以使用目錄來依字母順序尋找函式。You can also use the table of contents to find functions in alphabetical order.

1-資料來源與計算1-Data source and compute

RevoScaleR 包含用於建立資料來源及設定計算執行位置 (或 計算內容) 的函式。RevoScaleR includes functions for creating data sources and setting the location, or compute context, of where computations are performed. 資料來源物件是可一起指定連接字串和您想要之資料集 (可定義為資料表、檢視或查詢) 的容器。A data source object is a container that specifies a connection string together with the set of data that you want, defined either as a table, view, or query. 不支援預存程序呼叫。Stored procedure calls are not supported. 下表列出與 SQL Server 案例相關的函式。Functions relevant to SQL Server scenarios are listed in the table below.

在某些情況下,SQL Server 和 R 會使用不同的資料類型。SQL Server and R use different data types in some cases. 如需 SQL 與 R 資料類型間的對應清單,請參閱 R 與 SQL 的對應資料類型For a list of mappings between SQL and R data types, see R-to-SQL data types.

函式Function 描述Description
RxInSqlServerRxInSqlServer 建立 SQL Server 計算內容物件以將計算推送至遠端執行個體。Create a SQL Server compute context object to push computations to a remote instance. 數個 RevoScaleR 函式會以計算內容作為引數。Several RevoScaleR functions take compute context as an argument.
rxGetComputeContext / rxSetComputeContextrxGetComputeContext / rxSetComputeContext 取得或設定使用中的計算內容。Get or set the active compute context.
RxSqlServerDataRxSqlServerData 根據 SQL Server 查詢或資料表來建立資料物件。Create a data object based on a SQL Server query or table.
RxOdbcDataRxOdbcData 根據 ODBC 連線來建立資料來源。Create a data source based on an ODBC connection.
RxXdfDataRxXdfData 根據本機 XDF 檔案來建立資料來源。Create a data source based on a local XDF file. XDF 檔案通常用來將記憶體內的資料卸載至磁碟。XDF files are often used to offload in-memory data to disk. 當使用的資料超過可從資料庫以單一批次傳輸的資料,或是超過記憶體可容納的資料時,XDF 檔案非常實用。An XDF file can be useful when working with more data than can be transferred from the database in one batch, or more data than can fit in memory. 例如,如果您會定期將大量資料從資料庫移到本機工作站,而不是針對每個 R 作業重複地查詢資料庫,則您可以使用 XDF 檔案作為一種快取以將資料儲存在本機,然後在您的 R 工作區中使用它。For example, if you regularly move large amounts of data from a database to a local workstation, rather than query the database repeatedly for each R operation, you can use the XDF file as a kind of cache to save the data locally and then work with it in your R workspace.

提示

如果您不熟悉資料來源或計算內容,建議您從 Microsoft Machine Learning Server 文件中的分散式計算 (英文) 開始著手。If you are new to the idea of data sources or compute contexts, we recommend that you start with distributed computing in the Microsoft Machine Learning Server documentation.

執行 DDL 陳述式Perform DDL statements

您可以從 R 執行 DDL 陳述式,前提是您具有執行個體及資料庫的必要權限。You can execute DDL statements from R, if you have the necessary permissions on the instance and database. 下列函式會使用 ODBC 呼叫來執行 DDL 陳述式或擷取資料庫結構描述。The following functions use ODBC calls to execute DDL statements or retrieve the database schema.

函式Function 說明Description
rxSqlServerTableExists 和 rxSqlServerDropTablerxSqlServerTableExists and rxSqlServerDropTable 置放 SQL ServerSQL Server 資料表,或檢查資料庫資料表或物件是否存在。Drop a SQL ServerSQL Server table, or check for the existence of a database table or object.
rxExecuteSQLDDLrxExecuteSQLDDL 執行定義或操作資料庫物件的資料定義語言 (DDL) 命令。Execute a Data Definition Language (DDL) command that defines or manipulates database objects. 此函式無法傳回資料,而且只會用來擷取或修改物件結構描述或中繼資料。This function cannot return data, and is used only to retrieve or modify the object schema or metadata.

2-資料操作 (ETL)2-Data manipulation (ETL)

建立資料來源物件之後,您可以使用物件將資料載入其中、轉換資料,或將新的資料寫入至指定的目的地。After you have created a data source object, you can use the object to load data into it, transform data, or write new data to the specified destination. 根據來源中的資料大小,您也可以定義資料來源中的批次大小,以及以區塊移動資料。Depending on the size of the data in the source, you can also define the batch size as part of the data source and move data in chunks.

函式Function 說明Description
rxOpen-methodsrxOpen-methods 檢查資料來源是否可用、開啟或關閉資料來源、從來源讀取資料、將資料寫入至目標,以及關閉資料來源。Check whether a data source is available, open or close a data source, read data from a source, write data to the target, and close a data source.
rxImportrxImport 將資料從資料來源移到檔案儲存體或資料框架中。Move data from a data source into file storage or into a data frame.
rxDataSteprxDataStep 在資料來源之間移動資料時進行轉換。Transform data while moving it between data sources.

3-圖形函式3-Graphing functions

函式名稱Function name 描述Description
rxHistogramrxHistogram 從資料建立長條圖。Creates a histogram from data.
rxLinePlotrxLinePlot 從資料建立線繪圖。Creates a line plot from data.
rxLorenzrxLorenz 計算可繪製的 Lorenz 曲線。Computes a Lorenz curve, which can be plotted.
rxRocCurverxRocCurve 從實際和預測資料計算並繪製 ROC 曲線。Computes and plots ROC curves from actual and predicted data.

4-描述性統計資料4-Descriptive statistics

函式名稱Function name 描述Description
rxQuantile (英文) *rxQuantile * 計算 .xdf 檔案和資料框架的近似分位數,而不需要排序。Computes approximate quantiles for .xdf files and data frames without sorting.
rxSummary (英文) *rxSummary * 資料的基本摘要統計資料,包括依群組計算。Basic summary statistics of data, including computations by group. 不支援依群組計算寫入至 .xdf 檔案。Writing by group computations to .xdf file not supported.
rxCrossTabs (英文) *rxCrossTabs * 資料以公式為基礎的交叉資料表。Formula-based cross-tabulation of data.
rxCube (英文) *rxCube * 以公式為基礎的替代交叉資料表,針對有效表示傳回的 Cube 結果所設計。Alternative formula-based cross-tabulation designed for efficient representation returning cube results. 不支援將輸出寫入至 .xdf 檔案。Writing output to .xdf file not supported.
rxMarginalsrxMarginals 交叉資料表的臨界摘要。Marginal summaries of cross-tabulations.
as.xtabsas.xtabs 將交叉資料表結果轉換為 xtabs 物件。Converts cross tabulation results to an xtabs object.
rxChiSquaredTestrxChiSquaredTest 針對 xtabs 物件執行卡方檢定。Performs Chi-squared Test on xtabs object. 搭配小型資料集使用,而且不會將資料區塊化。Used with small data sets and does not chunk data.
rxFisherTestrxFisherTest 針對 xtabs 物件執行費雪精確檢定。Performs Fisher's Exact Test on xtabs object. 搭配小型資料集使用,而且不會將資料區塊化。Used with small data sets and does not chunk data.
rxKendallCorrxKendallCor 使用 xtabs 物件計算肯德爾等級相關係數。Computes Kendall's Tau Rank Correlation Coefficient using xtabs object.
rxPairwiseCrossTabrxPairwiseCrossTab 將函式套用到成對組合的 xtabs 物件資料列與資料行。Apply a function to pairwise combinations of rows and columns of an xtabs object.
rxRiskRatiorxRiskRatio 計算兩兩 xtabs 物件的相對風險。Calculate the relative risk on a two-by-two xtabs object.
rxOddsRatiorxOddsRatio 計算兩兩 xtabs 物件的勝算比。Calculate the odds ratio on a two-by-two xtabs object.

* 表示此類別中最常用的函式。* Signifies the most popular functions in this category.

5-預測函式5-Prediction functions

函式名稱Function name 描述Description
rxLinMod (英文) *rxLinMod * 將線性模型套入資料。Fits a linear model to data.
rxLogit (英文) *rxLogit * 將羅吉斯迴歸模型套入資料。Fits a logistic regression model to data.
rxGlm (英文) *rxGlm * 將廣義線性模型套入資料。Fits a generalized linear model to data.
rxCovCor (英文) *rxCovCor * 計算一組變數的共變數、相關性或正方形 / 交叉乘積矩陣的總和。Calculate the covariance, correlation, or sum of squares / cross-product matrix for a set of variables.
rxDTree (英文) *rxDTree * 將分類或迴歸樹狀結構套入資料。Fits a classification or regression tree to data.
rxBTrees (英文) *rxBTrees * 使用隨機梯度提升演算法,將分類或迴歸決策樹系套入資料。Fits a classification or regression decision forest to data using a stochastic gradient boosting algorithm.
rxDForest (英文) *rxDForest * 將分類或迴歸決策樹系套入資料。Fits a classification or regression decision forest to data.
rxPredict (英文) *rxPredict * 計算適合模型的預測。Calculates predictions for fitted models. 輸出必須是 XDF 資料來源。Output must be an XDF data source.
rxKmeans (英文) *rxKmeans * 執行 K 平均數叢集。Performs k-means clustering.
rxNaiveBayes (英文) *rxNaiveBayes * 執行貝氏機率分類。Performs Naive Bayes classification.
rxCovrxCov 計算一組變數的共變數矩陣。Calculate the covariance matrix for a set of variables.
rxCorrxCor 計算一組變數的相關矩陣。Calculate the correlation matrix for a set of variables.
rxSSCPrxSSCP 計算一組變數的正方形/交叉乘積矩陣總和。Calculate the sum of squares / cross-product matrix for a set of variables.
rxRocrxRoc 使用二元分類器系統的實際與預測值的接收器操作特徵曲線 (ROC) 計算。Receiver Operating Characteristic (ROC) computations using actual and predicted values from binary classifier system.

* 表示此類別中最常用的函式。* Signifies the most popular functions in this category.

如何使用 RevoScaleRHow to work with RevoScaleR

封裝在預存程序中的 R 程式碼可呼叫 RevoScaleR 中的函式。Functions in RevoScaleR are callable in R code encapsulated in stored procedures. 大多數開發人員會在本機建置 RevoScaleR 解決方案,然後將完成的 R 程式碼移轉至預存程序作為部署練習。Most developers build RevoScaleR solutions locally, and then migrate finished R code to stored procedures as a deployment exercise.

在本機執行時,您通常會從命令列或從 R 開發環境執行 R 指令碼,然後使用其中一個 RevoScaleR 函式來指定 SQL Server 計算內容。When running locally, you typically run an R script from the command line, or from an R development environment, and specify a SQL Server compute context using one of the RevoScaleR functions. 您可以將遠端計算內容用於整個程式碼,也可以用於個別函式。You can use the remote compute context for the entire code, or for individual functions. 例如,您可以將模型定型卸載至伺服器,以使用最新資料並避免資料移動。For example, you might want to offload model training to the server to use the latest data and avoid data movement.

當您準備好將 R 指令碼封裝在預存程序 sp_execute_external_script 內時,建議您將程式碼重寫成已清楚定義輸入和輸出的單一函式。When you are ready to encapsulate R script inside a stored procedure, sp_execute_external_script, we recommend rewriting the code as a single function that has clearly defined inputs and outputs.

另請參閱See also