revoscalepy (Python package in SQL Server Machine Learning Services)

Applies to: SQL Server 2017 (14.x) and later

revoscalepy is a Python package from Microsoft that supports distributed computing, remote compute contexts, and high-performance data science algorithms. The package is included in SQL Server Machine Learning Services.

The package offers the following functionality:

  • Local and remote compute contexts on systems that have the same version of revoscalepy
  • Data transformation and visualization functions
  • Data science functions, scalable through distributed or parallel processing
  • Improved performance, including use of the Intel math libraries

Data sources and compute contexts that you create in revoscalepy can also be used in machine learning algorithms. For an introduction to these algorithms, see microsoftml Python module in SQL Server.

Full reference documentation

The revoscalepy package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or another product. Because the functions are the same, documentation for individual revoscalepy functions is published to just one location, under the Python reference for Microsoft Machine Learning Server. If any product-specific behaviors exist, the discrepancies are noted in the function help page.

Versions and platforms

The revoscalepy module is based on Python 3.5 and is available only when you install one of the following Microsoft products or downloads:

Note

Full product release versions are Windows-only in SQL Server 2017. Both Windows and Linux are supported for revoscalepy in SQL Server 2019 and later.

Functions by category

This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.

1-Data source and compute

revoscalepy includes functions for creating data sources and for setting the compute context, which is the location where computations are performed. Functions relevant to SQL Server scenarios are listed in the table below.

SQL Server and Python use different data types in some cases. For a list of mappings between SQL and Python data types, see Python-to-SQL data types.

Function | Description
RxInSqlServer | Create a SQL Server compute context object to push computations to a remote instance. Several revoscalepy functions take a compute context as an argument. For a context-switch example, see Create a model using revoscalepy.
RxSqlServerData | Create a data object based on a SQL Server query or table.
RxOdbcData | Create a data source based on an ODBC connection.
RxXdfData | Create a data source based on a local XDF file. XDF files are often used to offload in-memory data to disk. An XDF file can be useful when working with more data than can be transferred from the database in one batch, or more data than can fit in memory. For example, if you regularly move large amounts of data from a database to a local workstation, then rather than querying the database repeatedly for each Python operation, you can use the XDF file as a kind of cache to save the data locally and then work with it in your Python session.
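
As a rough illustration of how the data source and compute context functions in the table fit together, the sketch below builds a connection string and passes it to RxSqlServerData and RxInSqlServer. The helper names, server, database, and query are hypothetical placeholders, and the keyword arguments should be checked against the function help pages. The revoscalepy import is deferred into the function so that the block still loads on a machine where the package is not installed.

```python
def build_connection_string(server, database, driver="SQL Server"):
    """Assemble a trusted-connection string (placeholder values only)."""
    return "Driver={};Server={};Database={};Trusted_Connection=True;".format(
        driver, server, database
    )


def make_sql_objects(connection_string, query):
    """Return (data_source, compute_context) for a SQL Server instance.

    Assumes revoscalepy is installed; calling this without the package
    raises ImportError.
    """
    from revoscalepy import RxInSqlServer, RxSqlServerData

    # Data source: describes what data to read (a query result here).
    data_source = RxSqlServerData(
        sql_query=query, connection_string=connection_string
    )
    # Compute context: describes where computations should run.
    compute_context = RxInSqlServer(connection_string=connection_string)
    return data_source, compute_context


# Hypothetical server and database names, for illustration only.
conn_str = build_connection_string("myserver", "mydb")
```

With revoscalepy available, `make_sql_objects(conn_str, "SELECT * FROM dbo.MyTable")` would return the two objects, which can then be passed to functions that accept a data source or a compute context.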

Tip

If you are new to the idea of data sources or compute contexts, we recommend that you start with the distributed computing article in the Microsoft Machine Learning Server documentation.

2-Data manipulation (ETL)

Function | Description
rx_import | Import data into a .xdf file or data frame.
rx_data_step | Transform data from an input data set to an output data set.

3-Training and summarization

Function | Description
rx_btrees | Fit stochastic gradient boosted decision trees.
rx_dforest | Fit classification and regression decision forests.
rx_dtree | Fit classification and regression trees.
rx_lin_mod | Create a linear regression model.
rx_logit | Create a logistic regression model.
rx_summary | Produce univariate summaries of objects in revoscalepy.

You should also review the functions in microsoftml for additional approaches.

4-Scoring functions

Function | Description
rx_predict | Generate predictions from a trained model. Can be used for real-time scoring.
rx_predict_default | Compute predicted values and residuals using rx_lin_mod and rx_logit objects.
rx_predict_rx_dforest | Calculate predicted or fitted values for a data set from an rx_dforest or rx_btrees object.
rx_predict_rx_dtree | Calculate predicted or fitted values for a data set from an rx_dtree object.
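
The training and scoring functions listed above are typically used as a pair: fit a model from a formula and a data source, then pass the model to rx_predict. The following is a minimal sketch under stated assumptions, not a definitive workflow. The helper names and column names are hypothetical, and the revoscalepy import is deferred so the block loads without the package installed.

```python
def build_formula(label, features):
    """Build a formula string such as 'y ~ a + b' from column names."""
    if not features:
        raise ValueError("at least one feature column is required")
    return "{} ~ {}".format(label, " + ".join(features))


def train_and_score(train_data, score_data, label, features):
    """Fit a linear model on train_data and score score_data.

    Both data arguments can be data frames or revoscalepy data sources.
    Assumes revoscalepy is installed.
    """
    from revoscalepy import rx_lin_mod, rx_predict

    model = rx_lin_mod(formula=build_formula(label, features), data=train_data)
    predictions = rx_predict(model, data=score_data)
    return model, predictions


# Hypothetical column names, for illustration only.
formula = build_formula("ArrDelay", ["DayOfWeek", "CRSDepTime"])
```

Keeping the formula construction in its own function makes it easy to reuse the same column list with a different trainer, such as rx_logit or rx_dtree.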

How to work with revoscalepy

Functions in revoscalepy are callable in Python code encapsulated in stored procedures. Most developers build revoscalepy solutions locally, and then migrate finished Python code to stored procedures as a deployment exercise.

When running locally, you typically run a Python script from the command line or from a Python development environment, and specify a SQL Server compute context using one of the revoscalepy functions. You can use the remote compute context for the entire script or for individual functions. For example, you might want to offload model training to the server to use the latest data and avoid data movement.

When you are ready to encapsulate a Python script inside a stored procedure by using sp_execute_external_script, we recommend rewriting the code as a single function that has clearly defined inputs and outputs.

Inputs and outputs must be pandas data frames. When this is done, you can call the stored procedure from any client that supports T-SQL, easily pass SQL queries as inputs, and save the results to SQL tables. For an example, see Learn in-database Python analytics for SQL developers.
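
To make the single-function pattern concrete, the sketch below assembles the T-SQL statement that wraps a small Python script in sp_execute_external_script. The script body, the helper name build_exec_statement, and the table dbo.MyTable are hypothetical. InputDataSet and OutputDataSet are the default data frame names that the stored procedure exposes to the script. The embedded script requires pandas only when it runs on the server, so the outer code is plain Python.

```python
# Script that would run inside SQL Server: one function, data frame in,
# data frame out.
PYTHON_BODY = """
def transform(df):
    # Hypothetical transformation: summary statistics as a data frame.
    return df.describe().reset_index()

OutputDataSet = transform(InputDataSet)
"""


def build_exec_statement(script, input_query):
    """Wrap a Python script and a SQL query in an EXECUTE statement."""

    def quote(text):
        # T-SQL escapes a single quote inside a string literal by doubling it.
        return "N'" + text.replace("'", "''") + "'"

    return (
        "EXECUTE sp_execute_external_script\n"
        "    @language = N'Python',\n"
        "    @script = {},\n"
        "    @input_data_1 = {};"
    ).format(quote(script), quote(input_query))


statement = build_exec_statement(PYTHON_BODY, "SELECT * FROM dbo.MyTable")
print(statement)
```

The printed statement can be pasted into any T-SQL client, or placed inside a CREATE PROCEDURE body so that callers pass only the input query.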

Using revoscalepy with microsoftml

The Python functions for microsoftml are integrated with the compute contexts and data sources that are provided in revoscalepy. When calling functions from microsoftml, for example when defining and training a model, use the revoscalepy functions to execute the Python code either locally or in a SQL Server remote compute context.

The following example shows the syntax for importing modules in your Python code. You can then reference the individual functions you need.

from microsoftml.modules.logistic_regression.rx_logistic_regression import rx_logistic_regression
from revoscalepy.functions.RxSummary import rx_summary
from revoscalepy.etl.RxImport import rx_import_datasource

See also