Introducing revoscalepy in SQL Server Machine Learning
revoscalepy is a new Python library provided by Microsoft to support distributed computing, remote compute contexts, and high-performance algorithms for Python developers.
It is based on the RevoScaleR package for R, which was provided in Microsoft R Server and SQL Server R Services, and aims to provide the same functionality:
- Supports multiple compute contexts, both remote and local
- Provides functions equivalent to those in RevoScaleR for data transformation and visualization
- Provides Python versions of RevoScaleR machine learning algorithms for distributed or parallel processing
- Improved performance, including use of the Intel math libraries
MicrosoftML packages are also provided for both R and Python. For more information, see Using MicrosoftML in SQL Server
Versions and supported platforms
The revoscalepy module is available only when you install one of the following Microsoft products:
- Machine Learning Services, in SQL Server 2017
- Microsoft Machine Learning Server 9.2.0 or later
To get the latest version of revoscalepy, install Cumulative Update 1 for SQL Server 2017. It includes many improvements in Python, including:
- A new Python function,
rx_create_col_info, that gets schema information from a SQL Server data source, like rxCreateColInfo for
- Enhancements to rx_exec to support parallel scenarios using the
Supported functions and data types
This section provides an overview of the Python data types and new Python functions supported in the revoscalepy module, beginning with SQL Server 2017 CTP 2.0 release.
For the latest list of functions in the Python libraries released to date, see these links:
Data types, data sources, and compute contexts
SQL Server and Python use different data types in some cases. For a list of mappings between SQL and Python data types, see Python libraries and data Types.
Data sources supported for machine learning with Python in SQL Server includes ODBC data sources, SQL Server database, and local files, including XDF files.
You create the data source object by using functions listed in the following table. After defining the data source, you load or transform the data by using an appropriate ETL function.
Many function names have changed since the initial release of Python in CTP 2.0.
SQL Server data
- Use RxSqlServerData to define a data source from a query or table
- Use RxInSqlServer to create a SQL Server compute context
- Use RxOdbcData to create a data source from an ODBC connection
revoscalepy also supports the XDF data source, used for moving data between memory and other data sources.
If you are new to the idea of data sources or compute contexts, we recommend that you start by reading about distributed computing works for machine learning in RevoScaleR.
Machine learning and summary functions
The following machine learning algorithms and summary functions from RevoScaleR are included in SQL Server 2017, beginning with CTP 2.0.
||Fit stochastic gradient boosted decision trees||
||Fit classification and regression decision forests||
||Fit classification and regression trees||
||Create a linear model||
||Create a logistic regression model||
||Generate predictions from a trained model||
||Generate a summary of the model|
New machine learning algorithms are also provided by the Python version of MicrosoftML:
||Create a decision forest model|
||Linear regression with stochastic dual coordinate ascent|
||Create a boosted tree model|
||Create a logistic regression model|
||Create a customizable neural network model|
||Creates a SVM model on an imbalanced dataset, for use in anomaly detection|
Many of these algorithms are already provided as modules in Azure Machine Learning.
MicrosoftML for Python also includes a variety of transformations and helper functions, such as:
rx_predictgenerates predictions from a trained model and can be used for realtime scoring
- image featurization functions
- functions for text processing and sentiment extraction
For details, see Introduction to MicrosoftML
The Python community uses coding conventions that might be different than what you're used to, including all lowercase letters and underscores rather than camel casing for parameter names. Also, maybe you've noticed that the revoscalepy library is always lowercase. That's right! Another Python convention.
Check out the tips on the Python reference documentation for Microsoft R: [Python function help]Python function help
You can run code that includes revoscalepy functions either locally or in a remote compute context. You can also run Python inside SQL Server by embedding Python code in a stored procedure.
When running locally, you typically run a Python script from the command line, or from a Python development environment, and specify a SQL Server compute context using one of the revoscalepy functions. You can use the remote compute context for the entire code, or for individual functions. For example, you might want to offload model training to the server to use the latest data and avoid data movement.
If you want to put a complete Python script inside the stored procedure, sp_execute_external_script, we recommend that you rewrite the code as a single function that has clearly defined inputs and outputs. Inputs and outputs must be pandas data frames. When this is done, you can call the stored procedure from any client that supports T-SQL, easily pass SQL queries as inputs, and save the results to SQL tables. For an example, see In-Database Python Analytics for SQL Developers.
Using remote compute contexts
This example demonstrates how to run Python using an instance of SQL Server as the compute context.
Using a stored procedure
This example demonstrates the basic mechanism of calling Python script that is embedded in a stored procedure.
Using revoscalepy with MicrosoftML
The Python functions for MicrosoftML are integrated with the compute contexts and data sources that are provided in revoscalepy. Therefore, you could use an MicrosoftML algorithm to define and train a model in Python, and use the revoscalepy functions to execute the Python code either locally or in a SQl Server compute context.
Just import the modules in your Python code, and then reference the individual functions you need.
from microsoftml.modules.logistic_regression.rx_logistic_regression import rx_logistic_regression from revoscalepy.functions.RxSummary import rx_summary from revoscalepy.etl.RxImport import rx_import_datasource
To run Python code in SQL Server, you must have installed SQL Server 2017 together with the feature Machine Learning Services, and enabled the Python language. Earlier versions of SQL Server do not support Python integration.
Open source distributions of Python do not support SQL Server compute contexts. However, if you need to publish and consume Python applications from Windows, you can install Microsoft Machine Learning Server without installing SQL Server. For more information, see Install SQL Server 2017 Machine Learning Server (Standalone).
Get more help
Complete documentation for these APIs will be available when the product is released. In the meantime, we recommend that you reference the corresponding function in the RevoScaleR or MicrosoftML libraries.
You can get help on any Python function by importing the module, and then calling
help(). For example, running
help(revoscalepy) from your Python IDE returns a list of all functions in the revoscalepy module, with their signatures.
If you use Python Tools for Visual Studio, you can use IntelliSense to get syntax and argument help. For more information, see Python support in Visual Studio, and download the extension that matches your version of Visual Studio. You can use Python with Visual Studio 2015 and 2017, or earlier versions.