Introducing revoscalepy

revoscalepy is a new library provided by Microsoft to support distributed computing, remote compute contexts, and high-performance algorithms for Python.

It is based on the RevoScaleR package for R, which was provided in Microsoft R Server and SQL Server R Services, and aims to provide the same functionality:

  • Supports multiple compute contexts, both remote and local
  • Provides functions equivalent to those in RevoScaleR for data transformation and visualization
  • Provides Python versions of RevoScaleR machine learning algorithms for distributed or parallel processing
  • Improved performance, including use of the Intel math libraries

MicrosoftML packages are also provided for both R and Python. For more information, see Using MicrosoftML in SQL Server

Versions and supported platforms

The revoscalepy module is available only when you install one of the following Microsoft products:

  • Machine Learning Services, in SQL Server 2017
  • Microsoft Machine Learning Server 9.2.0 or later

To get the latest version of revoscalepy, install Cumulative Update 1 for SQL Server 2017. It includes many improvements in Python, including:

  • A new Python function, rx_create_col_info, that gets schema information from a SQL Server data source, like rxCreateColInfo for
  • Enhancements to rx_exec to support parallel scenarios using the RxLocalParallel compute context.

Supported functions and data types

This section provides an overview of the Python data types and new Python functions supported in the revoscalepy module, beginning with SQL Server 2017 CTP 2.0 release.

For the latest list of functions in the Python libraries released to date, see these links:

Data types, data sources, and compute contexts

SQL Server and Python use different data types in some cases. For a list of mappings between SQL and Python data types, see Python libraries and data Types.

Data sources supported for machine learning with Python in SQL Server includes ODBC data sources, SQL Server database, and local files, including XDF files.

You create the data source object by using functions listed in the following table. After defining the data source, you load or transform the data by using an appropriate ETL function.

Important

Many function names have changed since the initial release of Python in CTP 2.0.

SQL Server data

revoscalepy also supports the XDF data source, used for moving data between memory and other data sources.

Tip

If you are new to the idea of data sources or compute contexts, we recommend that you start by reading about distributed computing works for machine learning in RevoScaleR.

Machine learning and summary functions

The following machine learning algorithms and summary functions from RevoScaleR are included in SQL Server 2017, beginning with CTP 2.0.

Function Description Notes
rx_btrees Fit stochastic gradient boosted decision trees rx_btrees_ex in CTP 2.0
rx_dforest Fit classification and regression decision forests rx_dforest_ex in CTP 2.0
rx_dtree Fit classification and regression trees rx_dtree_ex in CTP 2.0
rx_lin_mod Create a linear model rx_lin_mod_ex in CTP 2.0
rx_logit Create a logistic regression model rx_logit_ex in CTP 2.0
rx_predict Generate predictions from a trained model rx_predict_ex in CTP 2.0
rx_summary Generate a summary of the model

New machine learning algorithms are also provided by the Python version of MicrosoftML:

Function Description
rx_fast_forest Create a decision forest model
rx_fast_linear Linear regression with stochastic dual coordinate ascent
rx_fast_trees Create a boosted tree model
rx_logistic_regression Create a logistic regression model
rx_neural_network Create a customizable neural network model
rx_oneclass_svm Creates a SVM model on an imbalanced dataset, for use in anomaly detection

Tip

Many of these algorithms are already provided as modules in Azure Machine Learning.

MicrosoftML for Python also includes a variety of transformations and helper functions, such as:

  • rx_predict generates predictions from a trained model and can be used for realtime scoring
  • image featurization functions
  • functions for text processing and sentiment extraction

For details, see Introduction to MicrosoftML

Note

The Python community uses coding conventions that might be different than what you're used to, including all lowercase letters and underscores rather than camel casing for parameter names. Also, maybe you've noticed that the revoscalepy library is always lowercase. That's right! Another Python convention.

Check out the tips on the Python reference documentation for Microsoft R: [Python function help]Python function help

Examples

You can run code that includes revoscalepy functions either locally or in a remote compute context. You can also run Python inside SQL Server by embedding Python code in a stored procedure.

When running locally, you typically run a Python script from the command line, or from a Python development environment, and specify a SQL Server compute context using one of the revoscalepy functions. You can use the remote compute context for the entire code, or for individual functions. For example, you might want to offload model training to the server to use the latest data and avoid data movement.

If you want to put a complete Python script inside the stored procedure, sp_execute_external_script, we recommend that you rewrite the code as a single function that has clearly defined inputs and outputs. Inputs and outputs must be pandas data frames. When this is done, you can call the stored procedure from any client that supports T-SQL, easily pass SQL queries as inputs, and save the results to SQL tables. For an example, see In-Database Python Analytics for SQL Developers.

Using remote compute contexts

Using a stored procedure

  • Run Python using T-SQL

    This example demonstrates the basic mechanism of calling Python script that is embedded in a stored procedure.

Using revoscalepy with MicrosoftML

The Python functions for MicrosoftML are integrated with the compute contexts and data sources that are provided in revoscalepy. Therefore, you could use an MicrosoftML algorithm to define and train a model in Python, and use the revoscalepy functions to execute the Python code either locally or in a SQl Server compute context.

Just import the modules in your Python code, and then reference the individual functions you need.

from microsoftml.modules.logistic_regression.rx_logistic_regression import rx_logistic_regression
from revoscalepy.functions.RxSummary import rx_summary
from revoscalepy.etl.RxImport import rx_import_datasource

Requirements

To run Python code in SQL Server, you must have installed SQL Server 2017 together with the feature Machine Learning Services, and enabled the Python language. Earlier versions of SQL Server do not support Python integration.

Note

Open source distributions of Python do not support SQL Server compute contexts. However, if you need to publish and consume Python applications from Windows, you can install Microsoft Machine Learning Server without installing SQL Server. For more information, see Create a Standalone R Server

Get more help

Complete documentation for these APIs will be available when the product is released. In the meantime, we recommend that you reference the corresponding function in the RevoScaleR or MicrosoftML libraries.

You can get help on any Python function by importing the module, and then calling help(). For example, running help(revoscalepy) from your Python IDE returns a list of all functions in the revoscalepy module, with their signatures.

If you use Python Tools for Visual Studio, you can use IntelliSense to get syntax and argument help. For more information, see Python support in Visual Studio, and download the extension that matches your version of Visual Studio. You can use Python with Visual Studio 2015 and 2017, or earlier versions.

See Also

Python tutorials