What is SQL Server Machine Learning Services (Python and R)?

APPLIES TO: SQL Server (yes); Azure SQL Database (no); Azure Synapse Analytics (SQL DW) (no); Parallel Data Warehouse (no)

Machine Learning Services is a SQL Server feature that lets you run Python and R scripts against relational data. You can use open-source packages and frameworks, along with the Microsoft Python and R packages, for predictive analytics and machine learning. Scripts execute in-database, so data is not moved outside SQL Server or over the network. This article explains the basics of SQL Server Machine Learning Services.

In Azure SQL Database, Machine Learning Services is currently in public preview.

Note

For executing Java in SQL Server, see the Language Extensions documentation.

What is Machine Learning Services?

SQL Server Machine Learning Services lets you execute Python and R scripts in-database. You can use it to prepare and clean data, do feature engineering, and train, evaluate, and deploy machine learning models within a database. The feature runs your scripts where the data resides, which eliminates the need to transfer data across the network to another server.

Base distributions of Python and R are included in Machine Learning Services. You can install and use open-source packages and frameworks, such as PyTorch, TensorFlow, and scikit-learn, in addition to the Microsoft packages revoscalepy and microsoftml for Python, and RevoScaleR, MicrosoftML, olapR, and sqlrutils for R.

Machine Learning Services uses an extensibility framework to run Python and R scripts in SQL Server.

What can I do with Machine Learning Services?

You can use Machine Learning Services to build and train machine learning and deep learning models within SQL Server. You can also deploy existing models to Machine Learning Services and use relational data for predictions.

Examples of the types of predictions you can make with SQL Server Machine Learning Services include:

Classification/Categorization: Automatically divide customer feedback into positive and negative categories
Regression/Predict continuous values: Predict the price of houses based on size and location
Anomaly Detection: Detect fraudulent banking transactions
Recommendations: Suggest products that online shoppers may want to buy, based on their previous purchases
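
The idea of deploying an existing model and scoring relational data with it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sqlite3 stands in for SQL Server so the example runs anywhere, the table and column names are hypothetical, and the "model" is a hand-written pair of coefficients rather than anything trained on real data. In SQL Server, a serialized model is often kept in a binary column in a similar way.

```python
import pickle
import sqlite3

# Hypothetical "trained model": price = intercept + slope * size.
# The coefficients are made up for illustration, not fitted to real data.
model = {"intercept": 50.0, "slope": 2.0}


def predict(model, size):
    """Score one row with the toy linear model."""
    return model["intercept"] + model["slope"] * size


# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, payload BLOB)")
conn.execute("CREATE TABLE houses (id INTEGER PRIMARY KEY, size REAL)")
conn.executemany(
    "INSERT INTO houses (id, size) VALUES (?, ?)",
    [(1, 100.0), (2, 150.0)],
)

# "Deploy": serialize the model and store the bytes next to the data.
conn.execute(
    "INSERT INTO models (name, payload) VALUES (?, ?)",
    ("house_price", pickle.dumps(model)),
)

# "Predict": load the stored model and score the relational rows.
(payload,) = conn.execute(
    "SELECT payload FROM models WHERE name = ?", ("house_price",)
).fetchone()
loaded = pickle.loads(payload)
predictions = {
    row_id: predict(loaded, size)
    for row_id, size in conn.execute("SELECT id, size FROM houses ORDER BY id")
}
print(predictions)
```

Keeping the serialized model in the same database as the rows it scores mirrors the point made above: the prediction step runs next to the data instead of exporting it first.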

How to execute Python and R scripts

There are two ways to execute Python and R scripts in Machine Learning Services:

  • The most common way is to use the T-SQL stored procedure sp_execute_external_script.

  • You can also use your preferred Python or R client and write scripts that push execution to a remote SQL Server instance (this is referred to as a remote compute context). See how to set up a data science client for Python development and R development for more information.
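
As a rough sketch of the first option, the Python helper below assembles the T-SQL text for a call to sp_execute_external_script. The helper names (quote_literal, build_external_script_call) and the sample query are hypothetical and exist only for this illustration; the parameters @language, @script, and @input_data_1 belong to the stored procedure itself. The sketch only builds the statement; it does not connect to a server.

```python
def quote_literal(text):
    """Return text as a T-SQL Unicode string literal (single quotes doubled)."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(script, input_query, language="Python"):
    """Build a T-SQL batch that runs `script` over the rows of `input_query`."""
    return "\n".join([
        "EXECUTE sp_execute_external_script",
        "    @language = " + quote_literal(language) + ",",
        "    @script = " + quote_literal(script) + ",",
        "    @input_data_1 = " + quote_literal(input_query) + ";",
    ])


# By default the script sees the query result as InputDataSet and returns
# whatever it assigns to OutputDataSet, so this script echoes its input.
statement = build_external_script_call(
    script="OutputDataSet = InputDataSet",
    input_query="SELECT 1 AS hello",
)
print(statement)
```

The printed statement can be pasted into any T-SQL query window, or sent from a client through a database driver of your choice. For R, the same shape applies with the language argument set to "R" and an R script body.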

Python and R packages

You can use open-source packages and frameworks in addition to Microsoft's enterprise packages. The most common open-source Python and R packages are pre-installed in Machine Learning Services. The following Python and R packages from Microsoft are also included:

Language | Package | Description
Python | revoscalepy | The primary package for scalable Python. Data transformations and manipulation, statistical summarization, visualization, and many forms of modeling. Additionally, functions in this package automatically distribute workloads across available cores for parallel processing.
Python | microsoftml | Adds machine learning algorithms to create custom models for text analysis, image analysis, and sentiment analysis.
R | RevoScaleR | The primary package for scalable R. Data transformations and manipulation, statistical summarization, visualization, and many forms of modeling. Additionally, functions in this package automatically distribute workloads across available cores for parallel processing.
R | MicrosoftML (R) | Adds machine learning algorithms to create custom models for text analysis, image analysis, and sentiment analysis.
R | olapR | R functions used for MDX queries against a SQL Server Analysis Services OLAP cube.
R | sqlrutils | A mechanism to use R scripts in a T-SQL stored procedure, register that stored procedure with a database, and run the stored procedure from an R development environment.
R | Microsoft R Open | Microsoft R Open (MRO) is the enhanced distribution of R from Microsoft. It is a complete open-source platform for statistical analysis and data science. It is based on and 100% compatible with R, and includes additional capabilities for improved performance and reproducibility.

For more information on which packages are installed with Machine Learning Services and how to install other packages, see:

How do I get started with Machine Learning Services?

  1. Install SQL Server Machine Learning Services

  2. Configure your development tools

  3. Write your first Python or R script

Next steps