What's new in Machine Learning Services in SQL Server
In SQL Server 2016, Microsoft introduced SQL Server R Services, a feature that supports enterprise-scale data science by integrating the R language with the SQL Server database engine.
In SQL Server 2017, database-integrated machine learning became even more powerful with the addition of support for the popular Python language. Along with support for a new language comes a new name: Machine Learning Services (In-Database).
For the latest announcement, see the blog post Python in SQL Server 2017: enhanced in-database machine learning.
What's new in SQL Server 2017
Machine Learning Services in SQL Server 2017 provides comprehensive support for building and deploying machine learning solutions in either R or Python. Here are the highlights of this release:
What's new in Cumulative Update 3 for SQL Server 2017
This release contains updates to Python and R components.
- Added support for Python model serialization in revoscalepy, using the rx_serialize_model function
In-database Python integration
You can run Python in stored procedures, or execute Python remotely using the SQL Server computer as the compute context. This integration opens up new avenues for the vast community of Python developers and data scientists to use the power of SQL Server.
SQL Server developers gain access to the extensive Python libraries from the open source ecosystem, including popular frameworks such as scikit-learn, TensorFlow, Caffe, and Theano/Keras. And be sure to explore innovations from Microsoft such as revoscalepy and microsoftml!
Running Python in-database isn't just about machine learning, by the way. There are myriad other applications for integrating Python with SQL, using the strengths of each language to deliver more intelligent, powerful solutions.
This release includes the final version of revoscalepy, which supplies Python equivalents of the algorithms in RevoScaleR. You can create Python models for linear and logistic regressions, decision trees, boosted trees, and random forests, all parallelizable, and capable of being run in remote compute contexts.
For more information, see What is revoscalepy.
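Models like these can be parallelized because many fitting algorithms need only summary statistics that can be computed separately on each chunk of data and then added together. The sketch below illustrates that general technique for simple linear regression in plain Python. It is not revoscalepy's implementation, and the names `Stats`, `chunk_stats`, and `fit` are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass
class Stats:
    """Hypothetical container for the sums needed to fit y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Stats") -> "Stats":
        # Partial results from different workers combine by plain addition,
        # so chunks can be processed in any order or in parallel.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def chunk_stats(rows: Iterable[Tuple[float, float]]) -> Stats:
    """Single pass over one chunk of (x, y) rows."""
    s = Stats()
    for x, y in rows:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit(s: Stats) -> Tuple[float, float]:
    """Closed-form least-squares intercept and slope from the combined sums."""
    denom = s.n * s.sxx - s.sx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (s.n * s.sxy - s.sx * s.sy) / denom
    intercept = (s.sy - slope * s.sx) / s.n
    return intercept, slope


# Usage: data generated from y = 1 + 2x, split into three uneven chunks.
data = [(float(x), 1.0 + 2.0 * x) for x in range(10)]
chunks = [data[:4], data[4:7], data[7:]]
total = reduce(Stats.merge, (chunk_stats(c) for c in chunks), Stats())
intercept, slope = fit(total)
print(intercept, slope)  # 1.0 2.0
```

Because only five numbers travel between workers, the same pattern scales to data that never fits in one process's memory.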
Remote compute contexts for Python
This release supports the use of multiple data sources and remote compute contexts. Data scientists and developers can execute Python code on a remote SQL Server to explore data or build models without moving data. Using remote compute contexts requires revoscalepy.
Python support in Microsoft Machine Learning Server (Standalone)
SQL Server 2017 includes the option to install a standalone version of the Microsoft Machine Learning Server. By using Machine Learning Server, you can distribute and scale R or Python code without using SQL Server.
Machine learning using R or Python in-database is not currently supported in SQL Server on Linux. Look for announcements in a later release.
However, on Linux you can perform native scoring using the T-SQL PREDICT function. Native scoring generates predictions from a pretrained model very quickly, without calling or even requiring an R runtime. This means you can use SQL Server on Linux to serve fast predictions to client applications.
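To show the shape of a native scoring query, the following Python helper builds a T-SQL batch that loads a stored model and passes it to PREDICT. It is a sketch under stated assumptions: the table `dbo.models` with columns `model_name` and `model`, the data table `dbo.customers`, and the output column `Score` are hypothetical, and the stored model is assumed to be in a binary format PREDICT accepts (for example, one serialized with rx_serialize_model).

```python
from typing import Dict


def quote_nstring(value: str) -> str:
    """T-SQL Unicode literal: wrap in N'...' and double embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def build_predict_query(model_name: str, data_table: str,
                        outputs: Dict[str, str]) -> str:
    """Build a T-SQL batch that loads a stored model and scores rows with PREDICT.

    Assumes a hypothetical table dbo.models(model_name, model varbinary(max)).
    data_table and the output column names are inserted verbatim, so they must
    come from trusted code rather than user input.
    """
    result_cols = ", ".join(f"{col} {sql_type}" for col, sql_type in outputs.items())
    select_cols = ", ".join(f"p.{col}" for col in outputs)
    return (
        "DECLARE @model varbinary(max) = (SELECT model FROM dbo.models "
        f"WHERE model_name = {quote_nstring(model_name)});\n"
        f"SELECT d.*, {select_cols}\n"
        f"FROM PREDICT(MODEL = @model, DATA = {data_table} AS d)\n"
        f"WITH ({result_cols}) AS p;"
    )


print(build_predict_query("churn_v1", "dbo.customers", {"Score": "float"}))
```

The WITH clause declares the columns PREDICT returns; no R or Python runtime is involved when the batch runs.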
The MicrosoftML package contains state-of-the-art machine learning algorithms and data transformations that can be scaled or run in remote compute contexts. Algorithms include customizable deep neural networks, fast decision trees and decision forests, linear regression, and logistic regression. The package comes with both R and Python interfaces (MicrosoftML for R, microsoftml for Python).
This release contains multiple options and features to help you deploy and distribute machine learning tasks:
Deploy and integrate Python machine learning solutions using T-SQL
The integration of Python with T-SQL means that you can call any Python code using sp_execute_external_script. This secure infrastructure enables enterprise-grade deployment of Python models and scripts that an application can call through a simple stored procedure. Performance is further improved by streaming data from SQL Server to Python processes and by MPI ring parallelization.
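As an illustration of what such a call looks like, the Python helper below assembles an EXEC sp_execute_external_script batch. The @language, @script, and @input_data_1 parameters and the WITH RESULT SETS clause are part of the procedure's interface; the helper name, the table `dbo.orders`, and its columns are hypothetical. Inside the script, SQL Server exposes the input rows as the pandas DataFrame `InputDataSet` and returns the DataFrame assigned to `OutputDataSet`.

```python
from typing import Optional


def quote_nstring(value: str) -> str:
    """T-SQL Unicode literal: wrap in N'...' and double embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def build_external_script_call(script: str, input_query: str,
                               result_sets: Optional[str] = None) -> str:
    """Build an EXEC sp_execute_external_script batch that runs Python code."""
    lines = [
        "EXEC sp_execute_external_script",
        "    @language = N'Python',",
        f"    @script = {quote_nstring(script)},",
        f"    @input_data_1 = {quote_nstring(input_query)}",
    ]
    if result_sets:
        # Optional: declare column names and types for the returned rows.
        lines.append(f"WITH RESULT SETS (({result_sets}))")
    return "\n".join(lines) + ";"


# Usage with a hypothetical dbo.orders table. The Python code is embedded verbatim,
# so its own single quotes are doubled by quote_nstring.
python_script = (
    "df = InputDataSet\n"
    "df['total'] = df['qty'] * df['price']\n"
    "OutputDataSet = df"
)
tsql = build_external_script_call(
    python_script,
    "SELECT qty, price FROM dbo.orders",
    "qty int, price float, total float",
)
print(tsql)
```

Wrapping the generated batch in CREATE PROCEDURE is what lets an application call the Python logic as an ordinary stored procedure.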
mrsdeploy for Python
The mrsdeploy package for Machine Learning Server supports deployment of Python models and scripts as web services. For an example of how it works, see Publish and consume Python code.
Microsoft has pushed the boundaries of performance for scoring. With in-database scoring, we processed a million rows per second using R models. In this release, new features for realtime scoring and native scoring support better performance in single-row and batch scoring.
Realtime scoring and native scoring
Realtime scoring relies on native C++ libraries to read a model stored in an optimized binary format, and then generate predictions without having to call the R runtime. This makes scoring operations much faster.
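The core idea, that scoring needs only the stored model parameters plus a small evaluator and not the training runtime, can be sketched in a few lines of Python. This is an analogy only: the byte layout below is hypothetical and is NOT the optimized binary format SQL Server uses, and the function names are illustrative.

```python
import struct
from typing import List, Sequence

# Hypothetical layout, NOT the format SQL Server uses:
# little-endian uint32 feature count n, then n + 1 float64 values
# (the intercept followed by n coefficients).
HEADER = struct.Struct("<I")


def serialize_linear_model(intercept: float, coefs: Sequence[float]) -> bytes:
    """Pack a trained linear model into a compact byte string (like a varbinary)."""
    return HEADER.pack(len(coefs)) + struct.pack(f"<{len(coefs) + 1}d", intercept, *coefs)


def load_linear_model(blob: bytes) -> List[float]:
    """Parse the bytes back into [intercept, coef_1, ..., coef_n]."""
    (n,) = HEADER.unpack_from(blob, 0)
    return list(struct.unpack_from(f"<{n + 1}d", blob, HEADER.size))


def score(params: Sequence[float], rows: Sequence[Sequence[float]]) -> List[float]:
    """Generate predictions using only the stored parameters; no training library needed."""
    intercept, coefs = params[0], params[1:]
    predictions = []
    for row in rows:
        if len(row) != len(coefs):
            raise ValueError(f"expected {len(coefs)} features, got {len(row)}")
        predictions.append(intercept + sum(w * x for w, x in zip(coefs, row)))
    return predictions


blob = serialize_linear_model(0.5, [2.0, -1.0])  # 4 + 3 * 8 = 28 bytes
params = load_linear_model(blob)                  # parse once, reuse for many rows
print(score(params, [[1.0, 1.0], [3.0, 2.0]]))    # [1.5, 4.5]
```

Parsing the model once and then running a tight arithmetic loop is what makes this style of scoring cheap compared with starting a full language runtime per request.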
Additionally, SQL Server 2017 includes a native T-SQL function for fast scoring, PREDICT, that can be run on any edition of SQL Server, even on Linux. The function requires neither an R installation nor any extra configuration. This means you can train a model elsewhere, save it in SQL Server, and then perform scoring without ever calling R. This feature is referred to as native scoring.
- Native scoring is available only in SQL Server 2017. It uses a T-SQL function that can run in any edition of SQL Server, including SQL Server on Linux.
- Realtime scoring is supported in SQL Server 2017, and in Microsoft Machine Learning Server. You can run a stored procedure or perform realtime scoring from R code.
- Realtime scoring is also available for SQL Server 2016, if the instance is upgraded to the latest release of Microsoft R Server.
Upgrade your machine learning experience and get pre-trained models
If you installed an earlier version of SQL Server 2016 R Services, you can now upgrade to the latest version by switching your server to use the Modern Software Lifecycle policy. By doing so, you can take advantage of a faster release cycle for R and automatically upgrade all R components. For more information, see What's new in Machine Learning Server.
The installer also offers the option to install a collection of pre-trained models in binary format. These models support machine learning in scenarios such as image recognition, where it might be difficult for customers to find large datasets to train a model. After you install one of the pre-trained models, you can use it for prediction on your own data without the time and expense involved in training such a large and complex model.
For more information, see Install pre-trained models in SQL Server.
This release includes many improvements in package management for SQL Server. These include:
- Database roles to help the DBA manage packages and assign permissions to install packages
- The CREATE EXTERNAL LIBRARY statement in T-SQL, to help DBAs manage packages without needing to know R
- A rich set of R functions to help install, remove, or list packages owned by users
For more information, see Package management.
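For DBAs who script package deployment, the Python helper below generates a CREATE EXTERNAL LIBRARY statement for a zipped R package. It is a minimal sketch that assumes the package file has already been copied to a path the SQL Server instance can read; the package name and path in the usage line are hypothetical, and the helper covers only the basic FROM (CONTENT = ...) WITH (LANGUAGE = 'R') form of the statement.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_string(value: str) -> str:
    """T-SQL string literal with embedded single quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def build_create_external_library(library: str, package_path: str) -> str:
    """Build a CREATE EXTERNAL LIBRARY statement for a zipped R package file."""
    return (
        f"CREATE EXTERNAL LIBRARY {quote_ident(library)}\n"
        f"FROM (CONTENT = {quote_string(package_path)})\n"
        "WITH (LANGUAGE = 'R');"
    )


# Hypothetical package name and path on the SQL Server computer.
print(build_create_external_library("randomForest", r"C:\packages\randomForest.zip"))
```

Quoting the library name and path in the helper keeps generated statements valid even when names contain brackets or apostrophes.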