Tutorial: Use RevoScaleR R functions with SQL Server data

APPLIES TO: yesSQL Server (Windows only) noAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse

RevoScaleR is a Microsoft R package providing distributed and parallel processing for data science and machine learning workloads. For R development in SQL Server, RevoScaleR is one of the core built-in packages, with functions for creating data source objects, setting a compute context, managing packages, and most importantly: working with data end-to-end, from import to visualization and analysis. Machine Learning algorithms in SQL Server have a dependency on RevoScaleR data sources. Given the importance of RevoScaleR, knowing when and how to call its functions is an essential skill.

In this multi-part tutorial, you are introduced to a range of RevoScaleR functions for tasks associated with data science. In the process, you will learn how to create a remote compute context, move data between local and remote compute contexts, and execute R code on a remote SQL Server. You also learn how to analyze and plot data both locally and on the remote server, and how to create and deploy models.

Prerequisites

To switch back and forth between local and remote compute contexts, you need two systems. Local is typically a development workstation with sufficent power for data science workloads. Remote in this case is SQL Server 2017 or SQL Server 2016 with the R feature enabled.

Switching compute contexts is predicated on having the same-version RevoScaleR on both local and remote systems. On a local workstation, you can get the RevoScaleR packages and related providers by installing Microsoft R Client.

If you need to put client and server on the same computer, be sure to install a second set of Microsoft R libraries for sending R script from a "remote" client. Do not use the R libraries that are installed in the program files of the SQL Server instance. Specifically, if you are using one computer, you need the RevoScaleR library in both of these locations to support client and server operations.

  • C:\Program Files\Microsoft\R Client\R_SERVER\library\RevoScaleR
  • C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\library\RevoScaleR

For instructions on client configuration, see Set up a data science client for R development.

R development tools

R developers typically use IDEs for writing and debugging R code. Here are some suggestions:

  • R Tools for Visual Studio (RTVS) is a free plug-in that provides Intellisense, debugging, and support for Microsoft R. You can use it with both R Server and SQL Server Machine Learning Services. To download, see R Tools for Visual Studio.

  • RStudio is one of the more popular environments for R development. For more information, see https://www.rstudio.com/products/RStudio/.

  • Basic R tools (R.exe, RTerm.exe, RScripts.exe) are also installed by default when you install R in SQL Server or R Client. If you do not wish to install an IDE, you can use built-in R tools to execute the code in this tutorial.

Recall that RevoScaleR is required on both local and remote computers. You cannot complete this tutorial using a generic installation of RStudio or other environment that's missing the Microsoft R libraries. For more information, see Set Up a Data Science Client.

Summary of tasks

  • Data is initially obtained from CSV files or XDF files. You import the data into SQL Server using the functions in the RevoScaleR package.
  • Model training and scoring is performed using the SQL Server compute context.
  • Use RevoScaleR functions to create new SQL Server tables to save your scoring results.
  • Create plots both on the server and in the local compute context.
  • Train a model on data in SQL Server database, running R in the SQL Server instance.
  • Extract a subset of data and save it as an XDF file for re-use in analysis on your local workstation.
  • Get new data for scoring, by opening an ODBC connection to the SQL Server database. Scoring is done on the local workstation.
  • Create a custom R function and run it in the server compute context to perform a simulation.

Next steps