Tutorial: Use RevoScaleR R functions with SQL Server data

APPLIES TO: YesSQL Server NoAzure SQL Database NoAzure Synapse Analytics (SQL DW) NoParallel Data Warehouse

In this multi-part tutorial series, you're introduced to a range of RevoScaleR functions for tasks associated with data science. In the process, you'll learn how to create a remote compute context, move data between local and remote compute contexts, and execute R code on a remote SQL Server. You'll also learn how to analyze and plot data both locally and on the remote server, and how to create and deploy models.

RevoScaleR is a Microsoft R package providing distributed and parallel processing for data science and machine learning workloads. For R development in SQL Server, RevoScaleR is one of the core built-in packages, with functions for creating data source objects, setting a compute context, managing packages, and most importantly: working with data end-to-end, from import to visualization and analysis. Machine Learning algorithms in SQL Server have a dependency on RevoScaleR data sources. Given the importance of RevoScaleR, knowing when and how to call its functions is an essential skill.

Prerequisites

To switch back and forth between local and remote compute contexts, you need two systems. Local is typically a development workstation with sufficient power for data science workloads. Remote in this case is SQL Server with the R feature enabled.

Switching compute contexts is predicated on having the same-version RevoScaleR on both local and remote systems. On a local workstation, you can get the RevoScaleR packages and related providers by installing Microsoft R Client.

If you need to put client and server on the same computer, be sure to install a second set of Microsoft R libraries for sending R script from a "remote" client. Do not use the R libraries that are installed in the program files of the SQL Server instance. Specifically, if you are using one computer, you need the RevoScaleR library in both of these locations to support client and server operations.

  • C:\Program Files\Microsoft\R Client\R_SERVER\library\RevoScaleR
  • C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\library\RevoScaleR

For instructions on client configuration, see Set up a data science client for R development.

R development tools

R developers typically use IDEs for writing and debugging R code. Here are some suggestions:

  • R Tools for Visual Studio (RTVS) is a free plug-in that provides Intellisense, debugging, and support for Microsoft R. You can use it with both R Server and SQL Server Machine Learning Services. To download, see R Tools for Visual Studio.

  • RStudio is one of the more popular environments for R development. For more information, see https://www.rstudio.com/products/RStudio/.

  • Basic R tools (R.exe, RTerm.exe, RScripts.exe) are also installed by default when you install R in SQL Server or R Client. If you do not wish to install an IDE, you can use built-in R tools to execute the code in this tutorial.

Recall that RevoScaleR is required on both local and remote computers. You cannot complete this tutorial using a generic installation of RStudio or other environment that's missing the Microsoft R libraries. For more information, see Set Up a Data Science Client.

Summary of tasks

  • Data is initially obtained from CSV files or XDF files. You import the data into SQL Server using the functions in the RevoScaleR package.
  • Model training and scoring is performed using the SQL Server compute context.
  • Use RevoScaleR functions to create new SQL Server tables to save your scoring results.
  • Create plots both on the server and in the local compute context.
  • Train a model on data in SQL Server database, running R in the SQL Server instance.
  • Extract a subset of data and save it as an XDF file for re-use in analysis on your local workstation.
  • Get new data for scoring, by opening an ODBC connection to the SQL Server database. Scoring is done on the local workstation.
  • Create a custom R function and run it in the server compute context to perform a simulation.

Next steps