End-to-end data science walkthrough for R and SQL Server

In this walkthrough, you develop an end-to-end solution for predictive modeling based on Microsoft R with SQL Server 2016 or SQL Server 2017.

This walkthrough is based on a popular set of public data, the New York City taxi dataset. You use a combination of R code, SQL Server data, and custom SQL functions to build a classification model that indicates the probability that the driver might get a tip on a particular taxi trip. You also deploy your R model to SQL Server and use server data to generate scores based on the model.

This example can be extended to all kinds of real-life problems, such as predicting customer responses to sales campaigns, or predicting spending or attendance at events. Because the model can be invoked from a stored procedure, you can easily embed it in an application.

Because the walkthrough is designed to introduce R developers to R Services (In-Database), R is used wherever possible. However, this does not mean that R is necessarily the best tool for each task. In many cases, SQL Server might provide better performance, particularly for tasks such as data aggregation and feature engineering. Such tasks can particularly benefit from new features in SQL Server 2017, such as memory optimized columnstore indexes. We try to point out possible optimizations along the way.

Note

The walkthrough was originally developed for and tested on SQL Server 2016. However, screenshots and procedures have been updated to use the latest version of SQL Server Management Studio, which works with SQL Server 2017.

Overview

The estimated times do not include setup. For more information, see Prerequisites for the walkthrough.

Topic list Estimated time
Prepare the R walkthrough data

Obtain the data used for building a model. Download a public dataset and load it into a SQL Server database.
30 minutes
Explore the data using SQL

Understand your data using SQL tools and summaries.
10 minutes
Summarize the data using R

Use R to explore the data and generate summaries.
10 minutes
Create plots using R in SQL Server

Create plots in local and remote compute contexts by mixing R and SQL.
10 minutes
Create data features using R and T-SQL)

Perform feature engineering using custom functions in R and Transact-SQL. Compare the performance of R and T-SQL for featurization tasks.
10 minutes
Build an R model and save it in SQL Server

Train and tune a predictive model. Assess model performance. This walkthrough creates a classification model. Plot the model's accuracy using R.
15 minutes
Deploy the R model using SQL Server

Deploy the model in production by saving the model to a SQL Server database. Call the model from a stored procedure to generate predictions.
10 minutes

Intended audience

This walkthrough is intended for R or SQL developers. It provides an introduction into how R can be integrated into enterprise workflows using R Services (In-Database). You should be familiar with database operations, such as creating databases and tables, importing data, and running queries.

  • All SQL and R scripts are included.
  • You might need to modify strings in the scripts, to run in your environment. You can do this with any code editor, such as Visual Studio Code.

Prerequisites

  • You must have access to an instance of SQL Server 2016, or an evaluation version of SQL Server 2017.
  • At least one instance on the SQL Server computer must have R Services (In-Database) installed.
  • If you want to run R commands from a remote computer, such as a laptop or other networked computer, you must install the Microsoft R Open libraries. You can install either Microsoft R Client or Microsoft R Server. The remote computer must be able to connect to the SQL Server instance.
  • If you need to put client and server on the same computer, be sure to install a separate set of Microsoft R libraries for use in sending R script from a "remote" client. Do not use the R libraries that are installed for use by the SQL Server instance for this purpose.

For details about how to set up these server and client environments, see Prerequisites for R and SQL Server data science walkthrough.

Next lesson

Prepare the R walkthrough data