Tutorial: In-Database analytics for SQL developers using R

APPLIES TO: yesSQL Server (Windows only) noAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse

In this tutorial for SQL programmers, learn about R integration by building and deploying an R-based machine learning solution using a NYCTaxi_sample database on SQL Server.

This tutorial introduces you to R functions used in a data modeling workflow. Steps include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limosine Commission, and the model you will build predicts whether a trip is likely to result in a tip based on the time of day, distance travelled, and pick-up location. All of the R code used in this tutorial is wrapped in stored procedures that you create and run in Management Studio.

Note

This tutorial is available in both R and Python. For the Python version, see In-database analytics for Python developers.

Overview

The process of building a machine learning solution is a complex one that can involve multiple tools, and the coordination of subject matter experts across several phases:

  • obtaining and cleaning data
  • exploring the data and building features useful for modeling
  • training and tuning the model
  • deployment to production

Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to SQL Server using Transact-SQL stored procedures in the familiar environment of Management Studio.

Whether you are a SQL programmer new to R, or an R developer new to SQL, this multi-part tutorial is introduces a typical workflow for conducting in-database analytics with R and SQL Server.

After the model has been saved to the database, call the model for predictions from Transact-SQL by using stored procedures.

Prerequisites

All tasks can be done using Transact-SQL stored procedures in Management Studio.

This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know R. As such, all R code is provided.

Next steps