Tutorial: R data analytics for SQL developers
In this tutorial for SQL programmers, learn about R integration by building and deploying an R-based machine learning solution using a NYCTaxi_sample database on SQL Server. You'll use T-SQL, SQL Server Management Studio, and a database engine instance that has Machine Learning Services with R language support.
This tutorial introduces you to R functions used in a data modeling workflow. Steps include data exploration, building and training a binary classification model, and model deployment. The model you will build predicts whether a trip is likely to result in a tip based on the time of day, distance traveled, and pick-up location.
All of the R code used in this tutorial is wrapped in stored procedures that you create and run in Management Studio.
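As a rough illustration of what "wrapped in a stored procedure" means, the sketch below embeds a one-line R script in a T-SQL procedure by calling the system procedure sp_execute_external_script. The procedure name dbo.DemoRowCount and the input query are hypothetical and are not part of the tutorial's own scripts. The sketch also assumes that external script execution is already enabled on the instance.

```sql
-- Hypothetical example: dbo.DemoRowCount is an illustrative name only.
CREATE PROCEDURE dbo.DemoRowCount
AS
BEGIN
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # InputDataSet is the default R data frame built from @input_data_1.
            # OutputDataSet is the default data frame returned to SQL Server.
            OutputDataSet <- data.frame(row_count = nrow(InputDataSet))
        ',
        -- Placeholder query; any SELECT statement could supply the input rows.
        @input_data_1 = N'SELECT TOP (100) 1 AS placeholder FROM sys.objects'
    WITH RESULT SETS ((row_count INT));
END;
GO

-- Run the R script like any other stored procedure.
EXEC dbo.DemoRowCount;
```

The query passed in @input_data_1 runs in SQL Server, and its rows reach R as a data frame. Whatever the script assigns to OutputDataSet comes back as an ordinary result set.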
Background for SQL developers
Building a machine learning solution is a complex process that can involve multiple tools and the coordination of subject matter experts across several phases:
- obtaining and cleaning data
- exploring the data and building features useful for modeling
- training and tuning the model
- deploying the model to production
Development and testing of the actual code is best performed using a dedicated R development environment. However, after the script is fully tested, you can easily deploy it to SQL Server using Transact-SQL stored procedures in the familiar environment of Management Studio.
The purpose of this multi-part tutorial is to introduce a typical workflow for migrating "finished R code" to SQL Server.
After the model has been saved to the database, you can call it for predictions from Transact-SQL by using stored procedures.
All tasks can be done using Transact-SQL stored procedures in Management Studio.
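To give a sense of what calling a saved model from Transact-SQL might look like, here is a minimal sketch. Everything named in it is an assumption made for illustration: the table dbo.demo_models, its model column, the procedure dbo.DemoPredictTip, and the single trip_distance feature. It also assumes the model was fitted with base R glm() and stored using serialize(). The tutorial's own procedures may differ.

```sql
-- Hypothetical example: table, column, and procedure names are illustrative only.
CREATE PROCEDURE dbo.DemoPredictTip
    @trip_distance FLOAT
AS
BEGIN
    -- Load the serialized R model from the (assumed) models table.
    DECLARE @model_bin VARBINARY(MAX) =
        (SELECT TOP (1) model FROM dbo.demo_models);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # Restore the R model object from its binary form.
            mod <- unserialize(as.raw(model))
            # Build a one-row data frame from the scalar parameter.
            newdata <- data.frame(trip_distance = trip_distance)
            # Score the row; type = "response" returns a probability for a glm.
            OutputDataSet <- data.frame(
                score = predict(mod, newdata, type = "response"))
        ',
        -- Parameters declared here are visible in R without the @ prefix.
        @params = N'@model VARBINARY(MAX), @trip_distance FLOAT',
        @model = @model_bin,
        @trip_distance = @trip_distance
    WITH RESULT SETS ((score FLOAT));
END;
GO

-- Score a single hypothetical trip of 2.5 miles.
EXEC dbo.DemoPredictTip @trip_distance = 2.5;
```

Storing the model as VARBINARY(MAX) keeps it in the database alongside the data, so any client that can run a stored procedure can request a prediction.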
This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume that you know R, so all R code is provided.