R Tools for Visual Studio (RTVS) takes advantage of Visual Studio's excellent support for SQL Server to make it easier for data scientists to work with SQL databases, namely creating a running SQL queries and working with stored procedures.
To work with SQL and R together, you must have the SQL Server Data Tools installed:
- Visual Studio 2017: run the Visual Studio installer and select the Data storage and processing workload, which includes SQL Server Data tools.
- Visual Studio 2015: follow the instructions on Download SQL Server Data Tools.
The following video (3m 03s) provides a brief overview of SQL Server and R:
Creating and running SQL Queries
RTVS supports adding SQL queries into R projects, allowing you to iteratively develop SQL queries in a separate context until you get the results you're looking for.
To add a SQL query file, right-click the project in Solution Explorer, select Add > New Item..., and select the SQL Query file type:
This opens the file in Visual Studio's Transact-SQL editor, which provides full IntelliSense for SQL and the ability to run queries. For these features to work, though, you need to connect to a database using the connect button in the editor's toolbar or simply trying to run a query (Ctrl+Shift+E, which also works on a selection). Either way brings up the connection dialog:
Once a connection is established, you can run queries and see results:
The Transact-SQL editor supports a variety of other features, such as viewing the execution plan for the query, a query debugger, There are many other features available within the Transact-SQL editor. For details, see Use Transact-SQL Editor to Edit and Execute Scripts.
Working with SQL Server stored procedures
SQL Server R Services (SQL Server 2016 and later) lets you embed and run R code from a T-SQL stored procedure. This means you can run R code SQL Server computer, operate on data returned from a SQL query, and generate a SQL result set that can be processed by further SQL or returned to the client.
RTVS simplifies the otherwise unwieldy and error-prone process of combining SQL and R code inside a single SQL statement, as described in the following sections:
The following video (6m 09s) also provides an overview of these features:
Add a database connection
Select R Tools > Data > Add Database Connection to bring up the Connection Properties dialog, in which you specify the name of the data source (SQL Server in this case), the name of the server, the authentication mode, and the name of the database. You can select Test Connection to verify your input before closing the dialog.
Once you select OK with a valid connection, Visual Studio generates a connection string named
dbConnectionin a new
settings.Rfile. RTVS automatically sources (runs) this file, so you can immediately use the connection from R scripts:
Write and test a SQL stored procedure
To add a new SQL Stored Procedure, right-click your project, select Add > New Item..., select SQL Stored Procedure with R from the list of templates, give the file a name (
StoredProcedure.R in this example), and select OK.
RTVS creates three files for the stored procedure, a
.R file for your R code, a
.Query.sql file for the SQL code, and a
.Template.sql file that combines the two. They latter two appear in Solution Explorer as children of the
StoredProcedure.R (in this example) is where you'll write your R code. The default contents are as follows:
# @InputDataSet: input data frame, result of SQL query execution # @OutputDataSet: data frame to pass back to SQL # Test code # library(RODBC) # channel <- odbcDriverConnect(dbConnection) # InputDataSet <- sqlQuery(channel, ) # odbcClose(channel) OutputDataSet <- InputDataSet
Simply said, the code receives an R dataframe called
InputDataSet and returns its results in
OutputDataSet, with the template code merely copying the input to the output.
The names of these dataframes are controlled by the
@output_data_1_name parameters in the call to the
sp_execute_external_script system stored procedure. For more details on the design of this calling convention and some examples of its usage, sp_execute_external_script (Transact-SQL).
The other generated code in comments is a small test script that uses the RODBC package to transmit a SQL statement to SQL Server, run it, and retrieve its result set as an R dataframe. You can uncomment this test code to interactively write your R code against the result set that you get from SQL Server.
StoredProcedure.Query.sql is where you write and test the SQL query that generates the data for
InputDataSet. Because this is a
.sql file, you have all the Transact-SQL editor features available to you.
Once you are happy with your SQL code, you can easily integrate it with your R code in
StoredProcedure.R by simply dragging the
.sql file onto the open editor for the
.R file. In the image below,
StoredProcedure.Query.sql has been dragged to the point after the comma in
As you can see, this simple step automatically generates R code to open the
.sql file, read its content into a string, and pass it to the RODBC package to send it to SQL Sever.
With this in place, you can now you can interactively write R code that manipulates the
InputDataSet dataframe as desired. Remember that you can just select R code in the editor and send it to the interactive window by pressing Ctrl+Enter.
StoredProcedure.Template.sql, finally, contains the template for generating your SQL Stored Procedure:
CREATE PROCEDURE [StoredProcedure] AS BEGIN EXEC sp_execute_external_script @language = N'R' , @script = N'_RCODE_' , @input_data_1 = N'_INPUT_QUERY_' --- Edit this line to handle the output data frame. WITH RESULT SETS (([MYNEWCOLUMN] NVARCHAR(max))); END;
_RCODE_placeholder is replaced by the contents of
_INPUT_QUERY_placeholder is replaced by the contents of
- You need to describe the schema of the result set returned from the stored procedure by editing the
WITH RESULT SETSclause. Specifically identify the columns from the
OutputDataSetdataframe that you want to return to the caller of the stored procedure.
For example, for the following query:
SELECT TOP 100 medallion, hack_license FROM nyctaxi_sample
You'd use the following
WITH RESULT SETS clause to specify the data types of the return values:
WITH RESULT SETS ((medallion NVARCHAR(max), hack_license NVARCHAR(max)));
Publish a SQL stored procedure
- Select the R Tools > Data > Publish With Options... menu command.
In the dialog that appears, change Publish to: to Database, specify the target, select Publish, and RTVS will build and publish the stored procedure:
To publish all stored procedures in a project, you can also use the R Tools > Data > Publish Stored Procedures command, which is also available when you right-click the project in Solution Explorer.
If you have the SQL Server Object Explorer open in Visual Studio, you'll also see your published stored procedure in the Programmability > Stored Procedures folder of your database. You can also execute it from the Object Explorer by right-clicking and selecting Execute Procedure, or by calling it interactively from a
.sql query window.