RxInSqlServer: Generate SQL Server In-Database Compute Context

Article
07/12/2022

Description

Creates a compute context for running RevoScaleR analyses inside Microsoft SQL Server.

Currently only supported in Windows.

Usage

  RxInSqlServer(object, connectionString = "",  numTasks = rxGetOption("numTasks"), autoCleanup = TRUE, 
     consoleOutput = FALSE, executionTimeoutSeconds = 0, wait = TRUE, packagesToLoad = NULL, 
     shareDir = NULL, server = NULL, databaseName = NULL, user = NULL, password = NULL,   ...  )

Arguments

`object`

An optional RxInSqlServer object.

`connectionString`

An ODBC connection string used to connect to the Microsoft SQL Server database.

`numTasks`

Number of tasks (processes) to run for each computation. This is the maximum number of tasks that will be used; SQL Server may start fewer processes if there is not enough data, if too many resources are already being used by other jobs, or if numTasks exceeds the MAXDOP (maximum degree of parallelism) configuration option in SQL Server. Each of the tasks is given data in parallel, and does computations in parallel, and so computation time may decrease as numTasks increases. However, that may not always be the case, and computation time may even increase if too many tasks are competing for machine resources. Note that rxOptions(numCoresToUse=n) controls how many cores (actually, threads) are used in parallel within each process, and there is a trade-off between numCoresToUse and numTasks that depends upon the specific algorithm, the type of data, the hardware, and the other jobs that are running.

`wait`

logical value. If TRUE, the job will be blocking and will not return until it has completed or has failed. If FALSE, the job will be non-blocking and return immediately, allowing you to continue running other R code. The object rxgLastPendingJob is created with the job information. The client connection with SQL Server must be maintained while the job is running, even in non-blocking mode.

`consoleOutput`

logical scalar.If TRUE, causes the standard output of the R process started by SQL Server to be printed to the user console. This value may be overwritten by passing a non-NULL logical value to the consoleOutput argument provided in rxExec and rxGetJobResults.

`autoCleanup`

logical scalar. If TRUE, the default behavior is to clean up the temporary computational artifacts and delete the result objects upon retrieval. If FALSE, then the computational results are not deleted, and the results may be acquired using rxGetJobResults, and the output via rxGetJobOutput until the rxCleanupJobs is used to delete the results and other artifacts. Leaving this flag set to FALSE can result in accumulation of compute artifacts which you may eventually need to delete before they fill up your hard drive.

`executionTimeoutSeconds`

numeric scalar. Defaults to 0 which means infinite wait.

`packagesToLoad`

optional character vector specifying additional packages to be loaded on the nodes when jobs are run in this compute context.

`shareDir`

character string specifying the temporary directory on the client that is used to serialize the R objects back and forth. If not specified, a subdirectory under Absolute or paths relative to current directory can be specified.

`server`

Target SQL Server instance. Can also be specified in the connection string with the Server keyword.

`databaseName`

Database on the target SQL Server instance. Can also be specified in the connection string with the Database keyword.

`user`

SQL login to connect to the SQL Server instance. SQL login can also be specified in the connection string with the uid keyword.

`password`

Password associated with the SQL login. Can also be specified in the connection string with the pwd keyword.

`...`

additional arguments to be passed to the underlying function. Two useful additional arguments are traceEnabled=TRUE and traceLevel=7, which taken together enable run-time tracing of your in-SQL Server computations. traceEnabled and traceLevel are deprecated as of MRS 9.0.2 and will be removed from this compute context in the next major release. Please use rxOptions(traceLevel=7) to enable run-time tracing in-SQL Server.

Author(s)

Microsoft Corporation Microsoft Technical Support

Examples


 ## Not run:

# In this example we connect to the database MyDatabase on the SQL Server instance MyServer using MyUser and MyPassword as credentials.
# The query generates a rowset with 1000 rows and 2 columns:
# - The first column is all integers going from 1 to 1000.
# - The second column consists of random integers between 1 and 1000.

# Note: for improved security, read connection string from a file, such as
# connectionString <- readLines("connectionString.txt")

   connectionString <- "Server=MyServer;Database=MyDatabase;UID=MyUser;PWD=MyPassword"
   sqlQuery <- "WITH nb AS (SELECT 0 AS n UNION ALL SELECT n+1 FROM nb where n < 9) SELECT n1.n+10*n2.n+100*n3.n+1 AS n, ABS(CHECKSUM(NewId())) 
   rxSummary(
       formula = ~ .,
       data = RxSqlServerData(sqlQuery = sqlQuery, connectionString = connectionString),
       computeContext = RxInSqlServer(connectionString = connectionString)
       )

# Sample output:
# Number of valid observations: 1000 
# 
#  Name  Mean    StdDev   Min Max  ValidObs MissingObs
#  n     500.500 288.8194 1   1000 1000     0         
#  value 504.582 295.5115 1   1000 1000     0      
  ## End(Not run)