R plugin (Preview)
The R plugin runs a user-defined-function (UDF) using an R script. The script gets tabular data as its input, and produces tabular output. The plugin's runtime is hosted in a sandbox on the cluster's nodes. The sandbox provides an isolated and secure environment.
Syntax
T | evaluate [hint.distribution = (single | per_node)] r(output_schema, script [, script_parameters])
Arguments
- output_schema: A
typeliteral that defines the output schema of the tabular data, returned by the R code.- The format is:
typeof(ColumnName:ColumnType[, ...]), for example:typeof(col1:string, col2:long). - To extend the input schema, use the following syntax:
typeof(*, col1:string, col2:long).
- The format is:
- script: A
stringliteral that is the valid R script to be executed. - script_parameters: An optional
dynamicliteral that is a property bag of name and value pairs to be passed to the R script as the reservedkargsdictionary. For more information, see Reserved R variables. - hint.distribution: An optional hint for the plugin's execution to be distributed across multiple cluster nodes.
Default:
single.single: A single instance of the script will run over the entire query data.per_node: If the query before the R block is distributed, an instance of the script will run on each node over the data that it contains.
Reserved R variables
The following variables are reserved for interaction between Kusto Query Language and the R code:
df: The input tabular data (the values ofTabove), as an R DataFrame.kargs: The value of the script_parameters argument, as an R dictionary.result: An R DataFrame created by the R script. The value becomes the tabular data that gets sent to any Kusto query operator that follows the plugin.
Enable the plugin
- The plugin is disabled by default.
- Enable or disable the plugin in the Azure portal in the Configuration tab of your cluster. For more information see Manage language extensions in your Azure Data Explorer cluster (Preview)
Notes and limitations
- The R sandbox image is based on R 3.4.4 for Windows, and includes packages from Anaconda's R Essentials bundle.
- The R sandbox limits access to the network. The R code can't dynamically install additional packages that aren't included in the image. If you need specific packages, open a New support request in the Azure portal.
Examples
range x from 1 to 360 step 1
| evaluate r(
//
typeof(*, fx:double), // Output schema: append a new fx column to original table
//
'result <- df\n' // The R decorated script
'n <- nrow(df)\n'
'g <- kargs$gain\n'
'f <- kargs$cycles\n'
'result$fx <- g * sin(df$x / n * 2 * pi * f)'
//
, pack('gain', 100, 'cycles', 4) // dictionary of parameters
)
| render linechart
Performance tips
Reduce the plugin's input data set to the minimum amount required (columns/rows).
- Use filters on the source data set using the Kusto Query Language, when possible.
- To make a calculation on a subset of the source columns, project only those columns before invoking the plugin.
Use
hint.distribution = per_nodewhenever the logic in your script is distributable.- You can also use the partition operator for partitioning the input data set.
Whenever possible, use the Kusto Query Language to implement the logic of your R script.
For example:
.show operations | where StartedOn > ago(1d) // Filtering out irrelevant records before invoking the plugin | project d_seconds = Duration / 1s // Projecting only a subset of the necessary columns | evaluate hint.distribution = per_node r( // Using per_node distribution, as the script's logic allows it typeof(*, d2:double), 'result <- df\n' 'result$d2 <- df$d_seconds\n' // Negative example: this logic should have been written using Kusto's query language ) | summarize avg = avg(d2)
Usage tips
To avoid conflicts between Kusto string delimiters and R string delimiters:
- Use single quote characters (
') for Kusto string literals in Kusto queries. - Use double quote characters (
") for R string literals in R scripts.
- Use single quote characters (
Use the external data operator to obtain the content of a script that you've stored in an external location, such as Azure blob storage or a public GitHub repository.
For example:
let script = externaldata(script:string) [h'https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r'] with(format = raw); range x from 1 to 360 step 1 | evaluate r( typeof(*, fx:double), toscalar(script), pack('gain', 100, 'cycles', 4)) | render linechart
This capability isn't supported in Azure Monitor