RScriptStep class

Definition

Creates an Azure ML Pipeline step that runs an R script.

RScriptStep(script_name, name=None, arguments=None, compute_target=None, runconfig=None, runconfig_pipeline_params=None, inputs=None, outputs=None, params=None, source_directory=None, use_gpu=False, custom_docker_image=None, cran_packages=None, github_packages=None, custom_url_packages=None, allow_reuse=True, version=None)
Inheritance
azureml.pipeline.core._python_script_step_base._PythonScriptStepBase
RScriptStep

Parameters

script_name
str

[Required] The name of an R script relative to source_directory.

name
str

The name of the step. If unspecified, script_name is used.

arguments
list

Command-line arguments for the R script file. The arguments will be passed to compute via the arguments parameter of RunConfiguration. For more details on how to handle arguments such as special symbols, see RunConfiguration.

compute_target
DsvmCompute or AmlCompute or RemoteCompute or HDInsightCompute or str or tuple

[Required] The compute target to use. If unspecified, the target from the runconfig is used. This parameter may be specified as a compute target object or the string name of a compute target on the workspace. Optionally, if the compute target is not available at pipeline creation time, you may specify a tuple of ('compute target name', 'compute target type') to avoid fetching the compute target object (AmlCompute type is 'AmlCompute' and RemoteCompute type is 'VirtualMachine').

runconfig
RunConfiguration

The optional RunConfiguration to use. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a docker image. If unspecified, a default runconfig will be created.

runconfig_pipeline_params
{str: PipelineParameter}

Overrides of runconfig properties at runtime, specified as key-value pairs, each mapping the name of a runconfig property to a PipelineParameter for that property.

inputs
list[InputPortBinding or DataReference or PortDataReference or PipelineData or azureml.pipeline.core.pipeline_output_dataset.PipelineOutputDataset or Dataset or DatasetDefinition or DatasetConsumptionConfig or PipelineDataset]

A list of input port bindings.

outputs
list[PipelineData or azureml.pipeline.core.pipeline_output_dataset.PipelineOutputDataset or OutputPortBinding]

A list of output port bindings.

params
dict

A dictionary of name-value pairs registered as environment variables with the prefix "AML_PARAMETER_"; see the sketch after this parameter list.

source_directory
str

A folder that contains the R script, Conda environment, and other resources used in the step.

use_gpu
bool

Indicates whether the environment to run the experiment should support GPUs. If True, a GPU-based default Docker image will be used in the environment. If False, a CPU-based image will be used. Default docker images (CPU or GPU) will be used only if the custom_docker_image parameter is not set. This setting is used only in Docker-enabled compute targets.

custom_docker_image
str

The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image.

cran_packages
list

CRAN packages to be installed.

github_packages
list

GitHub packages to be installed.

custom_url_packages
list

Packages to be installed from a local path, a directory, or a custom URL.

allow_reuse
bool

Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When reusing the step, instead of submitting the job to compute, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed.

version
str

An optional version tag to denote a change in functionality for the step.
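
As a hedged illustration of the params behavior noted above, a value passed in params is surfaced to the R script as an environment variable with the AML_PARAMETER_ prefix. A minimal sketch; the parameter name, value, and the compute_target and project_folder variables are hypothetical:

   from azureml.pipeline.steps import RScriptStep

   # compute_target and project_folder are assumed to be defined elsewhere
   step = RScriptStep(
       script_name="train.R",
       params={"learning_rate": "0.01"},   # exposed to the script as AML_PARAMETER_learning_rate
       compute_target=compute_target,
       source_directory=project_folder)

   # Inside train.R, the value can be read back from the environment, e.g.:
   #   lr <- as.numeric(Sys.getenv("AML_PARAMETER_learning_rate"))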

Remarks

An RScriptStep is a basic, built-in step to run an R script on a compute target. It takes a script name and other optional parameters like arguments for the script, compute target, inputs and outputs. If no compute target is specified, the default compute target for the workspace is used. You can also use a RunConfiguration to specify requirements for the RScriptStep, such as conda dependencies and a Docker image.
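
For instance, a minimal sketch of a RunConfiguration that enables Docker and points at a custom base image (the image name is hypothetical); the resulting object can be passed through the runconfig parameter:

   from azureml.core.runconfig import RunConfiguration

   r_runconfig = RunConfiguration()
   r_runconfig.environment.docker.enabled = True
   # Hypothetical registry/image that already contains R and required system libraries
   r_runconfig.environment.docker.base_image = "myregistry.azurecr.io/r-base:latest"

If only the base image needs to change, the custom_docker_image parameter is a simpler alternative.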

The best practice for working with RScriptStep is to use a separate folder for scripts and any dependent files associated with the step, and specify that folder with the source_directory parameter. Following this best practice has two benefits. First, it helps reduce the size of the snapshot created for the step because only what is needed for the step is snapshotted. Second, the step's output from a previous run can be reused if there are no changes to the source_directory that would trigger a re-upload of the snapshot.
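
The example below references a few objects (blob_input_data, output_data1, compute_target, project_folder) without defining them. A minimal sketch of how they might be set up, assuming a workspace loaded from a local config file and a compute target named "cpu-cluster" (both assumptions):

   from azureml.core import Workspace
   from azureml.data.data_reference import DataReference
   from azureml.pipeline.core import PipelineData

   ws = Workspace.from_config()                          # assumes a config.json for the workspace
   def_blob_store = ws.get_default_datastore()
   compute_target = ws.compute_targets["cpu-cluster"]    # hypothetical compute target name

   # Hypothetical input data already uploaded to the default datastore
   blob_input_data = DataReference(
       datastore=def_blob_store,
       data_reference_name="training_data",
       path_on_datastore="r_training/data")

   # Intermediate output produced by the step
   output_data1 = PipelineData("output_data1", datastore=def_blob_store)

   project_folder = "./r-train"                          # source_directory containing train.R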

The following code example shows using an RScriptStep in a machine learning training scenario.


   from azureml.pipeline.steps import RScriptStep

   trainStep = RScriptStep(
       script_name="train.R",
       arguments=["--input", blob_input_data, "--output", output_data1],
       inputs=[blob_input_data],
       outputs=[output_data1],
       compute_target=compute_target,
       source_directory=project_folder,
       cran_packages=['ggplot2', 'dplyr']
   )
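
The step can then be added to a Pipeline and submitted as an experiment run; a minimal sketch, assuming the workspace ws from the sketch above and a hypothetical experiment name:

   from azureml.core import Experiment
   from azureml.pipeline.core import Pipeline

   pipeline = Pipeline(workspace=ws, steps=[trainStep])
   pipeline_run = Experiment(ws, "r-train").submit(pipeline)
   pipeline_run.wait_for_completion(show_output=True)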

For more details on creating pipelines in general, see https://aka.ms/pl-first-pipeline.

Methods

create_node(graph, default_datastore, context)

Create a node for RScriptStep and add it to the specified graph.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that the step can be added to a pipeline graph that represents the workflow.

create_node(graph, default_datastore, context)

Parameters

graph
Graph

The graph object to add the node to.

default_datastore
AbstractAzureStorageDatastore or AzureDataLakeDatastore

The default datastore.

context
_GraphContext

The graph context.

Returns

The created node.

Return type

Node