Interactive R development

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

This article shows how to use R in Azure Machine Learning studio, on a compute instance that runs an R kernel in a Jupyter notebook.

The popular RStudio IDE also works. You can install RStudio or Posit Workbench in a custom container on a compute instance. However, this approach has limitations when reading from and writing to your Azure Machine Learning workspace.

Important

The code shown in this article works on an Azure Machine Learning compute instance. The compute instance has an environment and configuration file necessary for the code to run successfully.

Prerequisites

Run R in a notebook in studio

You'll use a notebook in your Azure Machine Learning workspace, on a compute instance.

  1. Sign in to Azure Machine Learning studio.

  2. Open your workspace if it isn't already open.

  3. On the left navigation, select Notebooks.

  4. Create a new notebook, named RunR.ipynb.

    Tip

    If you're not sure how to create and work with notebooks in studio, review Run Jupyter notebooks in your workspace.

  5. Select the notebook.

  6. On the notebook toolbar, make sure your compute instance is running. If not, start it now.

  7. On the notebook toolbar, switch the kernel to R.

    Screenshot: Switch the notebook kernel to use R.

Your notebook is now ready to run R commands.

Access data

You can upload files to your workspace file storage resource, and then access those files in R. However, for files stored in Azure data assets or data from datastores, you must install some packages.

This section describes how to use Python and the reticulate package to load your data assets and datastores into R, from an interactive session. You use the azureml-fsspec Python package and the reticulate R package to read tabular data as Pandas DataFrames. This section also includes an example of reading data assets and datastores into an R data.frame.

To install these packages:

  1. Create a new file on the compute instance, called setup.sh.

  2. Copy this code into the file:

    #!/bin/bash
    
    set -e
    
    # Installs azureml-fsspec in default conda environment 
    # Does not need to run as sudo
    
    eval "$(conda shell.bash hook)"
    conda activate azureml_py310_sdkv2
    pip install azureml-fsspec
    conda deactivate
    
    # Installs reticulate if the installed version is older than 1.26 (runs as azureuser through sudo)
    
    sudo -u azureuser -i <<'EOF'
    R -e "if (packageVersion('reticulate') >= '1.26') message('Version OK') else install.packages('reticulate')"
    EOF
    
  3. Select Save and run script in terminal to run the script

The install script handles these steps:

  • pip installs azureml-fsspec in the default conda environment for the compute instance
  • Installs the R reticulate package if necessary (version must be 1.26 or greater)
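
The version check in the script can also be tried on its own. The following is a minimal sketch, not part of the install script: `meets_minimum` is a hypothetical helper name, and the version strings are illustrative. It compares versions as version objects, because a decimal comparison would rank 1.9 above 1.26.

```r
# Minimum reticulate version named in this article
min_version <- "1.26"

# Hypothetical helper: TRUE when `installed` is at least `minimum`.
# as.package_version() compares component by component, not as decimals.
meets_minimum <- function(installed, minimum = min_version) {
  as.package_version(installed) >= as.package_version(minimum)
}

ok_same  <- meets_minimum("1.26")    # same version as the minimum
ok_older <- meets_minimum("1.9")     # minor component 9 is lower than 26
ok_newer <- meets_minimum("1.34.0")  # minor component 34 is higher than 26

print(c(same = ok_same, older = ok_older, newer = ok_newer))

# On the compute instance, the same check against the installed package:
# meets_minimum(packageVersion("reticulate"))
```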

Read tabular data from registered data assets or datastores

For data stored in a data asset created in Azure Machine Learning, use these steps to read that tabular file into a Pandas DataFrame or an R data.frame:

Note

Reading a file with reticulate only works with tabular data.

  1. Make sure you have reticulate version 1.26 or later. If the installed version is older, try a newer compute instance.

    packageVersion("reticulate")
    
  2. Load reticulate and set the conda environment where azureml-fsspec was installed

    library(reticulate)
    use_condaenv("azureml_py310_sdkv2")
    print("Environment is set")

  3. Find the URI path to the data file.

    1. First, get a handle to your workspace

      py_code <- "from azure.identity import DefaultAzureCredential
      from azure.ai.ml import MLClient
      credential = DefaultAzureCredential()
      ml_client = MLClient.from_config(credential=credential)"
      
      py_run_string(py_code)
      print("ml_client is configured")

    2. Use this code to retrieve the asset. Make sure to replace <MY_NAME> and <MY_VERSION> with the name and version number of your data asset.

      Tip

      In studio, select Data in the left navigation to find the name and version number of your data asset.

      # Replace <MY_NAME> and <MY_VERSION> with your values
      py_code <- "my_name = '<MY_NAME>'
      my_version = '<MY_VERSION>'
      data_asset = ml_client.data.get(name=my_name, version=my_version)
      data_uri = data_asset.path"

    3. Run the code to retrieve the URI.

      py_run_string(py_code)
      print(paste("URI path is", py$data_uri))

  4. Use Pandas read functions to read the file(s) into the R environment:

    pd <- import("pandas")
    cc <- pd$read_csv(py$data_uri)
    head(cc)

You can also use a Datastore URI to access different files on a registered Datastore, and read these resources into an R data.frame.

  1. Create a datastore URI in this format, using your own values:

    subscription <- '<subscription_id>'
    resource_group <- '<resource_group>'
    workspace <- '<workspace>'
    datastore_name <- '<datastore>'
    path_on_datastore <- '<path>'
    
    uri <- paste0("azureml://subscriptions/", subscription, "/resourcegroups/", resource_group, "/workspaces/", workspace, "/datastores/", datastore_name, "/paths/", path_on_datastore)
    

    Tip

    Instead of remembering the datastore URI format, you can copy and paste the datastore URI from the studio UI, if you know the datastore where your file is located:

    1. Navigate to the file or folder you want to read into R.
    2. Select the ellipsis (...) next to it.
    3. From the menu, select Copy URI.
    4. Select the Datastore URI to copy into your notebook or script. Note that you must create a variable for <path> in the code.

    Screenshot: Highlighting the copy of the datastore URI.

  2. Create a file system object using the aforementioned URI:

    # Import the Python module through reticulate (loaded earlier with library(reticulate))
    azureml.fsspec <- import("azureml.fsspec")
    fs <- azureml.fsspec$AzureMachineLearningFileSystem(uri, sep = "")

  3. Read into an R data.frame:

    df <- with(fs$open("<path>", "r") %as% f, {
      x <- as.character(f$read(), encoding = "utf-8")
      read.csv(textConnection(x), header = TRUE, sep = ",", stringsAsFactors = FALSE)
    })
    print(df)
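
Two parts of the datastore steps can be tried without a workspace connection: assembling the URI, and parsing CSV text into a data.frame. The following is a minimal sketch under stated assumptions. `build_datastore_uri` is a hypothetical helper name, the argument values are placeholders, and `csv_text` stands in for the string that `f$read()` would return.

```r
# Hypothetical helper: assemble a datastore URI from its components,
# following the format shown in step 1.
build_datastore_uri <- function(subscription, resource_group, workspace,
                                datastore_name, path_on_datastore) {
  paste0(
    "azureml://subscriptions/", subscription,
    "/resourcegroups/", resource_group,
    "/workspaces/", workspace,
    "/datastores/", datastore_name,
    "/paths/", path_on_datastore
  )
}

# Placeholder values, for illustration only
uri <- build_datastore_uri(
  subscription      = "0000",
  resource_group    = "my-rg",
  workspace         = "my-ws",
  datastore_name    = "my-datastore",
  path_on_datastore = "data/example.csv"
)
print(uri)

# Stand-in for the text returned by f$read(): CSV content held in memory
csv_text <- "id,label\n1,a\n2,b"

# Same parsing call as in step 3
df <- read.csv(textConnection(csv_text), header = TRUE, sep = ",",
               stringsAsFactors = FALSE)
print(df)
```

Wrapping the URI assembly in a function keeps the format in one place when a notebook reads several files from the same datastore.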

Install R packages

A compute instance has many preinstalled R packages.

To install other packages, you must explicitly state the location and dependencies.

Tip

When you create or use a different compute instance, you must reinstall any packages you installed.

For example, to install the tsibble package:

install.packages("tsibble", 
                 dependencies = TRUE,
                 lib = "/home/azureuser")

Note

If you install packages within an R session that runs in a Jupyter notebook, dependencies = TRUE is required. Otherwise, dependent packages will not automatically install. The lib location is also required to install in the correct compute instance location.
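
To avoid reinstalling a package each time a notebook runs, the install call can be guarded with an availability check. The following is a minimal sketch: `install_if_missing` is a hypothetical helper name, and the default `lib_dir` is the location used in the example above.

```r
# Hypothetical helper: install `pkg` into `lib_dir` only when R can't find it.
# Returns TRUE (invisibly) if an install was attempted, FALSE otherwise.
install_if_missing <- function(pkg, lib_dir = "/home/azureuser") {
  # Look in the user location first, then in the current library paths
  search_paths <- c(lib_dir, .libPaths())
  if (requireNamespace(pkg, lib.loc = search_paths, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  install.packages(pkg, dependencies = TRUE, lib = lib_dir)
  invisible(TRUE)
}

# Example call on a compute instance (commented out because it needs network access):
# install_if_missing("tsibble")
```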

Load R libraries

Add /home/azureuser to the R library path.

.libPaths("/home/azureuser")

Tip

You must update the .libPaths in each interactive R script to access user installed libraries. Add this code to the top of each interactive R script or notebook.

Once the library path is updated, load libraries as usual.

library('tsibble')
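
When called with a single path, `.libPaths()` resets the search path to that directory plus the site and system libraries. If a session already has other library locations that should stay available, one option is to prepend the user location instead. The following is a sketch of that variation, not a required step.

```r
# Location used for user-installed packages in this article
user_lib <- "/home/azureuser"

# Remember the current search path, then put the user location in front of it.
# Directories that don't exist are silently dropped by .libPaths().
before <- .libPaths()
.libPaths(c(user_lib, before))

# Inspect the resulting search order
print(.libPaths())
```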

Use R in the notebook

Beyond the issues described earlier, use R as you would in any other environment, including your local workstation. In your notebook or script, you can read and write to the path where the notebook/script is stored.

Note

  • From an interactive R session, you can only write to the workspace file system.
  • From an interactive R session, you cannot interact with MLflow (for example, to log a model or query the registry).
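
Reading and writing next to the notebook can be sketched as follows. This example assumes the R session's working directory is the folder that holds the notebook. The file name and the data are illustrative.

```r
# Build a path in the current working directory
# (assumed here to be the notebook's folder)
out_file <- file.path(getwd(), "example_output.csv")

# Illustrative data
results <- data.frame(id = 1:3, score = c(0.5, 0.75, 1.0))

# Write the data.frame to the workspace file system, then read it back
write.csv(results, out_file, row.names = FALSE)
restored <- read.csv(out_file)

print(restored)
```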

Next steps