Use a Windows VM Managed Service Identity (MSI) to access Azure Data Lake Store

Managed Service Identity (MSI) is a public preview feature of Azure Active Directory. Make sure you review the known issues before you begin. For more information about previews, see Supplemental Terms of Use for Microsoft Azure Previews.

This tutorial shows you how to use a Managed Service Identity (MSI) for a Windows virtual machine (VM) to access an Azure Data Lake Store. Managed Service Identities are automatically managed by Azure and enable you to authenticate to services that support Azure AD authentication, without needing to insert credentials into your code. You learn how to:

  • Enable MSI on a Windows VM
  • Grant your VM access to an Azure Data Lake Store
  • Get an access token using the VM identity and use it to access an Azure Data Lake Store

Prerequisites

If you're unfamiliar with MSI, check out the Managed Service Identity overview. If you don't already have an Azure account, sign up for a free account before continuing.

To perform the required resource creation and role management, your account needs "Owner" permissions at the appropriate scope (your subscription or resource group). If you need assistance with role assignment, see Use Role-Based Access Control to manage access to your Azure subscription resources.

Sign in to Azure

Sign in to the Azure portal at https://portal.azure.com.

Create a Windows virtual machine in a new resource group

For this tutorial, we create a new Windows VM. You can also enable MSI on an existing VM.

  1. Click the Create a resource button in the upper left-hand corner of the Azure portal.
  2. Select Compute, and then select Windows Server 2016 Datacenter.
  3. Enter the virtual machine information. The Username and Password you create here are the credentials you use to log in to the virtual machine.
  4. Choose the proper Subscription for the virtual machine in the dropdown.
  5. To create the virtual machine in a new resource group, choose Create New for the Resource Group. When complete, click OK.
  6. Select the size for the VM. To see more sizes, select View all or change the Supported disk type filter. On the Settings page, keep the defaults, and click OK.

Enable MSI on your VM

A VM MSI enables you to get access tokens from Azure AD without needing to put credentials into your code. Enabling MSI tells Azure to create a managed identity for your VM. Under the covers, enabling MSI does two things: it registers your VM with Azure Active Directory to create its managed identity, and it configures the identity on the VM.

  1. Select the Virtual Machine that you want to enable MSI on. 
  2. In the left navigation bar, click Configuration.
  3. Under Managed Service Identity, select Yes to register and enable the MSI. To disable it, select No.
  4. Click Save to save the configuration.

  5. To verify which extensions are installed on this VM, click Extensions. If MSI is enabled, ManagedIdentityExtensionforWindows appears in the list.

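If you prefer to script this step instead of using the portal, the following is a minimal sketch using the Az PowerShell module. It assumes the Az.Compute module is installed and that you have signed in with Connect-AzAccount; the function name Enable-VmSystemIdentity and its parameters are hypothetical, and the resource group and VM names you pass are placeholders.

```powershell
# Sketch only: assumes Az.Compute is installed and Connect-AzAccount has been run.
# Enable-VmSystemIdentity is a hypothetical helper name.
function Enable-VmSystemIdentity {
    param(
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $VmName
    )

    $vm = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName

    # Scripted counterpart of selecting Yes under Managed Service Identity and clicking Save.
    Update-AzVM -ResourceGroupName $ResourceGroupName -VM $vm -IdentityType SystemAssigned | Out-Null

    # Re-read the VM and return the object (principal) ID of its new identity,
    # which is the identity you grant access to in the next section.
    (Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName).Identity.PrincipalId
}
```

For example, Enable-VmSystemIdentity -ResourceGroupName 'myResourceGroup' -VmName 'DevTestVM' would enable the identity and return its principal ID, where both names are placeholders for your own values.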

Grant your VM access to Azure Data Lake Store

Now you can grant your VM access to files and folders in an Azure Data Lake Store. For this step, you can use an existing Data Lake Store or create a new one. To create a new Data Lake Store using the Azure portal, follow this Azure Data Lake Store quickstart. There are also quickstarts that use the Azure CLI and Azure PowerShell in the Azure Data Lake Store documentation.

In your Data Lake Store, create a new folder and grant your VM MSI permission to read, write, and execute files in that folder:

  1. In the Azure portal, click Data Lake Store in the left-hand navigation.
  2. Click the Data Lake Store you want to use for this tutorial.
  3. Click Data Explorer in the command bar.
  4. The root folder of the Data Lake Store is selected. Click Access in the command bar.
  5. Click Add. In the Select field, enter the name of your VM, for example DevTestVM. Click to select your VM from the search results, then click Select.
  6. Click Select Permissions. Select Read and Execute, add to This folder, and add as An access permission only. Click Ok. The permission should be added successfully.
  7. Close the Access blade.
  8. For this tutorial, create a new folder. Click New Folder in the command bar, and give the new folder a name, for example TestFolder. Click Ok.
  9. Click on the folder you created, then click Access in the command bar.
  10. Similar to step 5, click Add, in the Select field enter the name of your VM, select it and click Select.
  11. Similar to step 6, click Select Permissions, select Read, Write, and Execute, add to This folder, and add as An access permission entry and a default permission entry. Click Ok. The permission should be added successfully.

Your VM MSI can now perform all operations on files in the folder you created. For more information on managing access to Data Lake Store, read this article on Access Control in Data Lake Store.
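
The portal steps above can also be scripted. This is a minimal sketch, assuming the Az.Compute and Az.DataLakeStore modules are installed and you are signed in with Connect-AzAccount. The function name Grant-VmDataLakeAccess and its parameters are hypothetical, and the cmdlet parameters reflect our understanding of the Az.DataLakeStore cmdlets, so check them against that module's reference before relying on this.

```powershell
# Sketch only: assumes Az.Compute and Az.DataLakeStore are installed and you are signed in.
# Grant-VmDataLakeAccess is a hypothetical helper; all names passed in are placeholders.
function Grant-VmDataLakeAccess {
    param(
        [Parameter(Mandatory)] [string] $AccountName,
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $VmName,
        [string] $FolderPath = '/TestFolder'
    )

    # Object ID of the VM's managed identity.
    $principalId = (Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName).Identity.PrincipalId

    # Steps 5-6: Read and Execute on the root folder, as an access entry only.
    Set-AzDataLakeStoreItemAclEntry -Account $AccountName -Path '/' -AceType User `
        -Id $principalId -Permissions ReadExecute

    # Step 8: create the test folder.
    New-AzDataLakeStoreItem -Account $AccountName -Path $FolderPath -Folder | Out-Null

    # Steps 10-11: Read, Write, and Execute (All) as an access entry and as a default entry.
    Set-AzDataLakeStoreItemAclEntry -Account $AccountName -Path $FolderPath -AceType User `
        -Id $principalId -Permissions All
    Set-AzDataLakeStoreItemAclEntry -Account $AccountName -Path $FolderPath -AceType User `
        -Id $principalId -Permissions All -Default
}
```

The default entry on the folder is what lets files created inside it later inherit the same permissions for the VM identity.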

Get an access token using the VM MSI and use it to call the Azure Data Lake Store filesystem

Azure Data Lake Store natively supports Azure AD authentication, so it can directly accept access tokens obtained using MSI. To authenticate to the Data Lake Store filesystem, send an access token issued by Azure AD to your Data Lake Store filesystem endpoint in an Authorization header in the format "Bearer <ACCESS_TOKEN_VALUE>". To learn more about Data Lake Store support for Azure AD authentication, read Authentication with Data Lake Store using Azure Active Directory.
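
The token request and header format described above can be wrapped in small helper functions. This is a minimal sketch: the function names are hypothetical, the token URI reproduces the IMDS endpoint, API version, and Data Lake Store resource identifier used in this tutorial, and Get-MsiAccessToken only works from inside the VM, because the address 169.254.169.254 is not reachable from elsewhere.

```powershell
# Sketch only: Get-ImdsTokenUri, Get-MsiAccessToken, and New-BearerHeader are
# hypothetical helper names, not part of any Azure module.

function Get-ImdsTokenUri {
    param(
        # Data Lake Store matches this identifier exactly, trailing slash included.
        [string] $Resource = 'https://datalake.azure.net/',
        [string] $ApiVersion = '2018-02-01'
    )
    # Percent-encode the resource so its ':' and '/' characters are safe in the query string.
    $encodedResource = [uri]::EscapeDataString($Resource)
    return "http://169.254.169.254/metadata/identity/oauth2/token?api-version=$ApiVersion&resource=$encodedResource"
}

function Get-MsiAccessToken {
    param([string] $Resource = 'https://datalake.azure.net/')
    # IMDS rejects token requests that lack the Metadata: true header.
    $response = Invoke-WebRequest -Uri (Get-ImdsTokenUri -Resource $Resource) -Method GET -Headers @{ Metadata = 'true' }
    # The body is JSON; its access_token property holds the bearer token.
    return ($response.Content | ConvertFrom-Json).access_token
}

function New-BearerHeader {
    param([Parameter(Mandatory)] [string] $AccessToken)
    # The scheme is "Bearer" with a capital B, one space, then the token.
    return @{ Authorization = "Bearer $AccessToken" }
}
```

On the VM, you could call Get-MsiAccessToken once and pass New-BearerHeader -AccessToken $AccessToken as the -Headers value of each Data Lake Store request. Building the URI with EscapeDataString avoids hand-encoding the resource identifier and keeps its trailing slash intact.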

Note

The Data Lake Store filesystem client SDKs do not yet support Managed Service Identity. This tutorial will be updated when support is added to the SDK.

In this tutorial, you authenticate to the Data Lake Store filesystem REST API using PowerShell to make REST requests. To use the VM MSI for authentication, you need to make the requests from the VM.

  1. In the portal, navigate to Virtual Machines, go to your Windows VM, and on the Overview page click Connect.
  2. Enter the Username and Password you created when you set up the Windows VM.
  3. Now that you have created a Remote Desktop Connection with the virtual machine, open PowerShell in the remote session.
  4. Using PowerShell's Invoke-WebRequest, make a request to the local MSI endpoint to get an access token for Azure Data Lake Store. The resource identifier for Data Lake Store is "https://datalake.azure.net/". Data Lake Store performs an exact match on the resource identifier, so the trailing slash is important.

    $response = Invoke-WebRequest -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatalake.azure.net%2F' -Method GET -Headers @{Metadata="true"}
    

    Convert the response from a JSON object to a PowerShell object.

    $content = $response.Content | ConvertFrom-Json
    

    Extract the access token from the response.

    $AccessToken = $content.access_token
    
  5. Using PowerShell's Invoke-WebRequest, make a request to your Data Lake Store's REST endpoint to list the folders in the root folder. This is a simple way to check that everything is configured correctly. It is important that the string "Bearer" in the Authorization header has a capital "B". You can find the name of your Data Lake Store in the Overview section of the Data Lake Store blade in the Azure portal.

    Invoke-WebRequest -Uri 'https://<YOUR_ADLS_NAME>.azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS' -Headers @{Authorization="Bearer $AccessToken"}
    

    A successful response looks like:

    StatusCode        : 200
    StatusDescription : OK
    Content           : {"FileStatuses":{"FileStatus":[{"length":0,"pathSuffix":"TestFolder","type":"DIRECTORY", "blockSize":0,"accessTime":1507934941392, "modificationTime":1507944835699,"replication":0, "permission":"770","ow..."
    RawContent        : HTTP/1.1 200 OK
                        Pragma: no-cache
                        x-ms-request-id: b4b31e16-e968-46a1-879a-3474aa7d4528
                        x-ms-webhdfs-version: 17.04.22.00
                        Status: 0x0
                        X-Content-Type-Options: nosniff
                        Strict-Transport-Security: ma...
    Forms             : {}
    Headers           : {[Pragma, no-cache], [x-ms-request-id, b4b31e16-e968-46a1-879a-3474aa7d4528],
                        [x-ms-webhdfs-version, 17.04.22.00], [Status, 0x0]...}
    Images            : {}
    InputFields       : {}
    Links             : {}
    ParsedHtml        : System.__ComObject
    RawContentLength  : 556
    
  6. Now you can try uploading a file to your Data Lake Store. First, create a file to upload.

    echo "Test file." > Test1.txt
    
  7. Using PowerShell's Invoke-WebRequest, make a request to your Data Lake Store's REST endpoint to upload the file to the folder you created earlier. This request takes two steps. In the first step, you make a request and get a redirection to where the file should be uploaded. In the second step, you actually upload the file. Remember to set the name of the folder and file appropriately if you used different values than in this tutorial.

    $HdfsRedirectResponse = Invoke-WebRequest -Uri 'https://<YOUR_ADLS_NAME>.azuredatalakestore.net/webhdfs/v1/TestFolder/Test1.txt?op=CREATE' -Method PUT -Headers @{Authorization="Bearer $AccessToken"} -Infile Test1.txt -MaximumRedirection 0
    

    If you inspect the value of $HdfsRedirectResponse, it should look like the following:

    PS C:\> $HdfsRedirectResponse
    
    StatusCode        : 307
    StatusDescription : Temporary Redirect
    Content           : {}
    RawContent        : HTTP/1.1 307 Temporary Redirect
                        Pragma: no-cache
                        x-ms-request-id: b7ab492f-b514-4483-aada-4aa0611d12b3
                        ContentLength: 0
                        x-ms-webhdfs-version: 17.04.22.00
                        Status: 0x0
                        X-Content-Type-Options: nosn...
    Headers           : {[Pragma, no-cache], [x-ms-request-id, b7ab492f-b514-4483-aada-4aa0611d12b3], 
                        [ContentLength, 0], [x-ms-webhdfs-version, 17.04.22.00]...}
    RawContentLength  : 0
    

    Complete the upload by sending a request to the redirect endpoint:

    Invoke-WebRequest -Uri $HdfsRedirectResponse.Headers.Location -Method PUT -Headers @{Authorization="Bearer $AccessToken"} -Infile Test1.txt -MaximumRedirection 0
    

    A successful response looks like:

    StatusCode        : 201
    StatusDescription : Created
    Content           : {}
    RawContent        : HTTP/1.1 201 Created
                        Pragma: no-cache
                        x-ms-request-id: 1e70f36f-ead1-4566-acfa-d0c3ec1e2307
                        ContentLength: 0
                        x-ms-webhdfs-version: 17.04.22.00
                        Status: 0x0
                        X-Content-Type-Options: nosniff
                        Strict...
    Headers           : {[Pragma, no-cache], [x-ms-request-id, 1e70f36f-ead1-4566-acfa-d0c3ec1e2307],
                        [ContentLength, 0], [x-ms-webhdfs-version, 17.04.22.00]...}
    RawContentLength  : 0
    
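The two-request upload from step 7 can be wrapped in a reusable function. This is a minimal sketch: New-AdlsCreateUri and Send-AdlsFile are hypothetical helper names, and it assumes Windows PowerShell 5.1 (the version on Windows Server 2016), where -MaximumRedirection 0 returns the 307 response as shown above; newer PowerShell versions may raise an error on the redirect instead.

```powershell
# Sketch only: New-AdlsCreateUri and Send-AdlsFile are hypothetical helpers
# wrapping the two requests from step 7. Assumes Windows PowerShell 5.1.

function New-AdlsCreateUri {
    param(
        [Parameter(Mandatory)] [string] $AccountName,  # Data Lake Store name from the Overview blade
        [Parameter(Mandatory)] [string] $RemotePath    # for example 'TestFolder/Test1.txt'
    )
    $relative = $RemotePath.TrimStart('/')
    # ${relative} is braced because '?' is a legal character in unbraced variable names.
    return "https://$AccountName.azuredatalakestore.net/webhdfs/v1/${relative}?op=CREATE"
}

function Send-AdlsFile {
    param(
        [Parameter(Mandatory)] [string] $AccountName,
        [Parameter(Mandatory)] [string] $RemotePath,
        [Parameter(Mandatory)] [string] $LocalPath,
        [Parameter(Mandatory)] [string] $AccessToken
    )
    $headers = @{ Authorization = "Bearer $AccessToken" }

    # Request 1: CREATE answers with 307 Temporary Redirect; do not follow it automatically.
    $redirect = Invoke-WebRequest -Uri (New-AdlsCreateUri -AccountName $AccountName -RemotePath $RemotePath) `
        -Method PUT -Headers $headers -InFile $LocalPath -MaximumRedirection 0
    if (-not $redirect.Headers.Location) {
        throw "Expected a redirect from CREATE, got HTTP $($redirect.StatusCode)."
    }

    # Request 2: send the file to the redirect target, with the bearer token again.
    $result = Invoke-WebRequest -Uri $redirect.Headers.Location -Method PUT -Headers $headers `
        -InFile $LocalPath -MaximumRedirection 0

    # 201 Created indicates the upload succeeded.
    return $result.StatusCode
}
```

Following the redirect manually, rather than letting Invoke-WebRequest follow it, is what makes it possible to attach the Authorization header to the second request explicitly.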

Using other Data Lake Store filesystem APIs you can append to files, download files, and more.
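
For example, downloading a file uses the WebHDFS OPEN operation. This is a minimal sketch under stated assumptions: New-AdlsWebHdfsUri and Save-AdlsFile are hypothetical helper names, it assumes OPEN answers with a 307 redirect in the same way CREATE does in step 7, and like the upload it assumes Windows PowerShell 5.1 for the -MaximumRedirection 0 behavior. Check the Data Lake Store REST API reference for any additional parameters your operations need.

```powershell
# Sketch only: hypothetical helpers; assumes OPEN redirects like CREATE
# and that Windows PowerShell 5.1 returns the 307 response.

function New-AdlsWebHdfsUri {
    param(
        [Parameter(Mandatory)] [string] $AccountName,
        [string] $Path = '',                         # '' means the root folder
        [Parameter(Mandatory)] [string] $Operation   # e.g. LISTSTATUS, CREATE, OPEN
    )
    $relative = $Path.TrimStart('/')
    # Braces keep '?' from being parsed as part of the variable name.
    return "https://$AccountName.azuredatalakestore.net/webhdfs/v1/${relative}?op=$Operation"
}

function Save-AdlsFile {
    param(
        [Parameter(Mandatory)] [string] $AccountName,
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $LocalPath,
        [Parameter(Mandatory)] [string] $AccessToken
    )
    $headers = @{ Authorization = "Bearer $AccessToken" }

    # Request 1: ask for the file and capture the redirect without following it.
    $first = Invoke-WebRequest -Uri (New-AdlsWebHdfsUri -AccountName $AccountName -Path $Path -Operation 'OPEN') `
        -Method GET -Headers $headers -MaximumRedirection 0
    if (-not $first.Headers.Location) {
        throw "Expected a redirect from OPEN, got HTTP $($first.StatusCode)."
    }

    # Request 2: fetch the content from the redirect target and write it to disk.
    Invoke-WebRequest -Uri $first.Headers.Location -Method GET -Headers $headers -OutFile $LocalPath
}
```

Writing the result with -OutFile sidesteps the question of whether the response body comes back as text or as a byte array.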

Congratulations! You've authenticated to the Data Lake Store filesystem using a VM MSI.
