Manage Azure Data Lake Analytics using Azure PowerShell

This article describes how to manage Azure Data Lake Analytics accounts, data sources, users, and jobs by using Azure PowerShell.

Prerequisites

To use PowerShell with Data Lake Analytics, collect the following pieces of information:

  • Subscription ID: The ID of the Azure subscription that contains your Data Lake Analytics account.
  • Resource group: The name of the Azure resource group that contains your Data Lake Analytics account.
  • Data Lake Analytics account name: The name of your Data Lake Analytics account.
  • Default Data Lake Store account name: Each Data Lake Analytics account has a default Data Lake Store account.
  • Location: The location of your Data Lake Analytics account, such as "East US 2" or other supported locations.

The PowerShell snippets in this tutorial use these variables to store this information

$subId = "<SubscriptionId>"
$rg = "<ResourceGroupName>"
$adla = "<DataLakeAnalyticsAccountName>"
$adls = "<DataLakeStoreAccountName>"
$location = "<Location>"

Log in to Azure

Log in using interactive user authentication

Log in using a subscription ID or by subscription name

# Using subscription id
Connect-AzureRmAccount -SubscriptionId $subId

# Using subscription name
Connect-AzureRmAccount -SubscriptionName $subname 

Saving authentication context

The Connect-AzureRmAccount cmdlet always prompts for credentials. You can avoid being prompted by using the following cmdlets:

# Save login session information
Save-AzureRmProfile -Path D:\profile.json  

# Load login session information
Select-AzureRmProfile -Path D:\profile.json 

Log in using a Service Principal Identity (SPI)

$tenantid = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"  
$spi_appname = "appname" 
$spi_appid = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" 
$spi_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" 

$pscredential = New-Object System.Management.Automation.PSCredential ($spi_appid, (ConvertTo-SecureString $spi_secret -AsPlainText -Force))
Login-AzureRmAccount -ServicePrincipal -TenantId $tenantid -Credential $pscredential -Subscription $subid

Manage accounts

List accounts

# List Data Lake Analytics accounts within the current subscription.
Get-AdlAnalyticsAccount

# List Data Lake Analytics accounts within a specific resource group.
Get-AdlAnalyticsAccount -ResourceGroupName $rg

Create an account

Every Data Lake Analytics account requires a default Data Lake Store account that it uses for storing logs. You can reuse an existing account or create an account.

# Create a data lake store if needed, or you can re-use an existing one
New-AdlStore -ResourceGroupName $rg -Name $adls -Location $location
New-AdlAnalyticsAccount -ResourceGroupName $rg -Name $adla -Location $location -DefaultDataLake $adls

Get account information

Get details about an account.

Get-AdlAnalyticsAccount -Name $adla

Check if an account exists

Test-AdlAnalyticsAccount -Name $adla

Manage data sources

Azure Data Lake Analytics currently supports the following data sources:

Every Data Lake Analytics account has a default Data Lake Store account. The default Data Lake Store account is used to store job metadata and job audit logs.

Find the default Data Lake Store account

$adla_acct = Get-AdlAnalyticsAccount -Name $adla
$dataLakeStoreName = $adla_acct.DefaultDataLakeAccount

You can find the default Data Lake Store account by filtering the list of datasources by the IsDefault property:

Get-AdlAnalyticsDataSource -Account $adla  | ? { $_.IsDefault } 

Add a data source


# Add an additional Storage (Blob) account.
$AzureStorageAccountName = "<AzureStorageAccountName>"
$AzureStorageAccountKey = "<AzureStorageAccountKey>"
Add-AdlAnalyticsDataSource -Account $adla -Blob $AzureStorageAccountName -AccessKey $AzureStorageAccountKey

# Add an additional Data Lake Store account.
$AzureDataLakeStoreName = "<AzureDataLakeStoreAccountName"
Add-AdlAnalyticsDataSource -Account $adla -DataLakeStore $AzureDataLakeStoreName 

List data sources

# List all the data sources
Get-AdlAnalyticsDataSource -Account $adla

# List attached Data Lake Store accounts
Get-AdlAnalyticsDataSource -Account $adla | where -Property Type -EQ "DataLakeStore"

# List attached Storage accounts
Get-AdlAnalyticsDataSource -Account $adla | where -Property Type -EQ "Blob"

Submit U-SQL jobs

Submit a string as a U-SQL job

$script = @"
@a  = 
    SELECT * FROM 
        (VALUES
            ("Contoso", 1500.0),
            ("Woodgrove", 2700.0)
        ) AS D( customer, amount );
OUTPUT @a
    TO "/data.csv"
    USING Outputters.Csv();
"@

$scriptpath = "d:\test.usql"
$script | Out-File $scriptpath 

Submit-AdlJob -AccountName $adla -Script $script -Name "Demo"

Submit a file as a U-SQL job

$scriptpath = "d:\test.usql"
$script | Out-File $scriptpath 
Submit-AdlJob -AccountName $adla –ScriptPath $scriptpath -Name "Demo"

List jobs

The output includes the currently running jobs and those jobs that have recently completed.

Get-AdlJob -Account $adla

List the top N jobs

By default the list of jobs is sorted on submit time. So the most recently submitted jobs appear first. By default, The ADLA account remembers jobs for 180 days, but the Get-AdlJob cmdlet by default returns only the first 500. Use -Top parameter to list a specific number of jobs.

$jobs = Get-AdlJob -Account $adla -Top 10

List jobs by job state

Using the -State parameter. You can combine any of these values:

  • Accepted
  • Compiling
  • Ended
  • New
  • Paused
  • Queued
  • Running
  • Scheduling
  • Start
# List the running jobs
Get-AdlJob -Account $adla -State Running

# List the jobs that have completed
Get-AdlJob -Account $adla -State Ended

# List the jobs that have not started yet
Get-AdlJob -Account $adla -State Accepted,Compiling,New,Paused,Scheduling,Start

List jobs by job result

Use the -Result parameter to detect whether ended jobs completed successfully. It has these values:

  • Cancelled
  • Failed
  • None
  • Succeeded
# List Successful jobs.
Get-AdlJob -Account $adla -State Ended -Result Succeeded

# List Failed jobs.
Get-AdlJob -Account $adla -State Ended -Result Failed

List jobs by job submitter

The -Submitter parameter helps you identify who submitted a job.

Get-AdlJob -Account $adla -Submitter "joe@contoso.com"

List jobs by submission time

The -SubmittedAfter is useful in filtering to a time range.

# List  jobs submitted in the last day.
$d = [DateTime]::Now.AddDays(-1)
Get-AdlJob -Account $adla -SubmittedAfter $d

# List  jobs submitted in the last seven day.
$d = [DateTime]::Now.AddDays(-7)
Get-AdlJob -Account $adla -SubmittedAfter $d

Get job status

Get the status of a specific job.

Get-AdlJob -AccountName $adla -JobId $job.JobId

Cancel a job

Stop-AdlJob -Account $adla -JobID $jobID

Wait for a job to finish

Instead of repeating Get-AdlAnalyticsJob until a job finishes, you can use the Wait-AdlJob cmdlet to wait for the job to end.

Wait-AdlJob -Account $adla -JobId $job.JobId

Analyzing job history

Using Azure PowerShell to analyze the history of jobs that have run in Data Lake analytics is a powerful technique. You can use it to gain insights into usage and cost. You can learn more by looking at the Job History Analysis sample repo

List job pipelines and recurrences

Use the Get-AdlJobPipeline cmdlet to see the pipeline information previously submitted jobs.

$pipelines = Get-AdlJobPipeline -Account $adla
$pipeline = Get-AdlJobPipeline -Account $adla -PipelineId "<pipeline ID>"

Use the Get-AdlJobRecurrence cmdlet to see the recurrence information for previously submitted jobs.

$recurrences = Get-AdlJobRecurrence -Account $adla

$recurrence = Get-AdlJobRecurrence -Account $adla -RecurrenceId "<recurrence ID>"

Manage compute policies

List existing compute policies

The Get-AdlAnalyticsComputePolicy cmdlet retrieves info about compute policies for a Data Lake Analytics account.

$policies = Get-AdlAnalyticsComputePolicy -Account $adla

Create a compute policy

The New-AdlAnalyticsComputePolicy cmdlet creates a new compute policy for a Data Lake Analytics account. This example sets the maximum AUs available to the specified user to 50, and the minimum job priority to 250.

$userObjectId = (Get-AzureRmAdUser -SearchString "garymcdaniel@contoso.com").Id

New-AdlAnalyticsComputePolicy -Account $adla -Name "GaryMcDaniel" -ObjectId $objectId -ObjectType User -MaxDegreeOfParallelismPerJob 50 -MinPriorityPerJob 250

Manage files

Check for the existence of a file.

Test-AdlStoreItem -Account $adls -Path "/data.csv"

Uploading and downloading

Upload a file.

Import-AdlStoreItem -AccountName $adls -Path "c:\data.tsv" -Destination "/data_copy.csv" 

Upload an entire folder recursively.

Import-AdlStoreItem -AccountName $adls -Path "c:\myData\" -Destination "/myData/" -Recurse

Download a file.

Export-AdlStoreItem -AccountName $adls -Path "/data.csv" -Destination "c:\data.csv"

Download an entire folder recursively.

Export-AdlStoreItem -AccountName $adls -Path "/" -Destination "c:\myData\" -Recurse

Note

If the upload or download process is interrupted, you can attempt to resume the process by running the cmdlet again with the -Resume flag.

Manage the U-SQL catalog

The U-SQL catalog is used to structure data and code so they can be shared by U-SQL scripts. The catalog enables the highest performance possible with data in Azure Data Lake. For more information, see Use U-SQL catalog.

List items in the U-SQL catalog

# List U-SQL databases
Get-AdlCatalogItem -Account $adla -ItemType Database 

# List tables within a database
Get-AdlCatalogItem -Account $adla -ItemType Table -Path "database"

# List tables within a schema.
Get-AdlCatalogItem -Account $adla -ItemType Table -Path "database.schema"

List all the assemblies the U-SQL catalog

$dbs = Get-AdlCatalogItem -Account $adla -ItemType Database

foreach ($db in $dbs)
{
    $asms = Get-AdlCatalogItem -Account $adla -ItemType Assembly -Path $db.Name

    foreach ($asm in $asms)
    {
        $asmname = "[" + $db.Name + "].[" + $asm.Name + "]"
        Write-Host $asmname
    }
}

Get details about a catalog item

# Get details of a table
Get-AdlCatalogItem  -Account $adla -ItemType Table -Path "master.dbo.mytable"

# Test existence of a U-SQL database.
Test-AdlCatalogItem  -Account $adla -ItemType Database -Path "master"

Store credentials in the catalog

Within a U-SQL database, create a credential object for a database hosted in Azure. Currently, U-SQL credentials are the only type of catalog item that you can create through PowerShell.

$dbName = "master"
$credentialName = "ContosoDbCreds"
$dbUri = "https://contoso.database.windows.net:8080"

New-AdlCatalogCredential -AccountName $adla `
          -DatabaseName $db `
          -CredentialName $credentialName `
          -Credential (Get-Credential) `
          -Uri $dbUri

Manage firewall rules

List firewall rules

Get-AdlAnalyticsFirewallRule -Account $adla

Add a firewall rule

$ruleName = "Allow access from on-prem server"
$startIpAddress = "<start IP address>"
$endIpAddress = "<end IP address>"

Add-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName -StartIpAddress $startIpAddress -EndIpAddress $endIpAddress

Modify a firewall rule

Set-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName -StartIpAddress $startIpAddress -EndIpAddress $endIpAddress

Remove a firewall rule

Remove-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName

Allow Azure IP addresses

Set-AdlAnalyticsAccount -Name $adla -AllowAzureIpState Enabled
Set-AdlAnalyticsAccount -Name $adla -FirewallState Enabled
Set-AdlAnalyticsAccount -Name $adla -FirewallState Disabled

Working with Azure

Get details of AzureRm errors

Resolve-AzureRmError -Last

Verify if you are running as an Administrator on your Windows machine

function Test-Administrator  
{  
    $user = [Security.Principal.WindowsIdentity]::GetCurrent();
    $p = New-Object Security.Principal.WindowsPrincipal $user
    $p.IsInRole([Security.Principal.WindowsBuiltinRole]::Administrator)  
}

Find a TenantID

From a subscription name:

function Get-TenantIdFromSubscriptionName( [string] $subname )
{
    $sub = (Get-AzureRmSubscription -SubscriptionName $subname)
    $sub.TenantId
}

Get-TenantIdFromSubscriptionName "ADLTrainingMS"

From a subscription ID:

function Get-TenantIdFromSubscriptionId( [string] $subid )
{
    $sub = (Get-AzureRmSubscription -SubscriptionId $subid)
    $sub.TenantId
}

$subid = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Get-TenantIdFromSubscriptionId $subid

From a domain address such as "contoso.com"

function Get-TenantIdFromDomain( $domain )
{
    $url = "https://login.windows.net/" + $domain + "/.well-known/openid-configuration"
    return (Invoke-WebRequest $url|ConvertFrom-Json).token_endpoint.Split('/')[3]
}

$domain = "contoso.com"
Get-TenantIdFromDomain $domain

List all your subscriptions and tenant IDs

$subs = Get-AzureRmSubscription
foreach ($sub in $subs)
{
    Write-Host $sub.Name "("  $sub.Id ")"
    Write-Host "`tTenant Id" $sub.TenantId
}

Create a Data Lake Analytics account using a template

You can also use an Azure Resource Group template using the following sample: Create a Data Lake Analytics account using a template

Next steps