使用 Azure PowerShell 管理 Azure Data Lake AnalyticsManage Azure Data Lake Analytics using Azure PowerShell

本文說明如何使用 Azure PowerShell 來管理 Azure Data Lake Analytics 帳戶、資料來源、使用者和作業。This article describes how to manage Azure Data Lake Analytics accounts, data sources, users, and jobs by using Azure PowerShell.

先決條件Prerequisites

注意

本文已更新為使用新的 Azure PowerShell Az 模組。This article has been updated to use the new Azure PowerShell Az module. AzureRM 模組在至少 2020 年 12 月之前都還會持續收到錯誤 (Bug) 修正,因此您仍然可以持續使用。You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. 若要深入了解新的 Az 模組和 AzureRM 的相容性,請參閱新的 Azure PowerShell Az 模組簡介To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. 如需 Az 模組安裝指示,請參閱安裝 Azure PowerShellFor Az module installation instructions, see Install Azure PowerShell.

若要搭配使用 PowerShell 與 Data Lake Analytics,請收集下列幾項資訊:To use PowerShell with Data Lake Analytics, collect the following pieces of information:

  • 訂用帳戶識別碼:您 Data Lake Analytics 帳戶所屬 Azure 訂用帳戶的識別碼。Subscription ID: The ID of the Azure subscription that contains your Data Lake Analytics account.
  • 资源组:您 Data Lake Analytics 帳戶所屬 Azure 資源群組的名稱。Resource group: The name of the Azure resource group that contains your Data Lake Analytics account.
  • Data Lake Analytics 帳戶名稱:您 Data Lake Analytics 帳戶的名稱。Data Lake Analytics account name: The name of your Data Lake Analytics account.
  • 預設 Data Lake Store 帳戶名稱:每個 Data Lake Analytics 帳戶都有預設的 Data Lake Store 帳戶。Default Data Lake Store account name: Each Data Lake Analytics account has a default Data Lake Store account.
  • 位置:您 Data Lake Analytics 帳戶的位置,例如「美國東部 2」或其他支援的位置。Location: The location of your Data Lake Analytics account, such as "East US 2" or other supported locations.

本教學課程中的 PowerShell 程式碼片段會使用這些變數來儲存此資訊The PowerShell snippets in this tutorial use these variables to store this information

$subId = "<SubscriptionId>"
$rg = "<ResourceGroupName>"
$adla = "<DataLakeAnalyticsAccountName>"
$adls = "<DataLakeStoreAccountName>"
$location = "<Location>"

登入 AzureLog in to Azure

使用互動式使用者驗證來登入Log in using interactive user authentication

使用訂用帳戶識別碼或以訂用帳戶名稱來登入Log in using a subscription ID or by subscription name

# Using subscription id
Connect-AzAccount -SubscriptionId $subId

# Using subscription name
Connect-AzAccount -SubscriptionName $subname 

儲存驗證內容Saving authentication context

Connect-AzAccount Cmdlet 一律會提示輸入認證。The Connect-AzAccount cmdlet always prompts for credentials. 您可以使用下列 Cmdlet 來避免出現提示:You can avoid being prompted by using the following cmdlets:

# Save login session information
Save-AzAccounts -Path D:\profile.json  

# Load login session information
Select-AzAccounts -Path D:\profile.json 

使用服務主體身分識別 (SPI) 來登入Log in using a Service Principal Identity (SPI)

$tenantid = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"  
$spi_appname = "appname" 
$spi_appid = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" 
$spi_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" 

$pscredential = New-Object System.Management.Automation.PSCredential ($spi_appid, (ConvertTo-SecureString $spi_secret -AsPlainText -Force))
Login-AzAccount -ServicePrincipal -TenantId $tenantid -Credential $pscredential -Subscription $subid

管理帳戶Manage accounts

列出帳戶List accounts

# List Data Lake Analytics accounts within the current subscription.
Get-AdlAnalyticsAccount

# List Data Lake Analytics accounts within a specific resource group.
Get-AdlAnalyticsAccount -ResourceGroupName $rg

建立帳戶Create an account

每個 Data Lake Analytics 帳戶都需要預設 Data Lake Store 帳戶,用於儲存記錄。Every Data Lake Analytics account requires a default Data Lake Store account that it uses for storing logs. 您可以重複使用現有的帳戶,或建立一個帳戶。You can reuse an existing account or create an account.

# Create a data lake store if needed, or you can re-use an existing one
New-AdlStore -ResourceGroupName $rg -Name $adls -Location $location
New-AdlAnalyticsAccount -ResourceGroupName $rg -Name $adla -Location $location -DefaultDataLake $adls

取得帳戶資訊Get account information

取得帳戶的相關詳細資料。Get details about an account.

Get-AdlAnalyticsAccount -Name $adla

檢查帳戶是否存在Check if an account exists

Test-AdlAnalyticsAccount -Name $adla

管理資料來源Manage data sources

Azure Data Lake Analytics 目前支援下列資料來源:Azure Data Lake Analytics currently supports the following data sources:

每個 Data Lake Analytics 帳戶都有預設 Data Lake Store 帳戶。Every Data Lake Analytics account has a default Data Lake Store account. 預設的資料湖存放區帳戶是用來儲存工作中繼資料與工作稽核記錄。The default Data Lake Store account is used to store job metadata and job audit logs.

尋找預設的 Data Lake Store 帳戶Find the default Data Lake Store account

$adla_acct = Get-AdlAnalyticsAccount -Name $adla
$dataLakeStoreName = $adla_acct.DefaultDataLakeAccount

您可以用 IsDefault 屬性篩選資料來源清單,藉此方式尋找預設的 Data Lake Store 帳戶:You can find the default Data Lake Store account by filtering the list of datasources by the IsDefault property:

Get-AdlAnalyticsDataSource -Account $adla  | ? { $_.IsDefault } 

建立資料來源Add a data source


# Add an additional Storage (Blob) account.
$AzureStorageAccountName = "<AzureStorageAccountName>"
$AzureStorageAccountKey = "<AzureStorageAccountKey>"
Add-AdlAnalyticsDataSource -Account $adla -Blob $AzureStorageAccountName -AccessKey $AzureStorageAccountKey

# Add an additional Data Lake Store account.
$AzureDataLakeStoreName = "<AzureDataLakeStoreAccountName"
Add-AdlAnalyticsDataSource -Account $adla -DataLakeStore $AzureDataLakeStoreName 

列出資料來源List data sources

# List all the data sources
Get-AdlAnalyticsDataSource -Account $adla

# List attached Data Lake Store accounts
Get-AdlAnalyticsDataSource -Account $adla | where -Property Type -EQ "DataLakeStore"

# List attached Storage accounts
Get-AdlAnalyticsDataSource -Account $adla | where -Property Type -EQ "Blob"

提交 U-SQL 作業Submit U-SQL jobs

以 U-SQL 作業的形式提交字串Submit a string as a U-SQL job

$script = @"
@a  = 
    SELECT * FROM 
        (VALUES
            ("Contoso", 1500.0),
            ("Woodgrove", 2700.0)
        ) AS D( customer, amount );
OUTPUT @a
    TO "/data.csv"
    USING Outputters.Csv();
"@

$scriptpath = "d:\test.usql"
$script | Out-File $scriptpath 

Submit-AdlJob -AccountName $adla -Script $script -Name "Demo"

以 U-SQL 作業的形式提交檔案Submit a file as a U-SQL job

$scriptpath = "d:\test.usql"
$script | Out-File $scriptpath 
Submit-AdlJob -AccountName $adla –ScriptPath $scriptpath -Name "Demo"

列出工作List jobs

輸出包含目前執行中作業和最近完成的作業。The output includes the currently running jobs and those jobs that have recently completed.

Get-AdlJob -Account $adla

列出前 N 個作業List the top N jobs

預設會在提交時排序作業清單。By default the list of jobs is sorted on submit time. 因此,最新提交的作業會顯示在最前面。So the most recently submitted jobs appear first. 根據預設,ADLA 帳戶會記住 180 天內的作業,但 Ge-AdlJob Cmdlet 預設只會傳回前 500 個作業。By default, The ADLA account remembers jobs for 180 days, but the Get-AdlJob cmdlet by default returns only the first 500. 請使用 -Top 參數來列出特定數目的作業。Use -Top parameter to list a specific number of jobs.

$jobs = Get-AdlJob -Account $adla -Top 10

依作業狀態列出作業List jobs by job state

使用 -State 參數。Using the -State parameter. 您可以結合以下這些值:You can combine any of these values:

  • Accepted
  • Compiling
  • Ended
  • New
  • Paused
  • Queued
  • Running
  • Scheduling
  • Start
# List the running jobs
Get-AdlJob -Account $adla -State Running

# List the jobs that have completed
Get-AdlJob -Account $adla -State Ended

# List the jobs that have not started yet
Get-AdlJob -Account $adla -State Accepted,Compiling,New,Paused,Scheduling,Start

依作業結果列出作業List jobs by job result

使用 -Result 參數來偵測已結束的工作是否順利完成。Use the -Result parameter to detect whether ended jobs completed successfully. 它有下列值:It has these values:

  • 已取消Cancelled
  • FailedFailed
  • NoneNone
  • SucceededSucceeded
# List Successful jobs.
Get-AdlJob -Account $adla -State Ended -Result Succeeded

# List Failed jobs.
Get-AdlJob -Account $adla -State Ended -Result Failed

依作業提交者列出作業List jobs by job submitter

-Submitter 參數可協助您識別由誰提交工作。The -Submitter parameter helps you identify who submitted a job.

Get-AdlJob -Account $adla -Submitter "joe@contoso.com"

依提交時間列出作業List jobs by submission time

-SubmittedAfter 用於篩選時間範圍。The -SubmittedAfter is useful in filtering to a time range.

# List  jobs submitted in the last day.
$d = [DateTime]::Now.AddDays(-1)
Get-AdlJob -Account $adla -SubmittedAfter $d

# List  jobs submitted in the last seven day.
$d = [DateTime]::Now.AddDays(-7)
Get-AdlJob -Account $adla -SubmittedAfter $d

取得作業狀態Get job status

取得特定作業的狀態。Get the status of a specific job.

Get-AdlJob -AccountName $adla -JobId $job.JobId

取消工作Cancel a job

Stop-AdlJob -Account $adla -JobID $jobID

等候作業結束Wait for a job to finish

您可以不重複執行 Get-AdlAnalyticsJob 直到作業結束,而是使用 Wait-AdlJob Cmdlet 來等候作業結束。Instead of repeating Get-AdlAnalyticsJob until a job finishes, you can use the Wait-AdlJob cmdlet to wait for the job to end.

Wait-AdlJob -Account $adla -JobId $job.JobId

分析作業記錄Analyzing job history

使用 Azure PowerShell 來分析已在 Data Lake 分析中執行之作業的記錄,是一項強大的技術。Using Azure PowerShell to analyze the history of jobs that have run in Data Lake analytics is a powerful technique. 您可以使用這項技術來深入了解使用情形和成本。You can use it to gain insights into usage and cost. 若要深入了解,請參閱作業記錄分析存放庫範例You can learn more by looking at the Job History Analysis sample repo

列出作業管線和週期List job pipelines and recurrences

使用 Get-AdlJobPipeline Cmdlet 來查看先前提交作業的管線資訊。Use the Get-AdlJobPipeline cmdlet to see the pipeline information previously submitted jobs.

$pipelines = Get-AdlJobPipeline -Account $adla
$pipeline = Get-AdlJobPipeline -Account $adla -PipelineId "<pipeline ID>"

使用 Get-AdlJobRecurrence Cmdlet 來查看先前提交作業的週期資訊。Use the Get-AdlJobRecurrence cmdlet to see the recurrence information for previously submitted jobs.

$recurrences = Get-AdlJobRecurrence -Account $adla

$recurrence = Get-AdlJobRecurrence -Account $adla -RecurrenceId "<recurrence ID>"

管理計算原則Manage compute policies

列出現有的計算原則List existing compute policies

Get-AdlAnalyticsComputePolicy Cmdlet 會擷取 Data Lake Analytics 帳戶的計算原則清單。The Get-AdlAnalyticsComputePolicy cmdlet retrieves info about compute policies for a Data Lake Analytics account.

$policies = Get-AdlAnalyticsComputePolicy -Account $adla

建立計算原則Create a compute policy

New-AdlAnalyticsComputePolicy Cmdlet 會為 Data Lake Analytics 帳戶建立新的計算原則。The New-AdlAnalyticsComputePolicy cmdlet creates a new compute policy for a Data Lake Analytics account. 這個範例會將指定使用者的可用 AU 上限設定為 50,將最低作業優先權設為 250。This example sets the maximum AUs available to the specified user to 50, and the minimum job priority to 250.

$userObjectId = (Get-AzAdUser -SearchString "garymcdaniel@contoso.com").Id

New-AdlAnalyticsComputePolicy -Account $adla -Name "GaryMcDaniel" -ObjectId $objectId -ObjectType User -MaxDegreeOfParallelismPerJob 50 -MinPriorityPerJob 250

管理檔案Manage files

檢查檔案是否存在。Check for the existence of a file.

Test-AdlStoreItem -Account $adls -Path "/data.csv"

上傳和下載Uploading and downloading

上傳檔案。Upload a file.

Import-AdlStoreItem -AccountName $adls -Path "c:\data.tsv" -Destination "/data_copy.csv" 

以遞迴方式上傳整個資料夾。Upload an entire folder recursively.

Import-AdlStoreItem -AccountName $adls -Path "c:\myData\" -Destination "/myData/" -Recurse

下載檔案。Download a file.

Export-AdlStoreItem -AccountName $adls -Path "/data.csv" -Destination "c:\data.csv"

以遞迴方式下載整個資料夾。Download an entire folder recursively.

Export-AdlStoreItem -AccountName $adls -Path "/" -Destination "c:\myData\" -Recurse

注意

如果上傳或下載程序中斷,您可以搭配 -Resume 旗標來重新執行該 Cmdlet,以嘗試繼續該程序。If the upload or download process is interrupted, you can attempt to resume the process by running the cmdlet again with the -Resume flag.

管理 U-SQL 目錄Manage the U-SQL catalog

U-SQL 目錄是用來建構資料和程式碼,讓 U-SQL 指令碼可以共用它們。The U-SQL catalog is used to structure data and code so they can be shared by U-SQL scripts. 目錄可以讓 Azure Data Lake 中的資料具有可能的最高效能。The catalog enables the highest performance possible with data in Azure Data Lake. 如需詳細資訊,請參閱 使用 U-SQL 目錄For more information, see Use U-SQL catalog.

列出 U-SQL 目錄中的項目List items in the U-SQL catalog

# List U-SQL databases
Get-AdlCatalogItem -Account $adla -ItemType Database 

# List tables within a database
Get-AdlCatalogItem -Account $adla -ItemType Table -Path "database"

# List tables within a schema.
Get-AdlCatalogItem -Account $adla -ItemType Table -Path "database.schema"

列出所有組件 U-SQL 目錄List all the assemblies the U-SQL catalog

$dbs = Get-AdlCatalogItem -Account $adla -ItemType Database

foreach ($db in $dbs)
{
    $asms = Get-AdlCatalogItem -Account $adla -ItemType Assembly -Path $db.Name

    foreach ($asm in $asms)
    {
        $asmname = "[" + $db.Name + "].[" + $asm.Name + "]"
        Write-Host $asmname
    }
}

取得目錄項目的相關詳細資料Get details about a catalog item

# Get details of a table
Get-AdlCatalogItem  -Account $adla -ItemType Table -Path "master.dbo.mytable"

# Test existence of a U-SQL database.
Test-AdlCatalogItem  -Account $adla -ItemType Database -Path "master"

將認證儲存在目錄中Store credentials in the catalog

在 U-SQL 資料庫內,為裝載於 Azure 中的資料庫建立認證物件。Within a U-SQL database, create a credential object for a database hosted in Azure. 目前,U-SQL 認證是您可以透過 PowerShell 建立的唯一目錄項目類型。Currently, U-SQL credentials are the only type of catalog item that you can create through PowerShell.

$dbName = "master"
$credentialName = "ContosoDbCreds"
$dbUri = "https://contoso.database.windows.net:8080"

New-AdlCatalogCredential -AccountName $adla `
          -DatabaseName $db `
          -CredentialName $credentialName `
          -Credential (Get-Credential) `
          -Uri $dbUri

管理防火牆規則Manage firewall rules

列出防火牆規則List firewall rules

Get-AdlAnalyticsFirewallRule -Account $adla

新增防火牆規則Add a firewall rule

$ruleName = "Allow access from on-prem server"
$startIpAddress = "<start IP address>"
$endIpAddress = "<end IP address>"

Add-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName -StartIpAddress $startIpAddress -EndIpAddress $endIpAddress

修改防火牆規則Modify a firewall rule

Set-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName -StartIpAddress $startIpAddress -EndIpAddress $endIpAddress

移除防火牆規則Remove a firewall rule

Remove-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName

允許 Azure IP 位址Allow Azure IP addresses

Set-AdlAnalyticsAccount -Name $adla -AllowAzureIpState Enabled
Set-AdlAnalyticsAccount -Name $adla -FirewallState Enabled
Set-AdlAnalyticsAccount -Name $adla -FirewallState Disabled

使用 AzureWorking with Azure

取得錯誤詳細資料Get error details

Resolve-AzError -Last

確認您在 Windows 機器上是否以系統管理員的身分執行Verify if you are running as an Administrator on your Windows machine

function Test-Administrator  
{  
    $user = [Security.Principal.WindowsIdentity]::GetCurrent();
    $p = New-Object Security.Principal.WindowsPrincipal $user
    $p.IsInRole([Security.Principal.WindowsBuiltinRole]::Administrator)  
}

尋找 TenantIDFind a TenantID

從訂用帳戶名稱:From a subscription name:

function Get-TenantIdFromSubscriptionName( [string] $subname )
{
    $sub = (Get-AzSubscription -SubscriptionName $subname)
    $sub.TenantId
}

Get-TenantIdFromSubscriptionName "ADLTrainingMS"

從訂用帳戶識別碼:From a subscription ID:

function Get-TenantIdFromSubscriptionId( [string] $subid )
{
    $sub = (Get-AzSubscription -SubscriptionId $subid)
    $sub.TenantId
}

$subid = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Get-TenantIdFromSubscriptionId $subid

從網域位址 (例如 "contoso.com")From a domain address such as "contoso.com"

function Get-TenantIdFromDomain( $domain )
{
    $url = "https://login.windows.net/" + $domain + "/.well-known/openid-configuration"
    return (Invoke-WebRequest $url|ConvertFrom-Json).token_endpoint.Split('/')[3]
}

$domain = "contoso.com"
Get-TenantIdFromDomain $domain

列出您的所有訂用帳戶和租用戶識別碼List all your subscriptions and tenant IDs

$subs = Get-AzSubscription
foreach ($sub in $subs)
{
    Write-Host $sub.Name "("  $sub.Id ")"
    Write-Host "`tTenant Id" $sub.TenantId
}

使用範本建立 Data Lake Analytics 帳戶Create a Data Lake Analytics account using a template

您也可以運用下列範例來使用「Azure 資源群組」範本:使用範本來建立 Data Lake Analytics 帳戶 (英文)You can also use an Azure Resource Group template using the following sample: Create a Data Lake Analytics account using a template

後續步驟Next steps