Tutorial: develop U-SQL scripts using Data Lake Tools for Visual Studio

Write and test U-SQL scripts using Data Lake Tools for Visual Studio.

U-SQL is a hyper-scalable, highly extensible language for preparing, transforming and analyzing all data in the data lake and beyond. For more information, see U-SQL Reference.

Prerequisites

Connect to Azure

Connect to Data Lake Analytics

  1. Open Visual Studio.
  2. From the View menu, select Server Explorer to open Server Explorer, or press Ctrl+Alt+S.
  3. Right-click Azure, click Connect to Microsoft Azure Subscription, and then follow the instructions.
  4. From Server Explorer, expand Azure, and then expand Data Lake Analytics. You see a list of your Data Lake Analytics accounts, if there are any. You can't create Data Lake Analytics accounts from Visual Studio. To create an account, see Get Started with Azure Data Lake Analytics using Azure portal or Get Started with Azure Data Lake Analytics using Azure PowerShell.

Upload source data files

You uploaded some data in the Prerequisites section earlier in this tutorial.

To use your own data, follow these steps to upload it with the Data Lake Tools.

Upload files to the dependent Azure Data Lake account

  1. From Server Explorer, expand Azure, expand Data Lake Analytics, expand your Data Lake Analytics account, and then expand Storage Accounts. You see the default Data Lake Storage account, the linked Data Lake Storage accounts, and the linked Azure Storage accounts. The default Data Lake account is labeled Default Storage Account.
  2. Right-click the default Data Lake Storage account, and then click Explorer. This opens the Data Lake Tools for Visual Studio Explorer pane, with a tree view on the left and a content view on the right.
  3. Browse to the folder where you want to upload files.
  4. Right-click any blank space, and then click Upload.

Upload files to a linked Azure Blob storage account

  1. From Server Explorer, expand Azure, expand Data Lake Analytics, expand your Data Lake Analytics account, and then expand Storage Accounts. You see the default Data Lake Storage account, the linked Data Lake Storage accounts, and the linked Azure Storage accounts.
  2. Expand the Azure Storage account.
  3. Right-click the container where you want to upload files, and then click Explorer. If you don't have a container, you must first create one using the Azure portal, Azure PowerShell, or other tools.
  4. Browse to the folder where you want to upload files.
  5. Right-click any blank space, and then click Upload.

Develop U-SQL scripts

Data Lake Analytics jobs are written in the U-SQL language. To learn more about U-SQL, see Get started with U-SQL language and U-SQL language reference.

Create and submit a Data Lake Analytics job

  1. From the File menu, click New, and then click Project.
  2. Select the U-SQL Project type.

    Screenshot: new U-SQL Visual Studio project

  3. Click OK. Visual Studio creates a solution with a Script.usql file.
  4. Enter the following script into Script.usql:

     @searchlog =
         EXTRACT UserId          int,
                 Start           DateTime,
                 Region          string,
                 Query           string,
                 Duration        int?,
                 Urls            string,
                 ClickedUrls     string
         FROM "/Samples/Data/SearchLog.tsv"
         USING Extractors.Tsv();
    
     @res =
         SELECT *
         FROM @searchlog;        
    
     OUTPUT @res   
         TO "/Output/SearchLog-from-Data-Lake.csv"
     USING Outputters.Csv();
    

    This U-SQL script reads the source data file using Extractors.Tsv(), and then creates a CSV file using Outputters.Csv().

    Don't modify the two paths unless you copied the source file into a different location. Data Lake Analytics will create the output folder if it doesn't exist.
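
    If you want the output file to include a header row, the built-in Csv outputter accepts an optional outputHeader parameter. The following snippet is an optional variation rather than part of this tutorial's steps, and SearchLog-with-header.csv is just an example output path; it reuses the @res rowset from the script above.

     // Optional variation: also write the column names as a header row.
     OUTPUT @res
         TO "/Output/SearchLog-with-header.csv"
     USING Outputters.Csv(outputHeader: true);

    Similarly, Extractors.Tsv() accepts a skipFirstNRows parameter for source files that have a header row (the sample SearchLog.tsv does not).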

    It is simpler to use relative paths for files stored in the default Data Lake account. You can also use absolute paths. For example:

     adl://<DataLakeStorageAccountName>.azuredatalakestore.net:443/Samples/Data/SearchLog.tsv
    

    You must use absolute paths to access files in linked storage accounts. The syntax for files stored in a linked Azure Storage account is:

     wasb://<BlobContainerName>@<StorageAccountName>.blob.core.windows.net/Samples/Data/SearchLog.tsv
    
    Note

    Azure Blob storage containers with public blobs or public container access permissions aren't currently supported.
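
    For example, the same EXTRACT statement can reference the sample file through an absolute path to the default Data Lake Storage account. This is a sketch only; <DataLakeStorageAccountName> is a placeholder for your own account name, and the relative path used earlier works just as well for this tutorial.

     @searchlog =
         EXTRACT UserId          int,
                 Start           DateTime,
                 Region          string,
                 Query           string,
                 Duration        int?,
                 Urls            string,
                 ClickedUrls     string
         // Absolute adl:// path; use the wasb:// form shown above for a linked Azure Storage account.
         FROM "adl://<DataLakeStorageAccountName>.azuredatalakestore.net:443/Samples/Data/SearchLog.tsv"
         USING Extractors.Tsv();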

    Notice the following features:

    • IntelliSense

      Names are autocompleted and members are shown for rowsets, classes, databases, schemas, and user-defined objects (UDOs).

      IntelliSense for catalog entities (databases, schemas, tables, UDOs, and so on) is related to your compute account. You can check the current active compute account, database, and schema in the top toolbar, and switch them through the drop-down lists.
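
      For example, once a database and schema exist in the U-SQL catalog, IntelliSense can complete their names and members as you type. The following is a minimal sketch; SampleDBTutorials and its SearchLog table are hypothetical objects that must already exist in your account's catalog.

       // Hypothetical catalog objects; substitute a database and table that exist in your account.
       USE DATABASE SampleDBTutorials;
       USE SCHEMA dbo;

       @rows =
           SELECT *
           FROM SampleDBTutorials.dbo.SearchLog;

       OUTPUT @rows
           TO "/Output/catalog-sample.csv"
           USING Outputters.Csv();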

    • Expand * columns

      Click to the right of the *, and you see a blue underline beneath it. Hover over the blue underline, and then click the down arrow.

      Click Expand Columns, and the tool replaces the * with the column names.
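
      For the tutorial script, expanding the * in the @res rowset replaces it with the column list declared in the @searchlog EXTRACT:

       // Result of Expand Columns applied to SELECT * FROM @searchlog
       @res =
           SELECT UserId,
                  Start,
                  Region,
                  Query,
                  Duration,
                  Urls,
                  ClickedUrls
           FROM @searchlog;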

    • Auto Format

      You can change the indentation of the U-SQL script based on the code structure under Edit->Advanced:

      • Format Document (Ctrl+E, D) : Formats the whole document
      • Format Selection (Ctrl+K, Ctrl+F): Formats the selection. If no selection has been made, this shortcut formats the line the cursor is in.

        All the formatting rules are configurable under Tools->Options->Text Editor->SIP->Formatting.

    • Smart Indent

      Data Lake Tools for Visual Studio can indent expressions automatically while you're writing scripts. This feature is disabled by default; enable it by checking U-SQL->Options and Settings->Switches->Enable Smart Indent.

    • Go To Definition and Find All References

      Right-click the name of a rowset, parameter, column, or UDO, and then click Go To Definition (F12) to navigate to its definition. Click Find All References (Shift+F12) to show all the references.

    • Insert Azure Path

      Rather than remembering an Azure file path and typing it manually while writing a script, Data Lake Tools for Visual Studio provides an easier way: right-click in the editor, and then click Insert Azure Path. Navigate to the file in the Azure Blob Browser dialog, and then click OK. The file path is inserted into your code.

  5. Specify the Data Lake Analytics account, Database, and Schema. You can select (local) to run the script locally for testing purposes. For more information, see Run U-SQL locally.

    Screenshot: Submit U-SQL Visual Studio project

    For more information, see Use U-SQL catalog.

  6. From Solution Explorer, right-click Script.usql, and then click Build Script. Verify the result in the Output pane.
  7. From Solution Explorer, right-click Script.usql, and then click Submit Script. Optionally, you can also click Submit from the Script.usql pane; see the previous screenshot. Click the down arrow next to the Submit button to submit using the advanced options.
  8. Specify the Job Name, verify the Analytics Account, and then click Submit. Submission results and a job link are available in the Data Lake Tools for Visual Studio Results window when the submission is complete.

  9. Click the Refresh button to see the latest job status and refresh the screen. When the job succeeds, it shows the Job Graph, Metadata Operations, State History, and Diagnostics:

    Screenshot: U-SQL Visual Studio Data Lake Analytics job performance graph

    • Job Summary. Shows the summary information of the current job, such as state, progress, execution time, runtime name, and submitter.
    • Job Details. Provides detailed information about the job, including the script, resources, and the Vertex Execution View.
    • Job Graph. Several graphs visualize the job's information: Progress, Data Read, Data Written, Execution Time, Average Execution Time Per Node, Input Throughput, and Output Throughput.
    • Metadata Operations. Shows all the metadata operations.
    • State History.
    • Diagnostics. Data Lake Tools for Visual Studio diagnoses job execution automatically. You receive alerts when there are errors or performance issues in your jobs. See the Job Diagnostics section (link TBD) for more information.

To check job state

  1. From Server Explorer, expand Azure, expand Data Lake Analytics, and then expand your Data Lake Analytics account.
  2. Double-click Jobs to list the jobs.
  3. Click a job to see the status.

To see the job output

  1. From Server Explorer, expand Azure, expand Data Lake Analytics, expand your Data Lake Analytics account, expand Storage Accounts, right-click the default Data Lake Storage account, and then click Explorer.
  2. Double-click the Output folder to open it.
  3. Double-click SearchLog-from-Data-Lake.csv.

Job Playback

Job playback enables you to watch job execution progress and visually detect performance anomalies and bottlenecks. You can use this feature while a job is actively running as well as after execution has completed. Playing back during job execution shows progress up to the current time.

To view job execution progress

  1. Click Load Profile in the upper-right corner. See the previous screenshot.
  2. Click the Play button in the lower-left corner to review the job execution progress.
  3. During the playback, click Pause to stop it, or drag the progress bar directly to a specific position.

Heat Map

Data Lake Tools for Visual Studio provides user-selectable color overlays on the job view to indicate the progress, data I/O, execution time, and I/O throughput of each stage. This helps you identify potential issues and see the distribution of job properties directly and intuitively. You can choose a data source to display from the drop-down list.

Run U-SQL locally

You can use Azure Data Lake Tools for Visual Studio and the Azure Data Lake U-SQL SDK to run U-SQL jobs on your workstation, just as you can in the Azure Data Lake service. These two local-run features save you time in testing and debugging your U-SQL jobs.

See also

To get started with Data Lake Analytics using different tools, see:

To learn about Data Lake Tools for Visual Studio Code, see Use the Azure Data Lake Tools for Visual Studio Code.

To see more development topics:

Appendix A: PowerShell sample for preparing the tutorial

The following PowerShell script prepares an Azure Data Lake Analytics account and the source data for you, so you can skip to the Develop U-SQL scripts section.

#region - used for creating Azure service names
$nameToken = "<Enter an alias>"
$namePrefix = $nameToken.ToLower() + (Get-Date -Format "MMdd")
#endregion

#region - service names
$resourceGroupName = $namePrefix + "rg"
$dataLakeStoreName = $namePrefix + "adas"
$dataLakeAnalyticsName = $namePrefix + "adla"
$location = "East US 2"
#endregion


# Treat all errors as terminating
$ErrorActionPreference = "Stop"

#region - Connect to Azure subscription
Write-Host "`nConnecting to your Azure subscription ..." -ForegroundColor Green
try{Get-AzureRmContext}
catch{Login-AzureRmAccount}
#endregion

#region - Create an Azure Data Lake Analytics service account
Write-Host "Create a resource group ..." -ForegroundColor Green
New-AzureRmResourceGroup `
    -Name  $resourceGroupName `
    -Location $location

Write-Host "Create a Data Lake account ..."  -ForegroundColor Green
New-AzureRmDataLakeStoreAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $dataLakeStoreName `
    -Location $location

Write-Host "Create a Data Lake Analytics account ..."  -ForegroundColor Green
New-AzureRmDataLakeAnalyticsAccount `
    -Name $dataLakeAnalyticsName `
    -ResourceGroupName $resourceGroupName `
    -Location $location `
    -DefaultDataLake $dataLakeStoreName

Write-Host "The newly created Data Lake Analytics account ..."  -ForegroundColor Green
Get-AzureRmDataLakeAnalyticsAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $dataLakeAnalyticsName  
#endregion

#region - prepare the source data
Write-Host "Import the source data ..."  -ForegroundColor Green
$localFolder = "C:\Tutorials\Downloads\" # A temp location for the file.
$storageAccount = "adltutorials"  # Don't modify this value.
$container = "adls-sample-data"  #Don't modify this value.

# Create the temp location  
New-Item -Path $localFolder -ItemType Directory -Force

# Download the sample file from Azure Blob storage
$context = New-AzureStorageContext -StorageAccountName $storageAccount -Anonymous
$blobs = Get-AzureStorageBlob -Container $container -Context $context
$blobs | Get-AzureStorageBlobContent -Context $context -Destination $localFolder

# Upload the file to the default Data Lake Store account    
Import-AzureRmDataLakeStoreItem -AccountName $dataLakeStoreName -Path $localFolder"SearchLog.tsv" -Destination "/Samples/Data/SearchLog.tsv"

Write-Host "List the source data ..."  -ForegroundColor Green
Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path  "/Samples/Data/"
#endregion