Azure Data Lake Analytics task
Applies to: SQL Server (all supported versions) SSIS Integration Runtime in Azure Data Factory
You can submit U-SQL jobs to Azure Data Lake Analytics service with the Data Lake Analytics task. This task is a component of the SQL Server Integration Services (SSIS) feature pack for Azure.
For general background, see Azure Data Lake Analytics.
Configure the task
To add a Data Lake Analytics task to a package, drag the task from SSIS Toolbox to the designer canvas. Then double-click the task, or right-click the task and select Edit. The Azure Data Lake Analytics Task Editor dialog box opens. You can set properties through SSIS Designer, or programmatically.
General page configuration
Use the General page to configure the task and provide the U-SQL script that the task submits. To learn more about the U-SQL language, see U-SQL language reference.
You can specify the name and description of the task.
U-SQL configuration has two settings: SourceType, and dynamic options based on the SourceType value.
SourceType specifies the source of the U-SQL script. The script is submitted to a Data Lake Analytics account during SSIS package execution. The options for this property are:
|DirectInput||Specifies the U-SQL script through the inline editor. Selecting this value displays the dynamic option, USQLStatement.|
|FileConnection||Specifies a local .usql file that contains the U-SQL script. Selecting this option displays the dynamic option, FileConnection.|
|Variable||Specifies an SSIS variable that contains the U-SQL script. Selecting this value displays the dynamic option, SourceVariable.|
SourceType Dynamic Options specifies the script content for the U-SQL query.
|SourceType = DirectInput||Type the U-SQL query to be submitted in the option box directly, or select the browse button (...) to type the U-SQL query in the Enter U-SQL Query dialog box.|
|SourceType = FileConnection||Select an existing file connection manager, or select <New connection...> to create a new file connection. For related information, see File Connection Manager and File Connection Manager Editor.|
|SourceType = Variable||Select an existing variable, or select <New variable...> to create a new variable. For related information, see Integration Services (SSIS) Variables and Add Variable.|
Job configuration specifies U-SQL job submission properties.
AzureDataLakeAnalyticsConnection: Specifies the Data Lake Analytics account where the U-SQL script is submitted. Choose the connection from a list of defined connection managers. To create a new connection, select <New connection>. For related information, see Azure Data Lake Analytics Connection Manager.
JobName: Specifies the name of the U-SQL job.
AnalyticsUnits: Specifies the analytics unit count of the U-SQL job.
Priority: Specifies the priority of the U-SQL job. You can set this from 0 to 1000. The lower the number is, the higher the priority.
RuntimeVersion: Specifies the Data Lake Analytics runtime version of the U-SQL job. It is set to "default" by default. Usually you don't need to change this property.
Synchronous: A Boolean value specifies if the task waits for the job execution to complete or not. If the value is set to true, the task is marked as succeed after the job completes. If the value is set to false, the task is marked as succeed after the job passes the preparation phase.
Value Description True The task result is based on the U-SQL job execution result. Job succeeds > task succeeds. Job fails > task fails. Task succeeds or fails > task completes. False The task result is based on the U-SQL job submission and preparation result. Job submission succeeds and passes the preparation phase > task succeeds. Job submission fails or job fails at the preparation phase > task fails. Task succeeds or fails > task completes.
TimeOut: Specifies a time-out time, in seconds, for job execution. If the job times out, it is cancelled and marked as failed. This property is not available if Synchronous is set to false.
Parameter Mapping page configuration
Use the Parameter Mapping page of the Azure Data Lake Analytics Task Editor dialog box to map variables to parameters (U-SQL variables) in U-SQL script.
Variable Name: After you have added a parameter mapping by selecting Add, select a system or user-defined variable from the list. Alternatively, you can select <New variable...> to add a new variable by using the Add Variable dialog box. For related information, see Integration Services (SSIS) Variables.
Parameter Name: Provide a parameter/variable name in U-SQL script. Make sure the parameter name starts with the @ sign, like @Param1.
Here is an example of how to pass parameters to U-SQL script.
Sample U-SQL script
@searchlog = EXTRACT UserId int, Start DateTime, Region string, Query string, Duration int, Urls string, ClickedUrls string FROM @in USING Extractors.Tsv(nullEscape:"#NULL#"); @rs1 = SELECT Start, Region, Duration FROM @searchlog WHERE Region == "en-gb"; @rs1 = SELECT Start, Region, Duration FROM @rs1 WHERE Start <= DateTime.Parse("2012/02/19"); OUTPUT @rs1 TO @out USING Outputters.Tsv(quoting:false, dateTimeFormat:null);
Note that the input and output paths are defined in @in and @out parameters. The values for @in and @out parameters in the U-SQL script are passed dynamically by the parameter mapping configuration.
|Variable name||Parameter name|
Expression page configuration
You can assign all properties in the General page configuration as a property expression, to enable dynamic update of the property at runtime. For related information, see Use Property Expressions in Packages.