Create a trigger that runs a pipeline in response to a storage event

APPLIES TO: Azure Data Factory Azure Synapse Analytics

This article describes the Storage Event Triggers that you can create in your Data Factory or Synapse pipelines.

Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events. Data integration scenarios often require customers to trigger pipelines based on events happening in storage account, such as the arrival or deletion of a file in Azure Blob Storage account. Data Factory and Synapse pipelines natively integrate with Azure Event Grid, which lets you trigger pipelines on such events.

For a ten-minute introduction and demonstration of this feature, watch the following video:

Note

The integration described in this article depends on Azure Event Grid. Make sure that your subscription is registered with the Event Grid resource provider. For more info, see Resource providers and types. You must be able to do the Microsoft.EventGrid/eventSubscriptions/* action. This action is part of the EventGrid EventSubscription Contributor built-in role.

Note

If the blob storage account resides behind a private endpoint and blocks public network access, you need to configure network rules to allow communications from blob storage to Azure Event Grid. You can either grant storage access to trusted Azure services, such as Event Grid, following Storage documentation, or configure private endpoints for Event Grid that map to VNet address space, following Event Grid documentation

Create a trigger with UI

This section shows you how to create a storage event trigger within the Azure Data Factory and Synapse pipeline User Interface.

  1. Switch to the Edit tab in Data Factory, or the Integrate tab in Azure Synapse.

  2. Select Trigger on the menu, then select New/Edit.

  3. On the Add Triggers page, select Choose trigger..., then select +New.

  4. Select trigger type Storage Event

  5. Select your storage account from the Azure subscription dropdown or manually using its Storage account resource ID. Choose which container you wish the events to occur on. Container selection is required, but be mindful that selecting all containers can lead to a large number of events.

    Note

    The Storage Event Trigger currently supports only Azure Data Lake Storage Gen2 and General-purpose version 2 storage accounts. Due to an Azure Event Grid limitation, Azure Data Factory only supports a maximum of 500 storage event triggers per storage account. If you hit the limit, please contact support for recommendations and increasing the limit upon evaluation by Event Grid team.

    Note

    To create a new or modify an existing Storage Event Trigger, the Azure account used to log into the service and publish the storage event trigger must have appropriate role based access control (Azure RBAC) permission on the storage account. No additional permission is required: Service Principal for the Azure Data Factory and Azure Synapse does not need special permission to either the Storage account or Event Grid. For more information about access control, see Role based access control section.

  6. The Blob path begins with and Blob path ends with properties allow you to specify the containers, folders, and blob names for which you want to receive events. Your storage event trigger requires at least one of these properties to be defined. You can use variety of patterns for both Blob path begins with and Blob path ends with properties, as shown in the examples later in this article.

    • Blob path begins with: The blob path must start with a folder path. Valid values include 2018/ and 2018/april/shoes.csv. This field can't be selected if a container isn't selected.
    • Blob path ends with: The blob path must end with a file name or extension. Valid values include shoes.csv and .csv. Container and folder names, when specified, they must be separated by a /blobs/ segment. For example, a container named 'orders' can have a value of /orders/blobs/2018/april/shoes.csv. To specify a folder in any container, omit the leading '/' character. For example, april/shoes.csv will trigger an event on any file named shoes.csv in folder a called 'april' in any container.
    • Note that Blob path begins with and ends with are the only pattern matching allowed in Storage Event Trigger. Other types of wildcard matching aren't supported for the trigger type.
  7. Select whether your trigger will respond to a Blob created event, Blob deleted event, or both. In your specified storage location, each event will trigger the Data Factory and Synapse pipelines associated with the trigger.

    Screenshot of storage event trigger creation page.

  8. Select whether or not your trigger ignores blobs with zero bytes.

  9. After you configure you trigger, click on Next: Data preview. This screen shows the existing blobs matched by your storage event trigger configuration. Make sure you've specific filters. Configuring filters that are too broad can match a large number of files created/deleted and may significantly impact your cost. Once your filter conditions have been verified, click Finish.

    Screenshot of storage event trigger preview page.

  10. To attach a pipeline to this trigger, go to the pipeline canvas and click Trigger and select New/Edit. When the side nav appears, click on the Choose trigger... dropdown and select the trigger you created. Click Next: Data preview to confirm the configuration is correct and then Next to validate the Data preview is correct.

  11. If your pipeline has parameters, you can specify them on the trigger runs parameter side nav. The storage event trigger captures the folder path and file name of the blob into the properties @triggerBody().folderPath and @triggerBody().fileName. To use the values of these properties in a pipeline, you must map the properties to pipeline parameters. After mapping the properties to parameters, you can access the values captured by the trigger through the @pipeline().parameters.parameterName expression throughout the pipeline. For detailed explanation, see Reference Trigger Metadata in Pipelines

    Screenshot of storage event trigger mapping properties to pipeline parameters.

    In the preceding example, the trigger is configured to fire when a blob path ending in .csv is created in the folder event-testing in the container sample-data. The folderPath and fileName properties capture the location of the new blob. For example, when MoviesDB.csv is added to the path sample-data/event-testing, @triggerBody().folderPath has a value of sample-data/event-testing and @triggerBody().fileName has a value of moviesDB.csv. These values are mapped, in the example, to the pipeline parameters sourceFolder and sourceFile, which can be used throughout the pipeline as @pipeline().parameters.sourceFolder and @pipeline().parameters.sourceFile respectively.

    Note

    If you are creating your pipeline and trigger in Azure Synapse Analytics, you must use @trigger().outputs.body.fileName and @trigger().outputs.body.folderPath as parameters. Those two properties capture blob information. Use those properties instead of using @triggerBody().fileName and @triggerBody().folderPath.

  12. Click Finish once you are done.

JSON schema

The following table provides an overview of the schema elements that are related to storage event triggers:

JSON Element Description Type Allowed Values Required
scope The Azure Resource Manager resource ID of the Storage Account. String Azure Resource Manager ID Yes
events The type of events that cause this trigger to fire. Array Microsoft.Storage.BlobCreated, Microsoft.Storage.BlobDeleted Yes, any combination of these values.
blobPathBeginsWith The blob path must begin with the pattern provided for the trigger to fire. For example, /records/blobs/december/ only fires the trigger for blobs in the december folder under the records container. String Provide a value for at least one of these properties: blobPathBeginsWith or blobPathEndsWith.
blobPathEndsWith The blob path must end with the pattern provided for the trigger to fire. For example, december/boxes.csv only fires the trigger for blobs named boxes in a december folder. String Provide a value for at least one of these properties: blobPathBeginsWith or blobPathEndsWith.
ignoreEmptyBlobs Whether or not zero-byte blobs will trigger a pipeline run. By default, this is set to true. Boolean true or false No

Examples of storage event triggers

This section provides examples of storage event trigger settings.

Important

You have to include the /blobs/ segment of the path, as shown in the following examples, whenever you specify container and folder, container and file, or container, folder, and file. For blobPathBeginsWith, the UI will automatically add /blobs/ between the folder and container name in the trigger JSON.

Property Example Description
Blob path begins with /containername/ Receives events for any blob in the container.
Blob path begins with /containername/blobs/foldername/ Receives events for any blobs in the containername container and foldername folder.
Blob path begins with /containername/blobs/foldername/subfoldername/ You can also reference a subfolder.
Blob path begins with /containername/blobs/foldername/file.txt Receives events for a blob named file.txt in the foldername folder under the containername container.
Blob path ends with file.txt Receives events for a blob named file.txt in any path.
Blob path ends with /containername/blobs/file.txt Receives events for a blob named file.txt under container containername.
Blob path ends with foldername/file.txt Receives events for a blob named file.txt in foldername folder under any container.

Role-based access control

Azure Data Factory and Synapse pipelines use Azure role-based access control (Azure RBAC) to ensure that unauthorized access to listen to, subscribe to updates from, and trigger pipelines linked to blob events, are strictly prohibited.

  • To successfully create a new or update an existing Storage Event Trigger, the Azure account signed into the service needs to have appropriate access to the relevant storage account. Otherwise, the operation will fail with Access Denied.
  • Azure Data Factory and Azure Synapse need no special permission to your Event Grid, and you do not need to assign special RBAC permission to the Data Factory or Azure Synapse service principal for the operation.

Any of following RBAC settings works for storage event trigger:

  • Owner role to the storage account
  • Contributor role to the storage account
  • Microsoft.EventGrid/EventSubscriptions/Write permission to storage account /subscriptions/####/resourceGroups/####/providers/Microsoft.Storage/storageAccounts/storageAccountName

In order to understand how the service delivers the two promises, let's take back a step and take a peek behind the scenes. Here are the high-level work flows for integration between Azure Data Factory/Azure Synapse, Storage, and Event Grid.

Create a new Storage Event Trigger

This high-level work flow describes how Azure Data Factory interacts with Event Grid to create a Storage Event Trigger. For Azure Synapse the data flow is the same, with Synapse pipelines taking the role of the Data Factory in the diagram below.

Workflow of storage event trigger creation.

Two noticeable call outs from the work flows:

  • Azure Data Factory and Azure Synapse make no direct contact with Storage account. Request to create a subscription is instead relayed and processed by Event Grid. Hence, the service needs no permission to Storage account for this step.

  • Access control and permission checking happen within the service. Before the service sends a request to subscribe to storage event, it checks the permission for the user. More specifically, it checks whether the Azure account signed in and attempting to create the Storage Event trigger has appropriate access to the relevant storage account. If the permission check fails, trigger creation also fails.

Storage event trigger pipeline run

This high-level work flows describe how Storage event triggers pipeline run through Event Grid. For Azure Synapse the data flow is the same, with Synapse pipelines taking the role of the Data Factory in the diagram below.

Workflow of storage event triggering pipeline runs.

There are three noticeable call outs in the workflow related to Event triggering pipelines within the service:

  • Event Grid uses a Push model that relays the message as soon as possible when storage drops the message into the system. This is different from messaging system, such as Kafka where a Pull system is used.

  • Event Trigger serves as an active listener to the incoming message and it properly triggers the associated pipeline.

  • Storage Event Trigger itself makes no direct contact with Storage account

    • That said, if you have a Copy or other activity inside the pipeline to process the data in Storage account, the service will make direct contact with Storage, using the credentials stored in the Linked Service. Ensure that Linked Service is set up appropriately
    • However, if you make no reference to the Storage account in the pipeline, you do not need to grant permission to the service to access Storage account

Next steps