Run batch endpoints from Event Grid events in storage

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

Event Grid is a fully managed service that enables you to easily manage events across many different Azure services and applications. It simplifies building event-driven and serverless applications. In this tutorial, we learn how to trigger a batch endpoint job to process files as soon as they are created in a storage account. In this architecture, we use a Logic App to subscribe to those events and trigger the endpoint.

The workflow looks as follows:

Diagram displaying the different components of the architecture.

  1. A file created event is triggered when a new blob is created in a specific storage account.

  2. The event is sent to Event Grid, which dispatches it to all the subscribers.

  3. A Logic App is subscribed to listen to those events. Since the storage account can contain multiple data assets, event filtering is applied so the workflow reacts only to events happening in a specific folder inside of it. Further filtering can be done if needed (for instance, based on file extensions).

  4. The Logic App is triggered. In turn, it:

    1. Gets an authorization token to invoke batch endpoints using the credentials of a service principal.

    2. Triggers the batch endpoint (default deployment) using the newly created file as input.

  5. The batch endpoint will return the name of the job that was created to process the file.

Important

When using a Logic App connected with Event Grid to invoke a batch endpoint, you are generating one job per blob file created in the storage account. Keep in mind that, since batch endpoints distribute the work at the file level, there will not be any parallelization happening across files. Instead, you will be taking advantage of batch endpoints' capability of executing multiple jobs under the same compute cluster. If you need to run jobs on entire folders in an automatic fashion, we recommend you switch to Invoking batch endpoints from Azure Data Factory.

Prerequisites

  • This example assumes that you have a model correctly deployed as a batch endpoint. This architecture can be extended to work with Pipeline component deployments if needed.
  • This example assumes that your batch deployment runs in a compute cluster called batch-cluster.
  • The Logic App we are creating will communicate with Azure Machine Learning batch endpoints using REST. To learn more about how to use the REST API of batch endpoints, see Create jobs and input data for batch endpoints.
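
Before wiring up the event flow, you can optionally confirm these prerequisites from the Azure CLI. The following is a minimal sketch that checks the endpoint has a default deployment and which compute the deployment targets; all names in angle brackets are placeholders for your own values.

    # Confirm the endpoint has a default deployment configured
    az ml batch-endpoint show --name <endpoint-name> \
        --resource-group <resource-group> --workspace-name <workspace-name> \
        --query defaults.deployment_name -o tsv

    # Confirm which compute cluster the deployment runs on
    az ml batch-deployment show --name <deployment-name> --endpoint-name <endpoint-name> \
        --resource-group <resource-group> --workspace-name <workspace-name> \
        --query compute -o tsv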

Authenticating against batch endpoints

Azure Logic Apps can invoke the REST APIs of batch endpoints by using the HTTP activity. Batch endpoints support Microsoft Entra ID for authorization, and hence the requests made to the APIs require proper authentication handling.

We recommend using a service principal for authentication and interaction with batch endpoints in this scenario.

  1. Create a service principal following the steps at Register an application with Microsoft Entra ID and create a service principal.

  2. Create a secret to use for authentication as explained at Option 3: Create a new client secret.

  3. Take note of the client secret Value that is generated. This is only displayed once.

  4. Take note of the client ID and the tenant ID in the Overview pane of the application.

  5. Grant access for the service principal you created to your workspace as explained at Grant access. In this example the service principal requires:

    1. Permission in the workspace to read batch deployments and perform actions over them.
    2. Permission to read/write in data stores.

    A CLI sketch of these steps follows this list.
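
If you prefer scripting these steps, the following Azure CLI sketch creates the service principal, emits the secret, and grants it access to the workspace. The application name and the AzureML Data Scientist role are assumptions for illustration; choose the role that matches the permissions listed above.

    # Create the service principal. The output includes appId (client ID),
    # password (client secret, displayed only once), and tenant (tenant ID).
    az ad sp create-for-rbac --name batch-endpoint-invoker

    # Grant the principal access to the workspace. "AzureML Data Scientist" is a
    # built-in role; adjust the role and scope to your security requirements.
    az role assignment create \
        --assignee <client-id> \
        --role "AzureML Data Scientist" \
        --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>"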

Enabling data access

We will be using cloud URIs provided by Event Grid to indicate the input data to send to the deployment job. Batch endpoints use the identity of the compute to mount the data, while keeping the identity of the job to read it once mounted. Hence, we need to assign a user-assigned managed identity to the compute cluster to ensure it has access to mount the underlying data. Follow these steps to ensure data access:

  1. Create a managed identity resource:

    # Assumes a default resource group is configured for the Azure CLI
    # (for example, via: az config set defaults.group=<resource-group>)
    IDENTITY=$(az identity create -n azureml-cpu-cluster-idn --query id -o tsv)
    
  2. Update the compute cluster to use the managed identity we created:

    Note

    This example assumes you have a compute cluster named cpu-cluster and that it is used by the default deployment in the endpoint.

    az ml compute update --name cpu-cluster --identity-type user_assigned --user-assigned-identities $IDENTITY
    
  3. Go to the Azure portal and ensure the managed identity has the right permissions to read the data. To access storage services, you must have at least Storage Blob Data Reader access to the storage account. Only storage account owners can change your access level via the Azure portal.
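
For reference, granting that access from the CLI could look like the following sketch, where the storage account scope is a placeholder and the identity name matches the one created earlier:

    # Give the cluster's managed identity read access to blobs in the storage account
    az role assignment create \
        --assignee-object-id $(az identity show -n azureml-cpu-cluster-idn --query principalId -o tsv) \
        --assignee-principal-type ServicePrincipal \
        --role "Storage Blob Data Reader" \
        --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"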

Create a Logic App

  1. In the Azure portal, sign in with your Azure account.

  2. On the Azure home page, select Create a resource.

  3. On the Azure Marketplace menu, select Integration > Logic App.

    Screenshot that shows Azure Marketplace menu with "Integration" and "Logic App" selected.

  4. On the Create Logic App pane, on the Basics tab, provide the following information about your logic app resource.

    Screenshot showing Azure portal, logic app creation pane, and info for new logic app resource.

    | Property | Required | Value | Description |
    |---|---|---|---|
    | Subscription | Yes | <Azure-subscription-name> | Your Azure subscription name. This example uses Pay-As-You-Go. |
    | Resource Group | Yes | LA-TravelTime-RG | The Azure resource group where you create your logic app resource and related resources. This name must be unique across regions and can contain only letters, numbers, hyphens (-), underscores (_), parentheses ((, )), and periods (.). |
    | Name | Yes | LA-TravelTime | Your logic app resource name, which must be unique across regions and can contain only letters, numbers, hyphens (-), underscores (_), parentheses ((, )), and periods (.). |
  5. Before you continue making selections, go to the Plan section. For Plan type, select Consumption to show only the settings for a Consumption logic app workflow, which runs in multi-tenant Azure Logic Apps.

    The Plan type property also specifies the billing model to use.

    | Plan type | Description |
    |---|---|
    | Standard | This logic app type is the default selection. It runs in single-tenant Azure Logic Apps and uses the Standard billing model. |
    | Consumption | This logic app type runs in global, multi-tenant Azure Logic Apps and uses the Consumption billing model. |

    Important

    For private-link enabled workspaces, you need to use the Standard plan for Logic Apps, with private networking allowed in its configuration.

  6. Now continue with the following selections:

    | Property | Required | Value | Description |
    |---|---|---|---|
    | Region | Yes | West US | The Azure datacenter region for storing your app's information. This example deploys the sample logic app to the West US region in Azure. Note: if your subscription is associated with an integration service environment, this list includes those environments. |
    | Enable log analytics | Yes | No | This option appears and applies only when you select the Consumption logic app type. Change this option only when you want to enable diagnostic logging. For this tutorial, keep the default selection. |
  7. When you're done, select Review + create. After Azure validates the information about your logic app resource, select Create.

  8. After Azure deploys your app, select Go to resource.

    Azure opens the workflow template selection pane, which shows an introduction video, commonly used triggers, and workflow template patterns.

  9. Scroll down past the video and common triggers sections to the Templates section, and select Blank Logic App.

    Screenshot that shows the workflow template selection pane with "Blank Logic App" selected.
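
This tutorial follows the portal flow, but if you manage resources through scripts, a blank Consumption workflow can also be created with the logic CLI extension. This is a sketch under the assumption that the extension is installed (az extension add --name logic) and that blank-workflow.json contains an empty workflow definition following the 2016-06-01 workflowdefinition.json schema:

    # Create an empty Consumption logic app workflow (names reuse the values above)
    az logic workflow create \
        --resource-group LA-TravelTime-RG \
        --name LA-TravelTime \
        --location westus \
        --definition blank-workflow.json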

Configure the workflow parameters

This Logic App uses parameters to store specific pieces of information that you will need to run the batch deployment.

  1. On the workflow designer, under the toolbar, select the option Parameters and configure them as follows:

    Screenshot of all the parameters required in the workflow.

  2. To create a parameter, use the Add parameter option:

    Screenshot showing how to add one parameter in designer.

  3. Create the following parameters.

    | Parameter | Description | Sample value |
    |---|---|---|
    | tenant_id | Tenant ID where the endpoint is deployed. | 00000000-0000-0000-00000000 |
    | client_id | The client ID of the service principal used to invoke the endpoint. | 00000000-0000-0000-00000000 |
    | client_secret | The client secret of the service principal used to invoke the endpoint. | ABCDEFGhijkLMNOPQRstUVwz |
    | endpoint_uri | The endpoint scoring URI. | https://<endpoint_name>.<region>.inference.ml.azure.com/jobs |

    Important

    endpoint_uri is the URI of the endpoint you are trying to execute. The endpoint must have a default deployment configured.

    Tip

    Use the values configured at Authenticating against batch endpoints.

Add the trigger

We want to trigger the Logic App each time a new file is created in a given folder (data asset) of a Storage Account. The Logic App uses the information from the event to invoke the batch endpoint and pass the specific file to be processed.

  1. On the workflow designer, under the search box, select Built-in.

  2. In the search box, enter event grid, and select the trigger named When a resource event occurs.

  3. Configure the trigger as follows:

    | Property | Value | Description |
    |---|---|---|
    | Subscription | Your subscription name | The subscription where the Azure Storage Account is placed. |
    | Resource Type | Microsoft.Storage.StorageAccounts | The resource type emitting the events. |
    | Resource Name | Your storage account name | The name of the Storage Account where the files will be generated. |
    | Event Type Item | Microsoft.Storage.BlobCreated | The event type. |
  4. Click on Add new parameter and select Prefix Filter. Add the value /blobServices/default/containers/<container_name>/blobs/<path_to_data_folder>.

    Important

    Prefix Filter allows Event Grid to only notify the workflow when a blob is created in the specific path we indicated. In this case, we are assuming that files will be created by some external process in the folder <path_to_data_folder> inside the container <container_name> in the selected Storage Account. Configure this parameter to match the location of your data. Otherwise, the event will fire for any file created at any location of the Storage Account. See Event filtering for Event Grid for more details; an equivalent CLI filter is sketched at the end of this section.

    The trigger will look as follows:

    Screenshot of the trigger activity of the Logic App.
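
For reference, the same prefix filter expressed outside the designer, for example when creating an Event Grid subscription directly with the Azure CLI, could look like the following sketch. The subscription name and the webhook endpoint are placeholders; a Logic App trigger normally creates this subscription for you.

    # Notify only for blobs created under a specific folder of a container
    az eventgrid event-subscription create \
        --name new-files-to-batch \
        --source-resource-id "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>" \
        --endpoint "<logic-app-callback-url>" \
        --included-event-types Microsoft.Storage.BlobCreated \
        --subject-begins-with "/blobServices/default/containers/<container_name>/blobs/<path_to_data_folder>"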

Configure the actions

  1. Click on + New step.

  2. On the workflow designer, under the search box, select Built-in and then click on HTTP:

  3. Configure the action as follows:

    | Property | Value | Notes |
    |---|---|---|
    | Method | POST | The HTTP method. |
    | URI | concat('https://login.microsoftonline.com/', parameters('tenant_id'), '/oauth2/token') | Click on Add dynamic context, then Expression, to enter this expression. |
    | Headers | Content-Type with value application/x-www-form-urlencoded | |
    | Body | concat('grant_type=client_credentials&client_id=', parameters('client_id'), '&client_secret=', parameters('client_secret'), '&resource=https://ml.azure.com') | Click on Add dynamic context, then Expression, to enter this expression. |

    The action will look as follows:

    Screenshot of the authorize activity of the Logic App.

  4. Click on + New step.

  5. On the workflow designer, under the search box, select Built-in and then click on HTTP:

  6. Configure the action as follows:

    | Property | Value | Notes |
    |---|---|---|
    | Method | POST | The HTTP method. |
    | URI | endpoint_uri | Click on Add dynamic context, then select it under parameters. |
    | Headers | Content-Type with value application/json | |
    | Headers | Authorization with value concat('Bearer ', body('Authorize')['access_token']) | Click on Add dynamic context, then Expression, to enter this expression. This expression assumes the previous HTTP action is named Authorize; rename that action if needed. |
  7. In the parameter Body, click on Add dynamic context, then Expression, to enter the following expression:

    replace('{
        "properties": {
            "InputData": {
                "mnistinput": {
                    "JobInputType": "UriFile",
                    "Uri": "<JOB_INPUT_URI>"
                }
            }
        }
    }', '<JOB_INPUT_URI>', triggerBody()?[0]['data']['url'])
    

    Tip

    The previous payload corresponds to a Model deployment. If you are working with a Pipeline component deployment, adapt the format to the expectations of the pipeline's inputs. Learn more about how to structure the input in REST calls at Create jobs and input data for batch endpoints (REST). The equivalent raw REST calls for this workflow are sketched after this section.

    The action will look as follows:

    Screenshot of the invoke activity of the Logic App.

    Note

    Notice that this last action triggers the batch job, but it doesn't wait for its completion. Azure Logic Apps isn't designed for long-running applications. If you need to wait for the job to complete, we recommend switching to Run batch endpoints from Azure Data Factory.

  8. Click on Save.

  9. The Logic App is ready to run, and it triggers automatically each time a new file is created under the indicated path. You can confirm that the app successfully received the event by checking its Run history:

    Screenshot of the invoke history of the Logic App.
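
For clarity, the two HTTP actions in this workflow are equivalent to the following raw REST calls, shown here as a curl sketch. The placeholder values come from the workflow parameters, the blob URL comes from the Event Grid event, and jq is assumed to be available for parsing the token response.

    # 1. Get a Microsoft Entra token for the service principal (the "Authorize" action)
    TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/<tenant_id>/oauth2/token" \
        -H "Content-Type: application/x-www-form-urlencoded" \
        -d "grant_type=client_credentials&client_id=<client_id>&client_secret=<client_secret>&resource=https://ml.azure.com" \
        | jq -r '.access_token')

    # 2. Invoke the batch endpoint (default deployment) with the blob URL as input
    curl -X POST "<endpoint_uri>" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $TOKEN" \
        -d '{
            "properties": {
                "InputData": {
                    "mnistinput": {
                        "JobInputType": "UriFile",
                        "Uri": "<blob_url_from_event>"
                    }
                }
            }
        }'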

Next steps