Ingest blobs into Azure Data Explorer by subscribing to Event Grid notifications

Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. Azure Data Explorer offers ingestion (data loading) from event hubs, IoT hubs, and blobs written to blob containers.

In this article, you learn how to ingest blobs from your storage account into Azure Data Explorer using an Event Grid data connection. You'll create an Event Grid data connection that sets up an Azure Event Grid subscription. The Event Grid subscription routes events from your storage account to Azure Data Explorer through an Azure event hub. Then you'll see an example of the data flow throughout the system.

For general information about ingesting into Azure Data Explorer from Event Grid, see Connect to Event Grid. To create resources manually in the Azure portal, see Manually create resources for Event Grid ingestion.

Prerequisites

  • An Azure subscription.
  • An Azure Data Explorer cluster and database.
  • A storage account.

Create a target table in Azure Data Explorer

Create a table in Azure Data Explorer where Azure Event Hubs will send data. Create the table in the cluster and database prepared in the prerequisites.

  1. In the Azure portal, under your cluster, select Query.

    Screenshot of the Azure portal Query page, showing a selected database.

  2. Copy the following command into the window and select Run to create the table (TestTable) that will receive the ingested data.

.create table TestTable (TimeStamp: datetime, Value: string, Source: string)
    

Screenshot of the Azure Data Explorer web UI Query page, showing the create table command.

  3. Copy the following command into the window and select Run to map the incoming JSON data to the column names and data types of the table (TestTable).

    .create table TestTable ingestion json mapping 'TestMapping' '[{"column":"TimeStamp","path":"$.TimeStamp"},{"column":"Value","path":"$.Value"},{"column":"Source","path":"$.Source"}]'
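
    Optionally, verify the results. The following .show commands are a quick sanity check and aren't required for the walkthrough:

    .show table TestTable
    .show table TestTable ingestion json mappings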
    

Create an Event Grid data connection

Now connect the storage account to Azure Data Explorer, so that data flowing into the storage is streamed to the test table. You can create this connection in the Azure portal under the storage account itself, or under Azure Data Explorer, as shown in the following steps. If you prefer to script the connection, see the Azure CLI sketch after these steps.

  1. Under the cluster you created, select Databases > TestDatabase.

    Screenshot of the Databases page, showing a database.

  2. Select Data ingestion > Add data connection.

    Screenshot of the Data ingestion page, showing the add data connection option.

  3. Under Basics, select the connection type Blob storage, and then fill out the form with the following information:

    Screenshot of the Data Connection Basics tab, showing the options for the Blob storage connection type.

    • Data connection name (suggested value: test-grid-connection): The name of the connection that you want to create in Azure Data Explorer.
    • Storage account subscription (your subscription ID): The subscription ID where your storage account is located.
    • Storage account (gridteststorage1): The name of the storage account that you created previously.
    • Event type (Blob created or Blob renamed): The type of event that triggers ingestion. Blob renamed is supported only for ADLSv2 storage. The supported types are Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobRenamed.
    • Resources creation (Automatic): Defines whether you want Azure Data Explorer to create an Event Grid subscription, an Event Hubs namespace, and an event hub for you. To create resources manually, see Manually create resources for Event Grid ingestion.
  4. Select Filter settings if you want to track specific subjects. Set the filters for the notifications as follows:

    • Prefix field is the literal prefix of the subject. As the pattern applied is startswith, it can span multiple containers, folders, or blobs. No wildcards are allowed.
      • To define a filter on the blob container, the field must be set as follows: /blobServices/default/containers/[container prefix].
      • To define a filter on a blob prefix (or a folder in Azure Data Lake Gen2), the field must be set as follows: /blobServices/default/containers/[container name]/blobs/[folder/blob prefix].
    • Suffix field is the literal suffix of the blob. No wildcards are allowed.
    • Case-Sensitive field indicates whether the prefix and suffix filters are case-sensitive.
    • For more information about filtering events, see Blob storage events.
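
    For example, to ingest only blobs that end with .json from a folder named data in a container named testcontainer (hypothetical names), set Prefix to /blobServices/default/containers/testcontainer/blobs/data/ and Suffix to .json.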

    Screenshot of the Filter settings form, showing the filter parameters.

  5. Select Next: Ingest properties.

  6. Fill out the form with the following information. Table and mapping names are case-sensitive:

    Screenshot of the Data Connection Ingest properties tab, showing the target table properties.

    Ingest properties:

    • Allow routing the data to other databases (Multi database data connection) (suggested value: Don't allow): Turn on this option if you want to override the default target database associated with the data connection. For more information about database routing, see Events routing.
    • Table name (TestTable): The table you created in TestDatabase.
    • Data format (JSON): Supported formats are Avro, CSV, JSON, MULTILINE JSON, ORC, PARQUET, PSV, SCSV, SOHSV, TSV, TXT, TSVE, APACHEAVRO, RAW, and W3CLOG. Supported compression options are Zip and Gzip.
    • Mapping (TestMapping): The mapping you created in TestDatabase, which maps incoming data to the column names and data types of TestTable. If not specified, an identity data mapping derived from the table's schema is used.
    • Advanced settings (My data has headers): Ignores headers. Supported for *SV type files.

    Note

    You don't have to specify all Default routing settings. Partial settings are also accepted.

  7. Select Next: Review + create.

  8. Under Review + create, review the resources that were autocreated for you and select Create.

    Screenshot of the Data Connection Review and create tab, showing a summary of the selected data connection settings.
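
If you prefer scripting over the portal, the following Azure CLI sketch creates an equivalent data connection. It assumes the az kusto extension and resources that already exist: unlike the portal's Automatic option, the CLI doesn't create the Event Grid subscription or event hub for you (see Manually create resources for Event Grid ingestion). All resource names and IDs below are placeholders:

    # Install the Kusto extension for the Azure CLI (one time).
    az extension add --name kusto

    # Create the Event Grid data connection against pre-created resources.
    az kusto data-connection event-grid create \
      --resource-group "<resource_group>" \
      --cluster-name "<cluster_name>" \
      --database-name "TestDatabase" \
      --data-connection-name "test-grid-connection" \
      --storage-account-resource-id "<storage_account_resource_id>" \
      --event-hub-resource-id "<event_hub_resource_id>" \
      --consumer-group '$Default' \
      --table-name "TestTable" \
      --mapping-rule-name "TestMapping" \
      --data-format "JSON" \
      --blob-storage-event-type "Microsoft.Storage.BlobCreated"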

Deployment

Wait until the deployment is completed. If your deployment failed, select Operation details next to the failed stage to get more information about the failure reason. Select Redeploy to try to deploy the resources again. You can alter the parameters before redeploying.

Screenshot of Deploy Event Grid overview page, showing a failed deployment.

Generate sample data

Now that Azure Data Explorer and the storage account are connected, you can create sample data.

Upload blob to the storage container

We'll work with a small shell script that issues a few basic Azure CLI commands to interact with Azure Storage resources. This script does the following actions:

  1. Creates a new container in your storage account.
  2. Uploads an existing file (as a blob) to that container.
  3. Lists the blobs in the container.

You can use Azure Cloud Shell to execute the script directly in the portal.

Save the following sample data into a file:

    {"TimeStamp": "1987-11-16 12:00","Value": "Hello World","Source": "TestSource"}

Then upload the file with the following script. Replace the placeholder values before you run it:

    #!/bin/bash
    ### A simple Azure Storage example script

    export AZURE_STORAGE_ACCOUNT=<storage_account_name>
    export AZURE_STORAGE_KEY=<storage_account_key>

    export container_name=<container_name>
    export blob_name=<blob_name>
    export file_to_upload=<file_to_upload>

    echo "Creating the container..."
    az storage container create --name "$container_name"

    echo "Uploading the file..."
    az storage blob upload --container-name "$container_name" --file "$file_to_upload" --name "$blob_name" --metadata "rawSizeBytes=1024"

    echo "Listing the blobs..."
    az storage blob list --container-name "$container_name" --output table

    echo "Done"

Note

To achieve the best ingestion performance, Azure Data Explorer needs to know the uncompressed size of compressed blobs submitted for ingestion. Because Event Grid notifications contain only basic details, the size information must be communicated explicitly: set the rawSizeBytes property in the blob metadata to the uncompressed data size in bytes.
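
For example, if you gzip a file before uploading it, you can record the original size at upload time. The following sketch reuses the variables from the script above; data.json is a hypothetical file name:

    # Keep the uncompressed byte count before compressing the file.
    raw_size=$(wc -c < data.json)
    gzip --keep data.json

    # Pass the uncompressed size so Azure Data Explorer can ingest optimally.
    az storage blob upload --container-name "$container_name" --file data.json.gz --name data.json.gz --metadata "rawSizeBytes=$raw_size"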

Rename blob

If you're ingesting data from ADLSv2 storage and have defined Blob renamed as the event type for the data connection, the trigger for blob ingestion is blob renaming. To rename a blob, go to the blob in the Azure portal, right-click the blob, and select Rename:

Screenshot of a blob shortcut menu, showing the Rename option.

Ingestion properties

You can specify ingestion properties for blob ingestion via the blob metadata.
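
For example, ingestion properties can be supplied as blob metadata keys prefixed with kusto. The following sketch overrides the target table, data format, and mapping for a single blob; treat the exact key names as assumptions to verify against the Event Grid ingestion properties reference:

    # Per-blob ingestion properties via kusto* metadata keys (a sketch).
    az storage blob upload --container-name "$container_name" --file data.json --name data.json \
      --metadata "kustoTable=TestTable" "kustoDataFormat=json" "kustoIngestionMappingReference=TestMapping" "rawSizeBytes=1024"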

Note

Azure Data Explorer won't delete the blobs after ingestion. Retain the blobs for three to five days. Use Azure Blob storage lifecycle management to handle blob deletion.
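
For example, the following sketch adds a lifecycle management rule that deletes blobs five days after their last modification. The rule name is hypothetical:

    # Lifecycle rule: delete blobs 5 days after last modification (a sketch).
    az storage account management-policy create \
      --account-name "$AZURE_STORAGE_ACCOUNT" \
      --resource-group "<resource_group>" \
      --policy '{"rules": [{"enabled": true, "name": "delete-after-ingestion", "type": "Lifecycle", "definition": {"actions": {"baseBlob": {"delete": {"daysAfterModificationGreaterThan": 5}}}, "filters": {"blobTypes": ["blockBlob"]}}}]}'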

Review the data flow

Note

Azure Data Explorer has an aggregation (batching) policy for data ingestion that's designed to optimize the ingestion process. By default, the policy is configured to 5 minutes. You can alter the policy later if needed; a sketch follows this note. In this article, you can expect a latency of a few minutes.
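
For example, the following command is a sketch that shortens the batching time for TestTable to one minute; the values shown are assumptions to tune for your own workload:

    .alter table TestTable policy ingestionbatching '{"MaximumBatchingTimeSpan": "00:01:00", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'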

  1. In the Azure portal, under your event grid, you see a spike in activity while the script is running.

    Screenshot of an Event Grid Activity graph, showing a spike in activity.

  2. To preview the incoming Event Grid notifications sent to your event hub in the Azure portal, see Process data from your event hub using Azure Stream Analytics. You can use the insights you gain to refine your Event Grid filter settings and ensure that only the appropriate events are sent to your event hub and cluster.

  3. To check how many messages have made it to the database so far, run the following query in your test database.

    TestTable
    | count
    
  4. To see the content of the messages, run the following query in your test database.

    TestTable
    

    The result set should look like the following image:

    Screenshot of the query results, showing the content of Event Grid messages.
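
If the count stays at zero for longer than the batching window, you can check for ingestion errors. This query is a sketch that filters the output of .show ingestion failures; verify the column names against your cluster:

    .show ingestion failures
    | where Table == "TestTable"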

Clean up resources

If you don't plan to use your event grid again, clean up the Event Grid subscription, the Event Hubs namespace, and any event hubs that were autocreated for you, to avoid incurring costs.

  1. In the Azure portal, go to the left menu and select All resources.

    Screenshot of Azure portal left menu, showing the All resources option.

  2. Search for the Event Hubs namespace and select Delete to delete it:

    Screenshot of the All resources page, showing the Delete menu option.

  3. In the Delete resources form, confirm the deletion to delete the Event Hubs namespace and Event Hubs resources.

  4. Go to your storage account. In the left menu, select Events:

    Screenshot of the Azure storage account left menu, showing the Events option.

  5. Below the graph, select your Event Grid Subscription and then select Delete to delete it:

    Screenshot of the Event Subscription page, showing the selected Event Grid subscription and the Delete option.

  6. To delete your Event Grid data connection, go to your Azure Data Explorer cluster. On the left menu, select Databases.

  7. Select your database TestDatabase:

Screenshot of the Azure Data Explorer web UI Databases page, showing a database.

  8. On the left menu, select Data ingestion:

    Screenshot of the Azure portal left menu, showing the Data ingestion option.

  9. Select your data connection test-grid-connection and then select Delete to delete it.

Next steps