Ingest blobs into Azure Data Explorer by subscribing to Event Grid notifications

Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. Azure Data Explorer offers ingestion (data loading) from Event Hubs, IoT Hubs, and blobs written to blob containers.

In this article, you learn how to ingest blobs from your storage account into Azure Data Explorer using an Event Grid data connection. You'll create an Event Grid data connection that sets up an Azure Event Grid subscription. The Event Grid subscription routes events from your storage account to Azure Data Explorer via an Azure Event Hub. Then you'll see an example of the data flow throughout the system.

Prerequisites

To follow this article, you need:

  • An Azure subscription.
  • An Azure Data Explorer cluster and database.
  • A storage account.

Create a target table in Azure Data Explorer

Create a table in Azure Data Explorer where Event Hubs will send data. Create the table in the cluster and database prepared in the prerequisites.

  1. In the Azure portal, under your cluster, select Query.


  2. Copy the following command into the window and select Run to create the table (TestTable) that will receive the ingested data.

    .create table TestTable (TimeStamp: datetime, Value: string, Source:string)
    


  3. Copy the following command into the window and select Run to map the incoming JSON data to the column names and data types of the table (TestTable).

    .create table TestTable ingestion json mapping 'TestMapping' '[{"column":"TimeStamp","path":"$.TimeStamp"},{"column":"Value","path":"$.Value"},{"column":"Source","path":"$.Source"}]'
    
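    Optionally, you can confirm the mapping was created by listing the JSON ingestion mappings defined on the table:

    .show table TestTable ingestion json mappings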

Create an Event Grid data connection in Azure Data Explorer

Now connect the storage account to Azure Data Explorer, so that data flowing into the storage account is ingested into the test table. The following steps create the connection in the Azure portal; a CLI alternative is sketched after the steps.

  1. Under the cluster you created, select Databases > TestDatabase.


  2. Select Data ingestion > Add data connection.


  3. Select the connection type: Blob storage.

  4. Fill out the form with the following information:


    Data source:

    • Data connection name: test-grid-connection. The name of the connection that you want to create in Azure Data Explorer.
    • Storage account subscription: your subscription ID. The subscription ID where your storage account resides.
    • Storage account: gridteststorage1. The name of the storage account that you created previously.
    • Resources creation: Automatic. Defines whether Azure Data Explorer creates an Event Grid subscription, an Event Hub namespace, and an Event Hub for you. For a detailed explanation of how to create the Event Grid subscription manually, see Create an Event Grid subscription in your storage account.
  5. Select Filter settings if you want to track specific subjects. Set the filters for the notifications as follows:

    • Prefix field is the literal prefix of the subject. Because the pattern applied is startswith, it can span multiple containers, folders, or blobs. No wildcards are allowed.
      • To define a filter on the blob container, set the field as follows: /blobServices/default/containers/[container prefix].
      • To define a filter on a blob prefix (or a folder in Azure Data Lake Gen2), set the field as follows: /blobServices/default/containers/[container name]/blobs/[folder/blob prefix].
    • Suffix field is the literal suffix of the blob. No wildcards are allowed.
    • Case-Sensitive field indicates whether the prefix and suffix filters are case-sensitive.
    • For more information about filtering events, see Blob storage events.
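
    For example, to ingest only JSON blobs written under a sample folder of a container named test-grid-container (hypothetical names), you could set:

    Prefix: /blobServices/default/containers/test-grid-container/blobs/sample/
    Suffix: .json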


  6. Select Next: Ingest properties.

  7. Fill out the form with the following information and select Next: Review + Create. Table and mapping names are case-sensitive:


    Ingest properties:

    • Table: TestTable. The table you created in TestDatabase.
    • Data format: JSON. Supported formats are Avro, CSV, JSON, MULTILINE JSON, ORC, PARQUET, PSV, SCSV, SOHSV, TSV, TXT, TSVE, APACHEAVRO, RAW, and W3CLOG. Supported compression options are Zip and GZip.
    • Mapping: TestMapping. The mapping you created in TestDatabase, which maps incoming JSON data to the column names and data types of TestTable.
  8. Review the resources that were auto-created for you and select Create.


  9. Wait until the deployment is completed. If the deployment failed, select Operation details next to the failed stage for more information about the failure, and then select Redeploy to try deploying the resources again.

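If you'd rather script this step than use the portal, the Azure CLI's kusto extension offers an equivalent. The following is a sketch, not a verified recipe: all names and resource IDs are placeholders, and it assumes you already created the Event Grid subscription, Event Hub namespace, and Event Hub yourself (the manual-creation path).

    # Assumes the kusto CLI extension is installed: az extension add --name kusto
    # Replace the placeholder resource group, cluster name, and resource IDs with your own.
    az kusto data-connection event-grid create \
        --resource-group "<resource_group>" \
        --cluster-name "<cluster_name>" \
        --database-name "TestDatabase" \
        --data-connection-name "test-grid-connection" \
        --storage-account-resource-id "<storage_account_resource_id>" \
        --event-hub-resource-id "<event_hub_resource_id>" \
        --consumer-group '$Default' \
        --table-name "TestTable" \
        --mapping-rule-name "TestMapping" \
        --data-format "JSON"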

Generate sample data

Now that Azure Data Explorer and the storage account are connected, you can create sample data and upload it to the storage container.

We'll work with a small shell script that issues a few basic Azure CLI commands to interact with Azure Storage resources. This script does the following actions:

  1. Creates a new container in your storage account.
  2. Uploads an existing file (as a blob) to that container.
  3. Lists the blobs in the container.

You can use Azure Cloud Shell to execute the script directly in the portal.

Save the following sample data into a file, for example TestTable.json:

    {"TimeStamp": "1987-11-16 12:00","Value": "Hello World","Source": "TestSource"}

Then upload the file with this script:

    #!/bin/bash
    ### A simple Azure Storage example script

    export AZURE_STORAGE_ACCOUNT=<storage_account_name>
    export AZURE_STORAGE_KEY=<storage_account_key>

    export container_name=<container_name>
    export blob_name=<blob_name>
    export file_to_upload=<file_to_upload>

    echo "Creating the container..."
    az storage container create --name $container_name

    echo "Uploading the file..."
    az storage blob upload --container-name $container_name --file $file_to_upload --name $blob_name --metadata "rawSizeBytes=1024"

    echo "Listing the blobs..."
    az storage blob list --container-name $container_name --output table

    echo "Done"

Note

To achieve the best ingestion performance, the uncompressed size of compressed blobs submitted for ingestion must be communicated. Because Event Grid notifications contain only basic details, the size information isn't propagated automatically: provide it by setting the rawSizeBytes property on the blob metadata to the uncompressed data size in bytes.
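
For example, here is a minimal sketch (reusing the variable names from the script above and assuming a Linux shell) that records the uncompressed size, compresses the file, and attaches the size as metadata:

    # Capture the uncompressed size before compressing, then pass it as blob metadata.
    raw_size=$(wc -c < "$file_to_upload" | tr -d ' ')
    gzip "$file_to_upload"
    az storage blob upload --container-name $container_name \
        --file "$file_to_upload.gz" --name $blob_name \
        --metadata "rawSizeBytes=$raw_size"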

Ingestion properties

You can specify the Ingestion properties of the blob ingestion via the blob metadata.

These properties can be set:

  • rawSizeBytes: Size of the raw (uncompressed) data. For Avro/ORC/Parquet, this is the size before format-specific compression is applied. Provide the original data size by setting this property to the uncompressed data size in bytes.
  • kustoTable: Name of the existing target table. Overrides the Table set on the Data Connection blade.
  • kustoDataFormat: Data format. Overrides the Data format set on the Data Connection blade.
  • kustoIngestionMappingReference: Name of the existing ingestion mapping to be used. Overrides the Column mapping set on the Data Connection blade.
  • kustoIgnoreFirstRecord: If set to true, Kusto ignores the first row of the blob. Use with tabular-format data (CSV, TSV, or similar) to ignore headers.
  • kustoExtentTags: String representing tags that will be attached to the resulting extent.
  • kustoCreationTime: Overrides $IngestionTime for the blob, formatted as an ISO 8601 string. Use for backfilling.
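
For example, a sketch of overriding the data connection's defaults for a single blob at upload time, reusing the table and mapping created earlier (the other variables follow the script above):

    # Per-blob overrides are passed as space-separated key=value metadata pairs.
    az storage blob upload --container-name $container_name \
        --file $file_to_upload --name $blob_name \
        --metadata "rawSizeBytes=1024" "kustoTable=TestTable" \
        "kustoDataFormat=json" "kustoIngestionMappingReference=TestMapping"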

Note

Azure Data Explorer won't delete the blobs after ingestion. Retain the blobs for three to five days, and then use Azure Blob storage lifecycle management to delete them.
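
For example, here is a minimal lifecycle rule, applied with the Azure CLI, that deletes blobs five days after their last modification. This is a sketch; the resource group and container name are placeholders.

    # Deletes block blobs in the given container five days after their last modification.
    az storage account management-policy create \
        --account-name $AZURE_STORAGE_ACCOUNT \
        --resource-group "<resource_group>" \
        --policy '{
          "rules": [{
            "enabled": true,
            "name": "delete-ingested-blobs",
            "type": "Lifecycle",
            "definition": {
              "actions": { "baseBlob": { "delete": { "daysAfterModificationGreaterThan": 5 } } },
              "filters": { "blobTypes": [ "blockBlob" ], "prefixMatch": [ "<container_name>/" ] }
            }
          }]
        }'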

Review the data flow

Note

Azure Data Explorer has an aggregation (batching) policy for data ingestion, designed to optimize the ingestion process. The policy defaults to a 5-minute batching window, so in this article you can expect a latency of a few minutes. You can alter the policy later if needed.
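
For example, here is a sketch of shortening the batching window on the test table to 30 seconds (the values are illustrative, not recommendations):

    .alter table TestTable policy ingestionbatching '{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'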

  1. In the Azure portal, under your Event Grid, you see the spike in activity while the script is running.


  2. To check how many messages have made it to the database so far, run the following query in your test database.

    TestTable
    | count
    
  3. To see the content of the messages, run the following query in your test database.

    TestTable
    

    The result set should look like the following image:

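If no rows show up after several minutes, you can check for ingestion errors. A minimal sketch (assuming the cluster's default error-reporting behavior) is:

    .show ingestion failures
    | where Table == "TestTable"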

Clean up resources

If you don't plan to use your Event Grid again, delete the Event Grid subscription, Event Hub namespace, and Event Hub that were auto-created for you, to avoid incurring costs.

  1. In the Azure portal, go to the left menu and select All resources.


  2. Search for your Event Hub namespace and select Delete to delete it.


  3. In the Delete resources form, confirm the deletion of the Event Hub namespace and Event Hub resources.

  4. Go to your storage account. In the left menu, select Events.


  5. Below the graph, select your Event Grid subscription and then select Delete to delete it.


  6. To delete your Event Grid data connection, go to your Azure Data Explorer cluster. On the left menu, select Databases.

  7. Select your database TestDatabase.


  8. On the left menu, select Data ingestion.


  9. Select your data connection test-grid-connection and then select Delete to delete it.

Next steps