Ingest blobs into Azure Synapse Data Explorer by subscribing to Event Grid notifications (Preview)

Azure Synapse Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. It offers ingestion (data loading) from Event Hubs, IoT Hubs, and blobs written to blob containers.

In this article, you learn how to ingest blobs from your storage account into Azure Synapse Data Explorer using an Event Grid data connection. You'll create an Event Grid data connection that sets up an Azure Event Grid subscription. The Event Grid subscription routes events from your storage account to Data Explorer via an Azure Event Hub. Then you'll see an example of the data flow throughout the system.

For general information about ingesting into Data Explorer from Event Grid, see Connect to Event Grid.

Prerequisites

  • An Azure subscription. Create a free Azure account.

  • Create a Data Explorer pool using Synapse Studio or the Azure portal.

  • Create a Data Explorer database.

    1. In Synapse Studio, on the left-side pane, select Data.

    2. Select + (Add new resource) > Data Explorer database, and use the following information:

      Setting                  | Suggested value     | Description
      Pool name                | contosodataexplorer | The name of the Data Explorer pool to use.
      Name                     | TestDatabase        | The database name must be unique within the cluster.
      Default retention period | 365                 | The time span (in days) for which data is guaranteed to be kept available to query. The time span is measured from the time that data is ingested.
      Default cache period     | 31                  | The time span (in days) for which to keep frequently queried data available in SSD storage or RAM, rather than in longer-term storage.
    3. Select Create to create the database. Creation typically takes less than a minute.

  • Create a target table to which Event Hubs will send data.
    1. In Synapse Studio, on the left-side pane, select Develop.

    2. Under KQL scripts, select + (Add new resource) > KQL script. On the right-side pane, you can name your script.

    3. In the Connect to menu, select contosodataexplorer.

    4. In the Use database menu, select TestDatabase.

    5. Paste in the following command, and select Run to create the table.

      .create table TestTable (TimeStamp: datetime, Value: string, Source: string)
      

      Tip

      Verify that the table was successfully created. On the left-side pane, select Data, select the contosodataexplorer more menu, and then select Refresh. Under contosodataexplorer, expand Tables and make sure that the TestTable table appears in the list.
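
      Alternatively, you can verify the table from the KQL script itself. The following is a minimal check using the standard .show tables management command, filtered to the table you just created:

        .show tables
        | where TableName == "TestTable"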

    6. Copy the following command into the window and select Run to map the incoming JSON data to the column names and data types of the table (TestTable).

      .create table TestTable ingestion json mapping 'TestMapping' '[{"column":"TimeStamp","path":"$.TimeStamp"},{"column":"Value","path":"$.Value"},{"column":"Source","path":"$.Source"}]'
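
      To confirm that the mapping was created, you can run the standard .show table ingestion mappings management command, as in this quick check:

        .show table TestTable ingestion json mappings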
      

Sign in to the Azure portal

Sign in to the Azure portal.

Create an Event Grid data connection

Now connect the storage account to Data Explorer, so that data flowing into the storage account is streamed to the test table. This connection can be created in the Azure portal under Data Explorer.

  1. Under the Data Explorer pool you created, select Databases > TestDatabase.

  2. Select Data connections and Add data connection.

Data connection - Basics tab

  1. Select the connection type: Blob storage.

  2. Fill out the form with the following information:

    Setting                      | Suggested value               | Field description
    Data connection name         | test-grid-connection          | The name of the connection that you want to create in Data Explorer.
    Storage account subscription | Your subscription ID          | The subscription ID where your storage account is.
    Storage account              | gridteststorage1              | The name of the storage account that you created previously.
    Event type                   | Blob created or Blob renamed  | The type of event that triggers ingestion. Blob renamed is supported only for ADLSv2 storage. Supported types are Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobRenamed.
    Resources creation           | Automatic                     | Define whether you want Data Explorer to create an Event Grid subscription, an Event Hub namespace, and an Event Hub for you.
  3. Select Filter settings if you want to track specific subjects. Set the filters for the notifications as follows:

    • Prefix field is the literal prefix of the subject. As the pattern applied is startswith, it can span multiple containers, folders, or blobs. No wildcards are allowed.
      • To define a filter on the blob container, the field must be set as follows: /blobServices/default/containers/[container prefix].
      • To define a filter on a blob prefix (or a folder in Azure Data Lake Gen2), the field must be set as follows: /blobServices/default/containers/[container name]/blobs/[folder/blob prefix].
    • Suffix field is the literal suffix of the blob. No wildcards are allowed.
    • Case-Sensitive field indicates whether the prefix and suffix filters are case-sensitive.
    • For more information about filtering events, see Blob storage events.
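
    For example, to ingest only JSON blobs under a folder named telemetry in a container named sample-container (both names are hypothetical), you could set Prefix to /blobServices/default/containers/sample-container/blobs/telemetry/ and Suffix to .json.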

  4. Select Next: Ingest properties.

Note

We recommend updating the data connection to use managed identities to access the storage account as soon as the option becomes available for your cluster.

Data connection - Ingest properties tab

  1. Fill out the form with the following information. Table and mapping names are case-sensitive:

    Ingest properties:

    Setting           | Suggested value     | Field description
    Table name        | TestTable           | The table you created in TestDatabase.
    Data format       | JSON                | Supported formats are Avro, CSV, JSON, MULTILINE JSON, ORC, PARQUET, PSV, SCSV, SOHSV, TSV, TXT, TSVE, APACHEAVRO, RAW, and W3CLOG. Supported compression options are Zip and Gzip.
    Mapping           | TestMapping         | The mapping you created in TestDatabase, which maps incoming JSON data to the column names and data types of TestTable.
    Advanced settings | My data has headers | Ignores headers. Supported for *SV type files.

    Note

    You don't have to specify all Default routing settings. Partial settings are also accepted.

  2. Select Next: Review + Create.

Data connection - Review + Create tab

  1. Review the resources that were auto-created for you and select Create.

Deployment

Wait until the deployment is completed. If your deployment failed, select Operation details next to the failed stage to get more information about the failure. Select Redeploy to try to deploy the resources again. You can alter the parameters before deployment.

Generate sample data

Now that Data Explorer and the storage account are connected, you can create sample data.

Upload blob to the storage container

We'll work with a small shell script that issues a few basic Azure CLI commands to interact with Azure Storage resources. This script does the following actions:

  1. Creates a new container in your storage account.
  2. Uploads an existing file (as a blob) to that container.
  3. Lists the blobs in the container.

You can use Azure Cloud Shell to execute the script directly in the portal.

Save the following data into a file:

    {"TimeStamp": "1987-11-16 12:00","Value": "Hello World","Source": "TestSource"}

Then upload the file with this script:

    #!/bin/bash
    ### A simple Azure Storage example script

    # The AZURE_STORAGE_* variables are read automatically by the az CLI.
    export AZURE_STORAGE_ACCOUNT=<storage_account_name>
    export AZURE_STORAGE_KEY=<storage_account_key>

    export container_name=<container_name>
    export blob_name=<blob_name>
    export file_to_upload=<file_to_upload>

    echo "Creating the container..."
    az storage container create --name "$container_name"

    echo "Uploading the file..."
    # rawSizeBytes tells Data Explorer the uncompressed data size (see the note below).
    az storage blob upload --container-name "$container_name" --file "$file_to_upload" --name "$blob_name" --metadata "rawSizeBytes=1024"

    echo "Listing the blobs..."
    az storage blob list --container-name "$container_name" --output table

    echo "Done"

Note

To achieve the best ingestion performance, the uncompressed size of compressed blobs submitted for ingestion must be communicated to Data Explorer. Because Event Grid notifications contain only basic details, the size information isn't included automatically. Provide it by setting the rawSizeBytes property on the blob metadata to the uncompressed data size in bytes, as the script above does.

Rename blob

If you're ingesting data from ADLSv2 storage and have defined Blob renamed as the event type for the data connection, the trigger for blob ingestion is blob renaming. To rename a blob in the Azure portal, navigate to the blob, right-click it, and select Rename.

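If you prefer the command line, you can also rename a blob in an ADLSv2 account with the Azure CLI's az storage fs file move command. This is a sketch; the file system and path names are placeholders:

    az storage fs file move \
        --account-name <storage_account_name> \
        --file-system <filesystem_name> \
        --path telemetry/old-name.json \
        --new-path <filesystem_name>/telemetry/new-name.json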

Ingestion properties

You can specify ingestion properties for blob ingestion via the blob metadata.
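
For example, when uploading a blob, you can override the data connection's defaults by attaching ingestion properties as blob metadata. The following sketch uses the kustoTable, kustoDataFormat, and kustoIngestionMappingReference keys; see Connect to Event Grid for the full list of supported properties:

    az storage blob upload --container-name <container_name> --file testdata.json --name testdata.json \
        --metadata "rawSizeBytes=80" "kustoTable=TestTable" "kustoDataFormat=json" "kustoIngestionMappingReference=TestMapping"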

Note

Data Explorer won't delete the blobs after ingestion. Retain the blobs for three to five days, and then use Azure Blob storage lifecycle management to manage blob deletion.
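
For example, a lifecycle management rule that deletes blobs five days after their last modification could look like this sketch (the account, resource group, and container names are placeholders):

    # Write a lifecycle policy that deletes block blobs 5 days after modification.
    cat > policy.json <<'EOF'
    {
      "rules": [
        {
          "enabled": true,
          "name": "delete-ingested-blobs",
          "type": "Lifecycle",
          "definition": {
            "actions": {
              "baseBlob": { "delete": { "daysAfterModificationGreaterThan": 5 } }
            },
            "filters": { "blobTypes": [ "blockBlob" ], "prefixMatch": [ "<container_name>/" ] }
          }
        }
      ]
    }
    EOF

    # Apply the policy to the storage account.
    az storage account management-policy create \
        --account-name <storage_account_name> \
        --resource-group <resource_group> \
        --policy @policy.json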

Review the data flow

Note

Data Explorer has an aggregation (batching) policy for data ingestion designed to optimize the ingestion process. By default, the policy is configured to a 5-minute batching time. You can alter the policy later if needed. In this article, expect a latency of a few minutes.
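
For example, to lower the batching latency for TestTable, you could alter its ingestion batching policy with a command like the following sketch (the values are illustrative, not recommendations):

    .alter table TestTable policy ingestionbatching
    '{"MaximumBatchingTimeSpan": "00:01:00", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'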

  1. In the Azure portal, under your Event Grid, you can see a spike in activity while the script is running.

  2. To check how many messages have made it to the database so far, run the following query in your test database.

    TestTable
    | count
    
  3. To see the content of the messages, run the following query in your test database.

    TestTable
    

    The result set should contain the record you ingested.

Clean up resources

If you don't plan to use your Event Grid again, clean up the Event Grid subscription, Event Hub namespace, and Event Hub that were auto-created for you, to avoid incurring costs.

  1. In the Azure portal, go to the left menu and select All resources.

  2. Search for your Event Hub namespace and select Delete to delete it.

  3. In the Delete resources form, confirm the deletion to delete the Event Hub Namespace and Event Hub resources.

  4. Go to your storage account. In the left menu, select Events.

  5. Below the graph, select your Event Grid subscription, and then select Delete to delete it.

  6. To delete your Event Grid data connection, go to your Data Explorer pool. On the left menu, select Databases.

  7. Select your database, TestDatabase.

  8. On the left menu, select Data ingestion.

  9. Select your data connection test-grid-connection and then select Delete to delete it.

Next steps