Stream big data into a data warehouse

Azure Event Grid is an intelligent event routing service that enables you to react to notifications from apps and services. For example, it can trigger an Azure Function to process Event Hubs data that has been captured to an Azure Blob storage or Data Lake Store, and migrate the data to other data repositories. This Event Hubs Capture and Event Grid sample shows how to use Event Hubs Capture with Event Grid to seamlessly migrate Event Hubs data from blob storage to a SQL Data Warehouse.

Application overview

As data is sent to the event hub, Capture pulls data from the stream and generates storage blobs with the data in Avro format. When Capture generates the blob, it triggers an event. Event Grid distributes data about the event to subscribers. In this case, the event data is sent to the Azure Functions endpoint. The event data includes the path of the generated blob. The function uses that URL to retrieve the file, and send it to the data warehouse.

In this article, you:

  • Deploy the following infrastructure:
    • Event hub with Capture enabled
    • Storage account for the files from Capture
    • Azure app service plan for hosting the function app
    • Function app for processing the event
    • SQL Server for hosting the data warehouse
    • SQL Data Warehouse for storing the migrated data
  • Create a table in the data warehouse
  • Add code to the function app
  • Subscribe to the event
  • Run app that sends data to the event hub
  • View migrated data in data warehouse

About the event data

Event Grid distributes event data to the subscribers. The following example shows event data for creating a Capture file. In particular, notice the fileUrl property in the data object. The function app gets this value and uses it to retrieve the Capture file.

[
    {
        "topic": "/subscriptions/<guid>/resourcegroups/rgDataMigrationSample/providers/Microsoft.EventHub/namespaces/tfdatamigratens",
        "subject": "eventhubs/hubdatamigration",
        "eventType": "Microsoft.EventHub.CaptureFileCreated",
        "eventTime": "2017-08-31T19:12:46.0498024Z",
        "id": "14e87d03-6fbf-4bb2-9a21-92bd1281f247",
        "data": {
            "fileUrl": "https://tf0831datamigrate.blob.core.windows.net/windturbinecapture/tfdatamigratens/hubdatamigration/1/2017/08/31/19/11/45.avro",
            "fileType": "AzureBlockBlob",
            "partitionId": "1",
            "sizeInBytes": 249168,
            "eventCount": 1500,
            "firstSequenceNumber": 2400,
            "lastSequenceNumber": 3899,
            "firstEnqueueTime": "2017-08-31T19:12:14.674Z",
            "lastEnqueueTime": "2017-08-31T19:12:44.309Z"
        }
    }
]

Prerequisites

To complete this tutorial, you must have:

Deploy the infrastructure

To simplify this article, you deploy the required infrastructure with a Resource Manager template. To see the resources that are deployed, view the template.

For Azure CLI, use:

az group create -l westcentralus -n rgDataMigrationSample

az group deployment create \
  --resource-group rgDataMigrationSample \
  --template-uri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json \
  --parameters eventHubNamespaceName=<event-hub-namespace> eventHubName=hubdatamigration sqlServerName=<sql-server-name> sqlServerUserName=<user-name> sqlServerPassword=<password> sqlServerDatabaseName=<database-name> storageName=<unique-storage-name> functionAppName=<app-name>

For PowerShell, use:

New-AzureRmResourceGroup -Name rgDataMigration -Location westcentralus

New-AzureRmResourceGroupDeployment -ResourceGroupName rgDataMigration -TemplateUri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json -eventHubNamespaceName <event-hub-namespace> -eventHubName hubdatamigration -sqlServerName <sql-server-name> -sqlServerUserName <user-name> -sqlServerDatabaseName <database-name> -storageName <unique-storage-name> -functionAppName <app-name>

Provide a password value when prompted.

Create a table in SQL Data Warehouse

Add a table to your data warehouse by running the CreateDataWarehouseTable.sql script. To run the script, use Visual Studio or the Query Editor in the portal.

The script to run is:

CREATE TABLE [dbo].[Fact_WindTurbineMetrics] (
    [DeviceId] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
    [MeasureTime] datetime NULL, 
    [GeneratedPower] float NULL, 
    [WindSpeed] float NULL, 
    [TurbineSpeed] float NULL
)
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN);

Publish the Azure Functions app

  1. Open the EventHubsCaptureEventGridDemo sample project in Visual Studio 2017 (15.3.2 or greater).

  2. In Solution Explorer, right-click FunctionEGDWDumper, and select Publish.

    Publish function app

  3. Select Azure Function App and Select Existing. Select Publish.

    Target function app

  4. Select the function app that you deployed through the template. Select OK.

    Select function app

  5. When Visual Studio has configured the profile, select Publish.

    Select publish

After publishing the function, you're ready to subscribe to the event.

Subscribe to the event

  1. Go to the Azure portal. Select your resource group and function app.

    View function app

  2. Select the function.

    Select function

  3. Select Add Event Grid subscription.

    Add subscription

  4. Give the event grid subscription a name. Use Event Hubs Namespaces as the event type. Provide values to select your instance of the Event Hubs namespace. Leave the subscriber endpoint as the provided value. Select Create.

    Create subscription

Run the app to generate data

You've finished setting up your event hub, SQL data warehouse, Azure function app, and event subscription. The solution is ready to migrate data from the event hub to the data warehouse. Before running an application that generates data for event hub, you need to configure a few values.

  1. In the portal, select your event hub namespace. Select Connection Strings.

    Select connection strings

  2. Select RootManageSharedAccessKey

    Select key

  3. Copy Connection string - primary Key

    Copy key

  4. Go back to your Visual Studio project. In the WindTurbineDataGenerator project, open program.cs.

  5. Replace the two constant values. Use the copied value for EventHubConnectionString. Use hubdatamigration the event hub name.

    private const string EventHubConnectionString = "Endpoint=sb://demomigrationnamespace.servicebus.windows.net/...";
    private const string EventHubName = "hubdatamigration";
    
  6. Build the solution. Run the WindTurbineGenerator.exe application. After a couple of minutes, query the table in your data warehouse for the migrated data.

Next steps