Stream big data into a data warehouse

Azure Event Grid is an intelligent event routing service that enables you to react to notifications from apps and services. The Event Hubs Capture and Event Grid sample shows how to use Azure Event Hubs Capture with Azure Event Grid to seamlessly migrate data from an event hub to a SQL Data Warehouse.

Application overview

As data is sent to the event hub, Capture pulls data from the stream and generates storage blobs with the data in Avro format. When Capture generates the blob, it triggers an event. Event Grid distributes data about the event to subscribers. In this case, the event data is sent to the Azure Functions endpoint. The event data includes the path of the generated blob. The function uses that URL to retrieve the file, and send it to the data warehouse.

In this article, you:

  • Deploy the following infrastructure:
    • Event hub with Capture enabled
    • Storage account for the files from Capture
    • Azure app service plan for hosting the function app
    • Function app for processing the event
    • SQL Server for hosting the data warehouse
    • SQL Data Warehouse for storing the migrated data
  • Create a table in the data warehouse
  • Add code to the function app
  • Subscribe to the event
  • Run app that sends data to the event hub
  • View migrated data in data warehouse

About the event data

Event Grid distributes event data to the subscribers. The following example shows event data for creating a Capture file. In particular, notice the fileUrl property in the data object. The function app gets this value and uses it to retrieve the Capture file.

[
    {
        "topic": "/subscriptions/<guid>/resourcegroups/rgDataMigrationSample/providers/Microsoft.EventHub/namespaces/tfdatamigratens",
        "subject": "eventhubs/hubdatamigration",
        "eventType": "Microsoft.EventHub.CaptureFileCreated",
        "eventTime": "2017-08-31T19:12:46.0498024Z",
        "id": "14e87d03-6fbf-4bb2-9a21-92bd1281f247",
        "data": {
            "fileUrl": "https://tf0831datamigrate.blob.core.windows.net/windturbinecapture/tfdatamigratens/hubdatamigration/1/2017/08/31/19/11/45.avro",
            "fileType": "AzureBlockBlob",
            "partitionId": "1",
            "sizeInBytes": 249168,
            "eventCount": 1500,
            "firstSequenceNumber": 2400,
            "lastSequenceNumber": 3899,
            "firstEnqueueTime": "2017-08-31T19:12:14.674Z",
            "lastEnqueueTime": "2017-08-31T19:12:44.309Z"
        }
    }
]

Prerequisites

To complete this tutorial, you must have:

Deploy the infrastructure

To simplify this article, you deploy the required infrastructure with a Resource Manager template. To see the resources that are deployed, view the template.

For Azure CLI, use:

az group create -l westcentralus -n rgDataMigrationSample

az group deployment create \
  --resource-group rgDataMigrationSample \
  --template-uri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json \
  --parameters eventHubNamespaceName=<event-hub-namespace> eventHubName=hubdatamigration sqlServerName=<sql-server-name> sqlServerUserName=<user-name> sqlServerPassword=<password> sqlServerDatabaseName=<database-name> storageName=<unique-storage-name> functionAppName=<app-name>

For PowerShell, use:

New-AzureRmResourceGroup -Name rgDataMigration -Location westcentralus

New-AzureRmResourceGroupDeployment -ResourceGroupName rgDataMigration -TemplateUri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json -eventHubNamespaceName <event-hub-namespace> -eventHubName hubdatamigration -sqlServerName <sql-server-name> -sqlServerUserName <user-name> -sqlServerDatabaseName <database-name> -storageName tf08202storage -functionAppName <app-name>

Provide a password value when prompted.

Create a table in SQL Data Warehouse

Add a table to your data warehouse by running the CreateDataWarehouseTable.sql script. To run the script, use Visual Studio or the Query Editor in the portal.

The script to run is:

CREATE TABLE [dbo].[Fact_WindTurbineMetrics] (
    [DeviceId] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
    [MeasureTime] datetime NULL, 
    [GeneratedPower] float NULL, 
    [WindSpeed] float NULL, 
    [TurbineSpeed] float NULL
)
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN);

Publish the Azure Functions app

  1. Open the EventHubsCaptureEventGridDemo sample project in Visual Studio 2017 (15.3.2 or greater).

  2. In Solution Explorer, right-click FunctionDWDumper, and select Publish.

    Publish function app

  3. Select Azure Function App and Select Existing. Select OK.

    Target function app

  4. Select the function app that you deployed through the template. Select OK.

    Select function app

  5. When Visual Studio has configured the profile, select Publish.

    Select publish

  6. After publishing the function, go to the Azure portal. Select your resource group and function app.

    View function app

  7. Select the function.

    Select function

  8. Get the URL for the function. You need this URL when creating the event subscription.

    Get function URL

  9. Copy the value.

    Copy URL

Subscribe to the event

You can use either Azure CLI or the portal to subscribe to the event. This article shows both approaches.

Portal

  1. From the Event Hubs namespace, select Event Grid on the left.

    Select Event Grid

  2. Add an event subscription.

    Add event subscription

  3. Provide values for the event subscription. Use the Azure Functions URL that you copied. Select Create.

    Provide subscription values

Azure CLI

To subscribe to the event, run the following command:

az eventgrid resource event-subscription create -g rgDataMigrationSample --provider-namespace Microsoft.EventHub --resource-type namespaces --resource-name <your-EventHubs-namespace> --name captureEventSub --endpoint <your-function-endpoint>

Run the app to generate data

You have finished setting up your event hub, SQL data warehouse, Azure function app, and event subscription. The solution is ready to migrate data from the event hub to the data warehouse. Before running an application that generates data for event hub, you need to configure a few values.

  1. In the portal, select your event hub namespace. Select Connection Strings.

    Select connection strings

  2. Select RootManageSharedAccessKey

    Select key

  3. Copy Connection string - Primary Key

    Copy key

  4. Go back to your Visual Studio project. In the WindTurbineDataGenerator project, open program.cs.

  5. Replace the two constant values. Use the copied value for EventHubConnectionString. Use the event hub name for EventHubName.

    private const string EventHubConnectionString = "Endpoint=sb://tfdatamigratens.servicebus.windows.net/...";
    private const string EventHubName = "hubdatamigration";
    
  6. Build the solution. Run the WindTurbineGenerator.exe application. After a couple of minutes, query the table in your data warehouse for the migrated data.

Next steps