Tutorial: Stream big data into a data warehouse

Azure Event Grid is an intelligent event routing service that enables you to react to notifications (events) from apps and services. For example, it can trigger an Azure function to process Event Hubs data that has been captured to Azure Blob storage or Azure Data Lake Storage, and migrate the data to other data repositories. This Event Hubs and Event Grid integration sample shows you how to use Event Hubs with Event Grid to migrate captured Event Hubs data from blob storage to a SQL data warehouse.

Application overview

The solution you build in this tutorial has the following workflow:

  1. Data sent to an Azure event hub is captured in Azure Blob storage.
  2. When the data capture is complete, an event is generated and sent to Azure Event Grid.
  3. Event Grid forwards this event data to an Azure function app.
  4. The function app uses the blob URL in the event data to retrieve the blob from storage.
  5. The function app migrates the blob data to an Azure SQL data warehouse.
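
Steps 3 through 5 can be sketched as a single handler. The following Python sketch is illustrative and is not the sample's function code: handle_capture_event, download_blob, and load_rows are hypothetical names, and the two callables are injected so that the flow can be followed without any Azure SDK calls. The eventType and fileUrl fields match the event payload shown at the end of this article.

```python
from typing import Callable

CAPTURE_EVENT = "Microsoft.EventHub.CaptureFileCreated"


def handle_capture_event(
    event: dict,
    download_blob: Callable[[str], bytes],
    load_rows: Callable[[bytes], int],
) -> int:
    """Process one Event Grid event; return the number of rows loaded.

    download_blob and load_rows are hypothetical stand-ins for the
    storage download and the data warehouse load.
    """
    # Step 3: Event Grid delivers the event; ignore unrelated event types.
    if event.get("eventType") != CAPTURE_EVENT:
        return 0
    # Step 4: the blob URL travels in the event data.
    blob_bytes = download_blob(event["data"]["fileUrl"])
    # Step 5: hand the captured data to the warehouse loader.
    return load_rows(blob_bytes)
```

Passing the download and load steps in as functions keeps the routing logic testable on its own; a real function app would supply implementations that talk to Blob storage and the data warehouse.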

In this article, you take the following steps:

  • Use an Azure Resource Manager template to deploy the infrastructure: an event hub, a storage account, a function app, and a SQL data warehouse.
  • Create a table in the data warehouse.
  • Add code to the function app.
  • Subscribe to the event.
  • Run the app that sends data to the event hub.
  • View the migrated data in the data warehouse.

Prerequisites

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

To complete this tutorial, you need an Azure subscription, Visual Studio, and the EventHubsCaptureEventGridDemo sample solution from GitHub, all of which are used in later steps.

Deploy the infrastructure

In this step, you deploy the required infrastructure with a Resource Manager template. When you deploy the template, the following resources are created:

  • Event hub with the Capture feature enabled.
  • Storage account for the captured files.
  • App Service plan for hosting the function app.
  • Function app for processing the event.
  • SQL server for hosting the data warehouse.
  • SQL data warehouse for storing the migrated data.

Launch Azure Cloud Shell in the Azure portal

  1. Sign in to the Azure portal.

  2. Select the Cloud Shell button at the top.

  3. The Cloud Shell opens at the bottom of the browser.

  4. In the Cloud Shell, if you see an option to select between Bash and PowerShell, select Bash.

  5. If you are using the Cloud Shell for the first time, create a storage account by selecting Create storage. Azure Cloud Shell requires an Azure storage account to store some files.

  6. Wait until the Cloud Shell is initialized.

Use Azure CLI

  1. Create an Azure resource group by running the following CLI command:
    1. Copy and paste the following command into the Cloud Shell window.

      az group create -l eastus -n <Name for the resource group>
      
    2. Specify a name for the resource group.

    3. Press ENTER.

      Here is an example:

      user@Azure:~$ az group create -l eastus -n ehubegridgrp
      {
        "id": "/subscriptions/00000000-0000-0000-0000-0000000000000/resourceGroups/ehubegridgrp",
        "location": "eastus",
        "managedBy": null,
        "name": "ehubegridgrp",
        "properties": {
          "provisioningState": "Succeeded"
        },
        "tags": null
      }
      
  2. Deploy all the resources mentioned in the previous section (event hub, storage account, function app, SQL data warehouse) by running the following CLI command:
    1. Copy and paste the command into the Cloud Shell window. Alternatively, you can paste it into an editor of your choice, set the values, and then copy the command to the Cloud Shell.

      az group deployment create \
          --resource-group rgDataMigrationSample \
          --template-uri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json \
          --parameters eventHubNamespaceName=<event-hub-namespace> eventHubName=hubdatamigration sqlServerName=<sql-server-name> sqlServerUserName=<user-name> sqlServerPassword=<password> sqlServerDatabaseName=<database-name> storageName=<unique-storage-name> functionAppName=<app-name>
      
    2. Specify values for the following entities:

      1. Name of the resource group you created earlier.
      2. Name for the event hub namespace.
      3. Name for the event hub. You can leave the value as it is (hubdatamigration).
      4. Name for the SQL server.
      5. Name of the SQL user and password.
      6. Name for the SQL data warehouse.
      7. Name of the storage account.
      8. Name for the function app.
    3. Press ENTER in the Cloud Shell window to run the command. This process may take a while because it creates several resources. In the output of the command, confirm that there are no failures.
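
If you set the values in an editor first, a small script can catch placeholders that were left unfilled. The following Python sketch is illustrative only and is not part of the sample: the build_parameters helper and any values passed to it are hypothetical, while the parameter names are the ones used in the command above. It renders the key=value pairs that follow --parameters and rejects any value that is empty or still looks like <placeholder>.

```python
import re
import shlex

# Template parameter names taken from the deployment command above.
REQUIRED = [
    "eventHubNamespaceName", "eventHubName", "sqlServerName",
    "sqlServerUserName", "sqlServerPassword", "sqlServerDatabaseName",
    "storageName", "functionAppName",
]

# Matches values such as "<sql-server-name>" that were never filled in.
PLACEHOLDER = re.compile(r"^<.*>$")


def build_parameters(values: dict) -> str:
    """Return the space-separated key=value string for --parameters.

    Hypothetical helper: raises ValueError when a required parameter is
    missing, empty, or still an unfilled <placeholder>.
    """
    parts = []
    for name in REQUIRED:
        value = values.get(name, "")
        if not value or PLACEHOLDER.match(value):
            raise ValueError(f"parameter {name!r} is not set")
        # shlex.quote keeps values with special characters intact in Bash.
        parts.append(f"{name}={shlex.quote(value)}")
    return " ".join(parts)
```

The returned string can be pasted after --parameters in the Bash command. Quoting matters mostly for the password, which may contain characters that Bash would otherwise interpret.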

Use Azure PowerShell

  1. In Azure Cloud Shell, switch to PowerShell mode. Select the down arrow in the top-left corner of Azure Cloud Shell, and select PowerShell.

  2. Create an Azure resource group by running the following command:

    1. Copy and paste the following command into the Cloud Shell window.

      New-AzResourceGroup -Name rgDataMigration -Location westcentralus
      
    2. Specify a name for the resource group.

    3. Press ENTER.

  3. Deploy all the resources mentioned in the previous section (event hub, storage account, function app, SQL data warehouse) by running the following command:

    1. Copy and paste the command into the Cloud Shell window. Alternatively, you can paste it into an editor of your choice, set the values, and then copy the command to the Cloud Shell.

      New-AzResourceGroupDeployment -ResourceGroupName rgDataMigration -TemplateUri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json -eventHubNamespaceName <event-hub-namespace> -eventHubName hubdatamigration -sqlServerName <sql-server-name> -sqlServerUserName <user-name> -sqlServerDatabaseName <database-name> -storageName <unique-storage-name> -functionAppName <app-name>
      
    2. Specify values for the following entities:

      1. Name of the resource group you created earlier.
      2. Name for the event hub namespace.
      3. Name for the event hub. You can leave the value as it is (hubdatamigration).
      4. Name for the SQL server.
      5. Name of the SQL user and password.
      6. Name for the SQL data warehouse.
      7. Name of the storage account.
      8. Name for the function app.
    3. Press ENTER in the Cloud Shell window to run the command. This process may take a while because it creates several resources. In the output of the command, confirm that there are no failures.

Close the Cloud Shell

Close the Cloud Shell by selecting the Cloud Shell button in the portal or the X button in the top-right corner of the Cloud Shell window.

Verify that the resources are created

  1. In the Azure portal, select Resource groups on the left menu.

  2. Filter the list of resource groups by entering the name of your resource group in the search box.

  3. Select your resource group in the list.

  4. Confirm that the resource group contains the resources created by the template: the event hub, storage account, App Service plan, function app, SQL server, and SQL data warehouse.

Create a table in SQL Data Warehouse

Create a table in your data warehouse by running the CreateDataWarehouseTable.sql script. To run the script, you can use Visual Studio or the Query Editor in the portal. The following steps show you how to use the Query Editor:

  1. In the list of resources in the resource group, select your SQL data warehouse.

  2. On the SQL data warehouse page, select Query editor (preview) in the left menu.

  3. Enter the user name and password for the SQL server, and select OK.

  4. In the query window, copy and run the following SQL script:

    CREATE TABLE [dbo].[Fact_WindTurbineMetrics] (
        [DeviceId] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
        [MeasureTime] datetime NULL, 
        [GeneratedPower] float NULL, 
        [WindSpeed] float NULL, 
        [TurbineSpeed] float NULL
    )
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN);
    

  5. Keep this tab or window open so that you can verify at the end of the tutorial that the data was migrated.

Publish the Azure Functions app

  1. Launch Visual Studio.

  2. Open the EventHubsCaptureEventGridDemo.sln solution that you downloaded from GitHub as part of the prerequisites.

  3. In Solution Explorer, right-click FunctionEGDWDumper, and select Publish.

  4. If you see a start screen for publishing, select Start.

  5. On the Pick a publish target page, select the Select existing option, and select Create Profile.

  6. On the App Service page, select your Azure subscription, select the function app in your resource group, and select OK.

  7. When Visual Studio has configured the profile, select Publish.

After publishing the function, you're ready to subscribe to the event.

Subscribe to the event

  1. In a new tab or new window of a web browser, navigate to the Azure portal.

  2. In the Azure portal, select Resource groups on the left menu.

  3. Filter the list of resource groups by entering the name of your resource group in the search box.

  4. Select your resource group in the list.

  5. Select the App Service plan in the list.

  6. On the App Service Plan page, select Apps in the left menu, and select the function app.

  7. Expand the function app, expand Functions, and then select your function.

  8. Select Add Event Grid subscription on the toolbar.

  9. On the Create Event Grid Subscription page, do the following actions:

    1. In the TOPIC DETAILS section, do the following actions:

      1. Select your Azure subscription.
      2. Select the Azure resource group.
      3. Select the Event Hubs namespace.
    2. In the EVENT SUBSCRIPTION DETAILS section, enter a name for the subscription (for example: captureEventSub), and select Create.

Run the app to generate data

You've finished setting up your event hub, SQL data warehouse, Azure function app, and event subscription. Before running an application that generates data for the event hub, you need to configure a few values.

  1. In the Azure portal, navigate to your resource group as you did earlier.

  2. Select the Event Hubs namespace.

  3. On the Event Hubs Namespace page, select Shared access policies on the left menu.

  4. Select RootManageSharedAccessKey in the list of policies.

  5. Select the copy button next to the Connection string-primary key text box.

  6. Go back to your Visual Studio solution.

  7. In the WindTurbineDataGenerator project, open program.cs.

  8. Replace the two constant values. Use the copied value for EventHubConnectionString. Use hubdatamigration as the event hub name. If you used a different name for the event hub, specify that name.

    private const string EventHubConnectionString = "Endpoint=sb://demomigrationnamespace.servicebus.windows.net/...";
    private const string EventHubName = "hubdatamigration";
    
  9. Build the solution. Run the WindTurbineGenerator.exe application.

  10. After a couple of minutes, query the table in your data warehouse for the migrated data.
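
To make the shape of the migrated rows concrete, the following Python sketch produces a few synthetic readings. It is not the WindTurbineDataGenerator code: the field names are assumed to mirror the columns of the Fact_WindTurbineMetrics table created earlier, and the value ranges, the device naming, and the make_reading helper are invented for illustration.

```python
import json
import random
from datetime import datetime, timezone


def make_reading(device_id: str, rng: random.Random, now: datetime) -> dict:
    """Build one synthetic reading keyed by the table's column names.

    Hypothetical helper; the numeric ranges below are arbitrary.
    """
    return {
        "DeviceId": device_id,                    # nvarchar(50) column
        "MeasureTime": now.isoformat(),           # datetime column
        "GeneratedPower": rng.uniform(0.0, 3.0),  # float column
        "WindSpeed": rng.uniform(0.0, 25.0),      # float column
        "TurbineSpeed": rng.uniform(0.0, 20.0),   # float column
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    now = datetime.now(timezone.utc)
    for n in range(3):
        print(json.dumps(make_reading(f"Turbine_{n}", rng, now)))
```

Each printed line is one JSON document of the kind a generator could send to the event hub. After capture and migration, you would expect one table row per document.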

Event data generated by the event hub

Event Grid distributes event data to the subscribers. The following example shows the event data generated when data streaming through an event hub is captured in a blob. In particular, notice that the fileUrl property in the data object points to the blob in storage. The function app uses this URL to retrieve the blob file with the captured data.

[
    {
        "topic": "/subscriptions/<guid>/resourcegroups/rgDataMigrationSample/providers/Microsoft.EventHub/namespaces/tfdatamigratens",
        "subject": "eventhubs/hubdatamigration",
        "eventType": "Microsoft.EventHub.CaptureFileCreated",
        "eventTime": "2017-08-31T19:12:46.0498024Z",
        "id": "14e87d03-6fbf-4bb2-9a21-92bd1281f247",
        "data": {
            "fileUrl": "https://tf0831datamigrate.blob.core.windows.net/windturbinecapture/tfdatamigratens/hubdatamigration/1/2017/08/31/19/11/45.avro",
            "fileType": "AzureBlockBlob",
            "partitionId": "1",
            "sizeInBytes": 249168,
            "eventCount": 1500,
            "firstSequenceNumber": 2400,
            "lastSequenceNumber": 3899,
            "firstEnqueueTime": "2017-08-31T19:12:14.674Z",
            "lastEnqueueTime": "2017-08-31T19:12:44.309Z"
        }
    }
]
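
The following Python sketch shows one way a subscriber might pull the useful fields out of this payload. It is an illustration, not the sample's function code; parse_capture_event and CaptureFile are hypothetical names. It assumes the blob path follows the layout visible in the fileUrl above: container, namespace, event hub, partition, then year, month, day, hour, minute, and second.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse


@dataclass
class CaptureFile:
    account: str         # storage account name, from the URL host
    container: str       # blob container holding the capture files
    event_hub: str       # event hub the data was captured from
    partition_id: str    # partition segment of the blob path
    path_time: datetime  # timestamp encoded in the blob path
    event_count: int     # number of events in the file


def parse_capture_event(event: dict) -> CaptureFile:
    """Extract the blob location and capture metadata from one event."""
    data = event["data"]
    url = urlparse(data["fileUrl"])
    # Path layout assumed from the example payload:
    # /container/namespace/hub/partition/yyyy/MM/dd/HH/mm/ss.avro
    parts = url.path.lstrip("/").split("/")
    container, _namespace, hub, partition = parts[:4]
    year, month, day, hour, minute = (int(p) for p in parts[4:9])
    second = int(parts[9].split(".")[0])
    return CaptureFile(
        account=url.netloc.split(".")[0],
        container=container,
        event_hub=hub,
        partition_id=partition,
        path_time=datetime(year, month, day, hour, minute, second),
        event_count=data["eventCount"],
    )
```

Because the payload is a JSON array, a subscriber would call parse_capture_event once for each element. If capture is configured with a different file name format, the path parsing would need to change accordingly.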

Next steps