Use Azure Monitor to send emails for Health Service faults

Applies to: Windows Server 2019, Windows Server 2016

Azure Monitor maximizes the availability and performance of your applications by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. It helps you understand how your applications are performing and proactively identifies issues affecting them and the resources they depend on.

This is particularly helpful for your on-premises hyper-converged cluster. With Azure Monitor integrated, you can configure email, text (SMS), and other alerts to notify you when something is wrong with your cluster (or when you want to flag some other activity based on the data collected). Below, we briefly explain how Azure Monitor works, how to install it, and how to configure it to send you notifications.

If you are using System Center, check out the Storage Spaces Direct management pack, which monitors both Windows Server 2019 and Windows Server 2016 Storage Spaces Direct clusters.

This management pack includes:

  • Physical disk health and performance monitoring
  • Storage node health and performance monitoring
  • Storage pool health and performance monitoring
  • Volume resiliency type and deduplication status

Understanding Azure Monitor

All data collected by Azure Monitor fits into one of two fundamental types: metrics and logs.

  1. Metrics are numerical values that describe some aspect of a system at a particular point in time. They are lightweight and capable of supporting near real-time scenarios. You'll see metric data collected by Azure Monitor directly on a resource's Overview page in the Azure portal.


  2. Logs contain different kinds of data organized into records, with a different set of properties for each type. Telemetry such as events and traces is stored as logs in addition to performance data, so that it can all be combined for analysis. Log data collected by Azure Monitor can be analyzed with queries that quickly retrieve, consolidate, and analyze the collected data. You can create and test queries using Log Analytics in the Azure portal, and then either analyze the data directly with these tools or save the queries for use with visualizations or alert rules.
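To make the distinction concrete, here is a minimal Python sketch of the two shapes of data. The class names, fields, and the `combine` helper are hypothetical illustrations, not Azure Monitor types: a metric is a single number at a point in time, while log records of different types carry different properties yet can still be merged on their timestamps for joint analysis.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Metric:
    # A lightweight numeric value describing one aspect of a system at one instant.
    name: str
    time: datetime
    value: float


@dataclass
class LogRecord:
    # A record whose property set depends on its type (for example, an event
    # record and a performance record carry different properties).
    record_type: str
    time: datetime
    properties: dict = field(default_factory=dict)


def combine(*streams):
    """Merge several streams of log records into one time-ordered list."""
    merged = [record for stream in streams for record in stream]
    return sorted(merged, key=lambda record: record.time)


if __name__ == "__main__":
    events = [LogRecord("Event", datetime(2019, 1, 1, 12, 1), {"EventLevelName": "Error"})]
    perf = [LogRecord("Perf", datetime(2019, 1, 1, 12, 0), {"CounterValue": 91.0})]
    for record in combine(events, perf):
        print(record.time.isoformat(), record.record_type, record.properties)
```

Because both record types share a timestamp, events and performance data can be lined up side by side even though their other properties differ.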


We cover how to configure these alerts in more detail below.

Onboarding your cluster using Windows Admin Center

Using Windows Admin Center, you can onboard your cluster to Azure Monitor.


During this onboarding flow, the steps below happen under the hood. We describe how to configure each of them in detail in case you want to set up your cluster manually.

Configuring Health Service

The first thing you need to do is configure your cluster. As you may know, the Health Service improves the day-to-day monitoring and operational experience for clusters running Storage Spaces Direct.

As we saw above, Azure Monitor collects logs from each node in your cluster that it is running on. So, we have to configure the Health Service to write to an event channel, namely:

Event Channel: Microsoft-Windows-Health/Operational
Event ID: 8465
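If you export events from a node for offline inspection, a filter like the following picks out the Health Service records described above. This is a hypothetical Python sketch: it assumes each event is a dictionary with `Channel` and `EventID` keys, which is not the format of any particular Windows or Azure API.

```python
from typing import Iterable, Iterator, Mapping

# Channel and event ID the Health Service is configured to write to (from the text above).
HEALTH_CHANNEL = "Microsoft-Windows-Health/Operational"
HEALTH_EVENT_ID = 8465


def health_service_events(events: Iterable[Mapping]) -> Iterator[Mapping]:
    """Yield only the events written to the Health Service channel with ID 8465.

    Each event is assumed to be a mapping with "Channel" and "EventID" keys;
    this record shape is hypothetical.
    """
    for event in events:
        if event.get("Channel") == HEALTH_CHANNEL and event.get("EventID") == HEALTH_EVENT_ID:
            yield event
```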

To configure the Health Service, run:

Get-StorageSubSystem Cluster* | Set-StorageHealthSetting -Name "Platform.ETW.MasTypes" -Value "Microsoft.Health.EntityType.Subsystem,Microsoft.Health.EntityType.Server,Microsoft.Health.EntityType.PhysicalDisk,Microsoft.Health.EntityType.StoragePool,Microsoft.Health.EntityType.Volume,Microsoft.Health.EntityType.Cluster"

When you run the cmdlet above to set the Health Settings, the events we want begin to be written to the Microsoft-Windows-Health/Operational event channel.
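The value passed to `-Value` is a single comma-separated string of fully qualified entity types, which is easy to mistype. As a sketch (plain Python, not part of any Windows tooling), the hypothetical helpers below build that string from short type names and split it back apart, so you can check a value before pasting it into the cmdlet.

```python
ENTITY_PREFIX = "Microsoft.Health.EntityType."

# Entity types enabled by the cmdlet above, without the common prefix.
ENTITY_TYPES = ["Subsystem", "Server", "PhysicalDisk", "StoragePool", "Volume", "Cluster"]


def mas_types_value(entity_types):
    """Build the comma-separated value for the Platform.ETW.MasTypes health setting."""
    return ",".join(ENTITY_PREFIX + name for name in entity_types)


def parse_mas_types(value):
    """Split a Platform.ETW.MasTypes value back into short entity type names."""
    names = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        names.append(item[len(ENTITY_PREFIX):] if item.startswith(ENTITY_PREFIX) else item)
    return names


if __name__ == "__main__":
    print(mas_types_value(ENTITY_TYPES))
```

Running it prints the same string that the cmdlet above passes to `-Value`.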

Configuring Log Analytics

Now that you have set up the proper logging on your cluster, the next step is to configure Log Analytics.

To give an overview, Azure Log Analytics can collect data directly from the physical or virtual Windows computers in your datacenter or other cloud environment into a single repository for detailed analysis and correlation.

To understand the supported configurations, review the supported Windows operating systems and network firewall configuration.

If you don't have an Azure subscription, create a free account before you begin.

Log in to the Azure portal

Log in to the Azure portal at https://portal.azure.com.

Create a workspace

For more details on the steps listed below, see the Azure Monitor documentation.

  1. In the Azure portal, click All services. In the list of resources, type Log Analytics. As you begin typing, the list filters based on your input. Select Log Analytics.


  2. Click Create, and then select choices for the following items:

    • Provide a name for the new Log Analytics Workspace, such as DefaultLAWorkspace.

    • Select a Subscription to link to from the drop-down list if the default selection is not appropriate.

    • For Resource Group, select an existing resource group that contains one or more Azure virtual machines.


  3. After providing the required information on the Log Analytics Workspace pane, click OK.

While the information is verified and the workspace is created, you can track its progress under Notifications from the menu.

Obtain workspace ID and key

Before installing the Microsoft Monitoring Agent for Windows, you need the workspace ID and key for your Log Analytics workspace. The setup wizard requires this information to configure the agent properly and ensure it can communicate with Log Analytics.

  1. In the Azure portal, click All services in the upper left-hand corner. In the list of resources, type Log Analytics. As you begin typing, the list filters based on your input. Select Log Analytics.
  2. In your list of Log Analytics workspaces, select DefaultLAWorkspace, created earlier.
  3. Select Advanced settings.


  4. Select Connected Sources, and then select Windows Servers.
  5. Copy the values to the right of Workspace ID and Primary Key, and save both temporarily, for example by pasting them into your favorite editor.

Installing the agent on Windows

The following steps install and configure the Microsoft Monitoring Agent. Be sure to install this agent on each server in your cluster, and indicate that you want the agent to run at Windows startup.

  1. On the Windows Servers page, select the appropriate Download Windows Agent version, depending on the processor architecture of the Windows operating system.
  2. Run Setup to install the agent on your computer.
  3. On the Welcome page, click Next.
  4. On the License Terms page, read the license and then click I Agree.
  5. On the Destination Folder page, change or keep the default installation folder and then click Next.
  6. On the Agent Setup Options page, choose to connect the agent to Azure Log Analytics and then click Next.
  7. On the Azure Log Analytics page, do the following:
    a. Paste the Workspace ID and Workspace Key (Primary Key) that you copied earlier.
    b. If the computer needs to communicate with the Log Analytics service through a proxy server, click Advanced and provide the URL and port number of the proxy server. If your proxy server requires authentication, type the username and password to authenticate with the proxy server, and then click Next.
  8. Click Next once you have finished providing the necessary configuration settings.


  9. On the Ready to Install page, review your choices and then click Install.
  10. On the Configuration completed successfully page, click Finish.

When complete, the Microsoft Monitoring Agent appears in Control Panel. You can review your configuration there and verify that the agent is connected to Log Analytics. When connected, the agent displays the following message on the Azure Log Analytics tab: The Microsoft Monitoring Agent has successfully connected to the Microsoft Log Analytics service.
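Because the agent must run on every server, it is worth checking that no node was missed. One way is to compare the cluster's node names (for example, from the `Get-ClusterNode` cmdlet) with the computer names that appear in your workspace's data. The Python sketch below performs only that comparison; the host names are hypothetical, and matching on the short, lower-cased host name is an assumption about how the two lists are formatted.

```python
def short_name(computer: str) -> str:
    """Normalize a computer name to its lower-case short host name.

    Assumption: one list may hold FQDNs and the other short names.
    """
    return computer.strip().split(".")[0].lower()


def servers_missing_agent(cluster_nodes, reporting_computers):
    """Cluster nodes with no matching computer among those reporting to the workspace."""
    reporting = {short_name(name) for name in reporting_computers}
    return sorted(node for node in cluster_nodes if short_name(node) not in reporting)


if __name__ == "__main__":
    nodes = ["Node1", "Node2", "Node3"]
    reporting = ["node1.contoso.local", "NODE3"]
    print(servers_missing_agent(nodes, reporting))  # ['Node2']
```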


Setting up alerts using Windows Admin Center

In Windows Admin Center, you can configure default alerts that apply to all servers in your Log Analytics workspace.


These are the alerts you can opt into, with their default conditions:

Alert name                   Default condition
CPU utilization              Over 85% for 10 minutes
Disk capacity utilization    Over 85% for 10 minutes
Memory utilization           Available memory less than 100 MB for 10 minutes
Heartbeat                    Fewer than 2 beats for 5 minutes
System critical error        Any critical alert in the cluster system event log
Health service alert         Any Health Service fault on the cluster
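The default conditions above are thresholds held over a time window. The Python sketch below is one literal reading of them, assuming a condition holds when every sample in the trailing window breaches the threshold, and that the heartbeat rule counts beats received in the last five minutes. It illustrates the table only; it is not the query Windows Admin Center creates in Azure Monitor, which may aggregate samples differently.

```python
from datetime import datetime, timedelta


def _in_window(samples, window, now):
    """Values of (timestamp, value) samples that fall in the trailing window ending at now."""
    return [value for time, value in samples if now - window <= time <= now]


def sustained_above(samples, threshold, window, now):
    """True if the window has samples and all of them exceed threshold (e.g. CPU over 85%)."""
    recent = _in_window(samples, window, now)
    return bool(recent) and all(value > threshold for value in recent)


def sustained_below(samples, threshold, window, now):
    """True if the window has samples and all of them are under threshold (e.g. memory under 100 MB)."""
    recent = _in_window(samples, window, now)
    return bool(recent) and all(value < threshold for value in recent)


def too_few_heartbeats(beat_times, now, window=timedelta(minutes=5), minimum=2):
    """True if fewer than `minimum` heartbeats arrived in the trailing window."""
    return sum(1 for time in beat_times if now - window <= time <= now) < minimum


if __name__ == "__main__":
    now = datetime(2019, 1, 1, 12, 0)
    cpu = [(now - timedelta(minutes=m), 90.0) for m in range(10)]
    print("CPU alert:", sustained_above(cpu, 85.0, timedelta(minutes=10), now))
    print("Heartbeat alert:", too_few_heartbeats([now - timedelta(minutes=1)], now))
```

Requiring at least one sample keeps an empty window (for example, a server that stopped reporting) from silently satisfying the CPU or memory rule; that case is what the heartbeat rule is for.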

Once you configure the alerts in Windows Admin Center, you can see them in your Log Analytics workspace in Azure.


Collecting event and performance data

Log Analytics can collect events from the Windows event log and from performance counters that you specify for longer-term analysis and reporting, and can take action when a particular condition is detected. To start, follow these steps to configure collection of events from the Windows event log and of several common performance counters.

  1. In the Azure portal, click More services in the lower left-hand corner. In the list of resources, type Log Analytics. As you begin typing, the list filters based on your input. Select Log Analytics.
  2. Select Advanced settings.


  3. Select Data, and then select Windows Event Logs.
  4. Add the Health Service event channel by typing in the name below and then clicking the plus sign (+).
    Event Channel: Microsoft-Windows-Health/Operational
  5. In the table, check the severities Error and Warning.
  6. Click Save at the top of the page to save the configuration.
  7. Select Windows Performance Counters to enable collection of performance counters on a Windows computer.
  8. When you first configure Windows performance counters for a new Log Analytics workspace, you are given the option to quickly create several common counters. They are listed with a checkbox next to each.
    Click Add the selected performance counters. They are added and preset with a ten-second collection sample interval.
  9. Click Save at the top of the page to save the configuration.
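To get a feel for what a ten-second sample interval means for data volume, the arithmetic is simple: 86,400 seconds per day divided by 10 gives 8,640 samples per counter, per server, per day. The Python helper below generalizes that. It assumes each sample is stored as one record and ignores any aggregation the agent or workspace might apply, so treat the result as a rough estimate only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds: int, counters: int, servers: int) -> int:
    """Estimate performance samples collected per day across a cluster.

    Assumes one stored record per counter, per server, per sample interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    per_counter_per_server = SECONDS_PER_DAY // interval_seconds
    return per_counter_per_server * counters * servers


if __name__ == "__main__":
    # Hypothetical cluster: 4 servers, 5 counters, 10-second interval.
    print(samples_per_day(10, counters=5, servers=4))  # 172800
```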

Creating alerts based on log data

If you've made it this far, your cluster should be sending its logs and performance counters to Log Analytics. The next step is to create alert rules that automatically run log searches at regular intervals. If the results of a log search match particular criteria, an alert fires and sends you an email or text notification. Let's explore this below.
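Conceptually, each run of such a rule re-evaluates a query over a recent time window. The Python sketch below models one run using the example settings from the alert steps later in this section (a 30-minute period, a rule frequency of 5 minutes, and an alert on more than 3 consecutive breaches). It assumes that query results are grouped per object into 5-minute bins and that a bin breaches when its maximum value exceeds the threshold. It is a simplified model for reasoning about these settings, not how Azure Monitor implements alert evaluation.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def breach_flags(records, now, period, bin_size, threshold):
    """Per object, flag each time bin in the last `period` whose value exceeds threshold.

    `records` are (object, timestamp, value) tuples. Within a bin the maximum
    value is used; that aggregation is an assumption of this sketch.
    Returns {object: [flag per bin, oldest bin first]}.
    """
    start = now - period
    n_bins = int(period / bin_size)
    binned = defaultdict(lambda: [None] * n_bins)
    for obj, time, value in records:
        if start <= time < now:
            index = int((time - start) / bin_size)
            current = binned[obj][index]
            binned[obj][index] = value if current is None else max(current, value)
    return {obj: [v is not None and v > threshold for v in values] for obj, values in binned.items()}


def longest_run(flags):
    """Length of the longest run of consecutive True flags."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def objects_to_alert(records, now, threshold, period=timedelta(minutes=30),
                     bin_size=timedelta(minutes=5), more_than=3):
    """Objects whose consecutive breaches within the period exceed `more_than`."""
    flags = breach_flags(records, now, period, bin_size, threshold)
    return sorted(obj for obj, f in flags.items() if longest_run(f) > more_than)


if __name__ == "__main__":
    # A scheduler would call objects_to_alert every 5 minutes (the rule frequency).
    now = datetime(2019, 1, 1, 12, 0)
    records = [("node1", now - timedelta(minutes=m), 10.0) for m in (1, 6, 11, 16)]
    print(objects_to_alert(records, now, threshold=5.0))  # ['node1']
```

Here node1 breaches in four consecutive 5-minute bins, so it is returned; with only three breaching bins it would not be, because the rule requires more than 3.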

Create a query

Start by opening the Log Search portal.

  1. In the Azure portal, click All services. In the list of resources, type Monitor. As you begin typing, the list filters based on your input. Select Monitor.
  2. On the Monitor navigation menu, select Log Analytics and then select a workspace.

The quickest way to retrieve some data to work with is a simple query that returns all records in a table. Type the following query in the search box and click the search button.

Data is returned in the default list view, and you can see how many total records were returned.

On the left side of the screen is the filter pane, which lets you add filtering to the query without modifying it directly. Several record properties are displayed for that record type, and you can select one or more property values to narrow your search results.

Select the checkbox next to Error under EVENTLEVELNAME, or type the following query to limit the results to error events.

Event | where (EventLevelName == "Error")
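Selecting values in the filter pane and writing a `where` clause do the same thing: keep only the records whose property matches. The Python sketch below shows that logic on in-memory dictionaries; the sample records and helper names are hypothetical, and real queries run in Log Analytics, not in Python.

```python
from collections import Counter


def where_in(records, prop, values):
    """Keep records whose `prop` is one of `values`.

    With values={"Error"} this mirrors: Event | where (EventLevelName == "Error")
    """
    wanted = set(values)
    return [record for record in records if record.get(prop) in wanted]


def value_counts(records, prop):
    """Count records per distinct value of `prop`, like a list of filter choices."""
    return dict(Counter(record.get(prop) for record in records))


if __name__ == "__main__":
    events = [
        {"EventLevelName": "Error", "Computer": "node1"},
        {"EventLevelName": "Warning", "Computer": "node2"},
        {"EventLevelName": "Error", "Computer": "node2"},
    ]
    print(value_counts(events, "EventLevelName"))  # {'Error': 2, 'Warning': 1}
    print(len(where_in(events, "EventLevelName", {"Error"})))  # 2
```

Passing several values to `where_in` corresponds to selecting more than one property value in the filter pane.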


After you have made the appropriate queries for the events you care about, save them for the next step.

Create alerts

Now, let's walk through an example of creating an alert.

  1. In the Azure portal, click All services. In the list of resources, type Log Analytics. As you begin typing, the list filters based on your input. Select Log Analytics.

  2. In the left-hand pane, select Alerts, and then click New Alert Rule at the top of the page to create a new alert.


  3. For the first step, under the Create Alert section, select your Log Analytics workspace as the resource, since this is a log-based alert signal. If you have more than one subscription, filter the results by choosing, from the drop-down list, the specific Subscription that contains the Log Analytics workspace created earlier. Filter the Resource Type by selecting Log Analytics from the drop-down list. Finally, select the Resource DefaultLAWorkspace and then click Done.


  4. Under the Alert Criteria section, click Add Criteria to select your saved query, and then specify the logic that the alert rule follows.

  5. Configure the alert with the following information:
    a. From the Based on drop-down list, select Metric measurement. A metric measurement creates an alert for each object in the query with a value that exceeds the specified threshold.
    b. For the Condition, select Greater than and specify a threshold.
    c. Then define when to trigger the alert. For example, you could select Consecutive breaches and, from the drop-down list, select Greater than a value of 3.
    d. Under the Evaluation based on section, set the Period value to 30 minutes and the Frequency to 5. The rule will run every five minutes and return records created within the last thirty minutes from the current time. Setting the period to a wider window accounts for potential data latency and ensures that the query returns data, avoiding a false negative in which the alert never fires.

  6. Click Done to complete the alert rule.


  7. Now, moving on to the second step, provide a name for your alert in the Alert rule name field, such as Alert on all Error Events. Specify a Description detailing specifics for the alert, and select Critical (Sev 0) for the Severity value from the options provided.

  8. To activate the alert rule immediately on creation, accept the default value for Enable rule upon creation.

  9. For the third and final step, specify an Action Group, which ensures that the same actions are taken each time an alert is triggered and can be reused for each rule you define. Configure a new action group with the following information:
    a. Select New action group, and the Add action group pane appears.
    b. For Action group name, specify a name such as IT Operations - Notify, and a Short name such as itops-n.
    c. Verify that the default values for Subscription and Resource group are correct. If not, select the correct ones from the drop-down list.
    d. Under the Actions section, specify a name for the action, such as Send Email, and under Action Type select Email/SMS/Push/Voice from the drop-down list. The Email/SMS/Push/Voice properties pane opens on the right so that you can provide additional information.
    e. On the Email/SMS/Push/Voice pane, select and set up your preference. For example, enable Email and provide a valid SMTP email address to deliver the message to.
    f. Click OK to save your changes.

  10. Click OK to complete the action group.

  11. Click Create alert rule to complete the alert rule. It starts running immediately.
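The value of an action group is reuse: several alert rules can point at one group, and every trigger runs the same notifications. The Python sketch below models only that relationship; the classes, the `send` callback, and the address `ops@example.com` are hypothetical and do not correspond to the Azure SDK or to how Azure Monitor actually delivers email.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# send(address, subject) delivers one notification; injected so the sketch stays testable.
Sender = Callable[[str, str], None]


@dataclass
class ActionGroup:
    name: str
    short_name: str
    email_addresses: List[str] = field(default_factory=list)

    def notify(self, rule_name: str, severity: str, send: Sender) -> None:
        # The same actions run no matter which rule triggered the group.
        for address in self.email_addresses:
            send(address, f"[{severity}] {rule_name}")


@dataclass
class AlertRule:
    name: str
    severity: str
    action_group: ActionGroup

    def fire(self, send: Sender) -> None:
        self.action_group.notify(self.name, self.severity, send)


if __name__ == "__main__":
    group = ActionGroup("IT Operations - Notify", "itops-n", ["ops@example.com"])
    rules = [AlertRule("Alert on all Error Events", "Sev 0", group),
             AlertRule("Hypothetical disk rule", "Sev 2", group)]
    for rule in rules:
        rule.fire(lambda to, subject: print(to, subject))
```

Changing the recipient list on the one group changes who is notified for every rule that uses it, which is why the steps above create the group once and reuse it.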


Example alert

For reference, this is what an example alert looks like in Azure.


Below is an example of the email that Azure Monitor sends you:


Additional References