Log alerts in Azure Monitor

Overview

Log alerts are one of the alert types supported in Azure Alerts. A log alert uses a Log Analytics query to evaluate resource logs at a set frequency and fires an alert based on the results. Rules can trigger one or more actions using Action Groups.

Note

Log data from a Log Analytics workspace can be sent to the Azure Monitor metrics store. Metric alerts behave differently, which may be more desirable depending on the data you are working with. For information on what and how you can route logs to metrics, see Metric Alert for Logs.

Note

There are currently no additional charges for the API version 2020-05-01-preview and resource-centric log alerts. Pricing for features that are in preview will be announced in the future, and notice will be provided before billing starts. Should you choose to continue using the new API version and resource-centric log alerts after the notice period, you will be billed at the applicable rate.

Prerequisites

Log alerts run queries on Log Analytics data. First, start collecting log data and query it for issues. You can use the alert query examples topic in Log Analytics to understand what you can discover, or to get started on writing your own query.

Azure Monitoring Contributor is a common role needed for creating, modifying, and updating log alerts. Access and query execution rights for the resource logs are also needed. Partial access to resource logs can cause queries to fail or return partial results. Learn more about configuring log alerts in Azure.

Note

Log alerts for Log Analytics used to be managed using the legacy Log Analytics Alert API. Learn more about switching to the current ScheduledQueryRules API.

Query evaluation definition

The condition definition of a log search rule starts from two questions:

  • What query to run?
  • How to use the results?

The following sections describe the different parameters you can use to set the above logic.

Log query

The Log Analytics query used to evaluate the rule. The results returned by this query determine whether an alert is triggered. The query can be scoped to:

  • A specific resource, such as a virtual machine.
  • An at-scale resource, such as a subscription or resource group.
  • Multiple resources using cross-resource query.

Important

Alert queries have constraints to ensure optimal performance and the relevance of the results. Learn more here.

Important

Resource-centric and cross-resource queries are only supported using the current scheduledQueryRules API. If you use the legacy Log Analytics Alert API, you will need to switch. Learn more about switching.

Query time range

The time range is set in the rule condition definition. In workspaces and Application Insights, it's called Period. In all other resource types, it's called Override query time range.

As in Log Analytics, the time range limits the query data to the specified range. The time range applies even if the ago command is used in the query.

For example, when the time range is 60 minutes, a query scans 60 minutes of data even if its text contains ago(1d). The time range and the query's time filtering need to match. In this example, changing the Period / Override query time range to one day would work as expected.
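
The interaction between the rule's time range and an ago() filter can be sketched roughly as follows. This is an illustrative Python model rather than the alert service's implementation; the function name effective_window and its parameters are hypothetical, and it assumes that both limits apply so the narrower one wins.

```python
from datetime import datetime, timedelta

def effective_window(run_time, rule_time_range, query_lookback=None):
    """Return (start, end) of the data the query can actually see.

    rule_time_range: the Period / Override query time range of the rule.
    query_lookback:  the span used in an ago() filter in the query text, if any.
    Assumption: both limits apply, so the narrower lookback wins.
    """
    lookback = rule_time_range
    if query_lookback is not None:
        lookback = min(rule_time_range, query_lookback)
    return run_time - lookback, run_time

run = datetime(2021, 1, 1, 12, 0)
# Rule time range of 60 minutes, query text contains ago(1d).
start, end = effective_window(run, timedelta(minutes=60), timedelta(days=1))
print(end - start)  # 1:00:00
```

Raising the rule time range to one day in this model makes the window match the ago(1d) filter, which mirrors the fix described above.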

Measure

Log alerts turn logs into numeric values that can be evaluated. You can measure two different things:

Count of the results table rows

Count of results is the default measure. It is ideal for working with events such as Windows event logs, syslog, and application exceptions. It triggers when log records occur, or don't occur, in the evaluated time window.

Log alerts work best when you try to detect data in the logs. They work less well when you try to detect a lack of data in the logs, for example when alerting on virtual machine heartbeat.

For workspaces and Application Insights, this is the Based on setting with Number of results selected. In all other resource types, it is the Measure setting with Table rows selected.

Note

Since logs are semi-structured data, they are inherently more latent than metrics. You may experience misfires when trying to detect a lack of data in the logs, and you should consider using metric alerts. You can send data to the metric store from logs using metric alerts for logs.

Example of results table rows count use case

You want to know when your application responded with error code 500 (Internal Server Error). You would create an alert rule with the following details:

  • Query:

    requests
    | where resultCode == "500"

  • Time period: 15 minutes
  • Alert frequency: 15 minutes
  • Threshold value: Greater than 0

The alert rule then monitors for any requests ending with a 500 error code. The query runs every 15 minutes, over the last 15 minutes. If even one record is found, it fires the alert and triggers the configured actions.
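
The evaluation described above can be sketched in a few lines of Python. This is only a model of the logic (count the matching rows in the window, then compare against the threshold); the record layout, the should_fire name, and the inclusive window boundaries are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def should_fire(records, run_time, period=timedelta(minutes=15), threshold=0):
    """Count rows with resultCode 500 in the window; fire if count > threshold."""
    window_start = run_time - period
    matches = [
        r for r in records
        if window_start <= r["timestamp"] <= run_time and r["resultCode"] == "500"
    ]
    return len(matches) > threshold

now = datetime(2021, 1, 1, 12, 0)
requests = [
    {"timestamp": now - timedelta(minutes=40), "resultCode": "500"},  # outside the window
    {"timestamp": now - timedelta(minutes=5), "resultCode": "200"},   # inside, not an error
    {"timestamp": now - timedelta(minutes=3), "resultCode": "500"},   # inside, an error
]
print(should_fire(requests, now))  # True
```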

Calculation of measure based on a numeric column (such as CPU counter value)

For workspaces and Application Insights, this is the Based on setting with Metric measurement selected. In all other resource types, it is the Measure setting with any numeric column name selected.

Aggregation type

The calculation that is done on multiple records to aggregate them to one numeric value. For example:

  • Count returns the number of records in the query.
  • Average returns the average of the measure column over each defined Aggregation granularity interval.

In workspaces and Application Insights, it's supported only in the Metric measurement measure type. The query result must contain a column called AggregatedValue that provides a numeric value after a user-defined aggregation. In all other resource types, Aggregation type is selected from the field of that name.

Aggregation granularity

Determines the interval that is used to aggregate multiple records to one numeric value. For example, if you specify 5 minutes, records are grouped into 5-minute intervals using the specified Aggregation type.

In workspaces and Application Insights, it's supported only in the Metric measurement measure type. The query must contain bin() to set the interval in the query results. In all other resource types, the field that controls this setting is called Aggregation granularity.
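
To make the combination of aggregation type and aggregation granularity concrete, here is a small Python sketch that groups samples into 5-minute intervals and averages each interval. It only models the idea; the aggregate_average name and the (timestamp, value) sample layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_average(samples, granularity=timedelta(minutes=5)):
    """Group (timestamp, value) samples into fixed-size bins and average each bin."""
    bins = defaultdict(list)
    for ts, value in samples:
        # Round the timestamp down to the start of its interval.
        bin_start = ts - (ts - datetime.min) % granularity
        bins[bin_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(bins.items())}

samples = [
    (datetime(2021, 1, 1, 12, 1), 10),
    (datetime(2021, 1, 1, 12, 4), 30),
    (datetime(2021, 1, 1, 12, 7), 50),
]
for start, average in aggregate_average(samples).items():
    print(start.strftime("%H:%M"), average)
# 12:00 20.0
# 12:05 50.0
```

Each resulting value is then compared against the threshold separately, one per interval.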

Note

Because bin() can result in uneven time intervals, the alert service automatically converts the bin() function to a bin_at() function with an appropriate time at runtime, to ensure results with a fixed point.
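
The difference between the two functions can be illustrated with a Python approximation. The helpers below mimic the rounding behavior of bin() and bin_at() on timestamps; they are not the Kusto implementations, and using the rule's run time as the fixed point is an assumption made for this example (the source only says "appropriate time").

```python
from datetime import datetime, timedelta

def bin_floor(ts, size):
    """Approximate bin(): round down to a multiple of size on a global grid."""
    return ts - (ts - datetime.min) % size

def bin_at(ts, size, fixed_point):
    """Approximate bin_at(): round down on a grid aligned to fixed_point."""
    return ts - (ts - fixed_point) % size

size = timedelta(minutes=15)
run_time = datetime(2021, 1, 1, 12, 7)       # window is 11:52 to 12:07
a = datetime(2021, 1, 1, 11, 55)
b = datetime(2021, 1, 1, 12, 3)

# bin() splits the window across two partial intervals.
print(bin_floor(a, size).strftime("%H:%M"), bin_floor(b, size).strftime("%H:%M"))  # 11:45 12:00
# bin_at() anchored at the run time keeps the window in one full interval.
print(bin_at(a, size, run_time).strftime("%H:%M"), bin_at(b, size, run_time).strftime("%H:%M"))  # 11:52 11:52
```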

Split by alert dimensions

Split alerts by number or string columns into separate alerts by grouping into unique combinations. When creating resource-centric alerts at scale (subscription or resource group scope), you can split by the Azure resource ID column. Splitting on the Azure resource ID column changes the target of the alert to the specified resource.

In workspaces and Application Insights, it's supported only in the Metric measurement measure type. The field is called Aggregate On and is limited to three columns. Having more than three group-by columns in the query could lead to unexpected results. In all other resource types, it's configured in the Split by dimensions section of the condition (limited to six splits).

Example of splitting by alert dimensions

For example, you want to monitor errors for multiple virtual machines running your website or app in a specific resource group. You can do that using a log alert rule as follows:

  • Query:

    // Reported errors
    union Event, Syslog // Event table stores Windows event records, Syslog stores Linux records
    | where EventLevelName == "Error" // EventLevelName is used in the Event (Windows) records
    or SeverityLevel == "err" // SeverityLevel is used in Syslog (Linux) records

    When using workspaces and Application Insights with Metric measurement alert logic, this line needs to be added to the query text:

    | summarize AggregatedValue = count() by Computer, bin(TimeGenerated, 15m)

  • Resource ID column: _ResourceId (splitting by the resource ID column in alert rules is currently available only for subscriptions and resource groups)
  • Dimensions / Aggregated on:
    • Computer = VM1, VM2 (filtering values in the alert rule definition isn't currently available for workspaces and Application Insights; filter in the query text instead)
  • Time period: 15 minutes
  • Alert frequency: 15 minutes
  • Threshold value: Greater than 0

This rule monitors whether any virtual machine had error events in the last 15 minutes. Each virtual machine is monitored separately and triggers actions individually.
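
The per-machine evaluation can be modeled with a short Python sketch. It assumes the error records for the window have already been selected, and it uses a hypothetical evaluate_by_dimension helper to show that each distinct value of the split column gets its own threshold comparison.

```python
from collections import Counter

def evaluate_by_dimension(records, dimension="Computer", threshold=0):
    """Count records per dimension value and compare each count with the threshold."""
    counts = Counter(r[dimension] for r in records)
    return {value: count > threshold for value, count in counts.items()}

errors = [
    {"Computer": "VM1", "message": "disk failure"},
    {"Computer": "VM1", "message": "service crashed"},
    {"Computer": "VM2", "message": "service crashed"},
]
print(evaluate_by_dimension(errors))               # {'VM1': True, 'VM2': True}
print(evaluate_by_dimension(errors, threshold=1))  # {'VM1': True, 'VM2': False}
```

Every key mapped to True in this model corresponds to a separate alert with its own action invocation.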

Note

Split by alert dimensions is only available for the current scheduledQueryRules API. If you use the legacy Log Analytics Alert API, you will need to switch. Learn more about switching. Resource-centric alerting at scale is only supported in the API version 2020-05-01-preview and above.

Alert logic definition

Once you define the query to run and how its results are evaluated, you need to define the alerting logic and when to fire actions. The following sections describe the different parameters you can use:

Threshold and operator

The query results are transformed into a number that is compared against the threshold using the operator.

Frequency

The interval at which the query runs. It can be set from 5 minutes to one day. It must be equal to or less than the query time range so that no log records are missed.

For example, suppose you set the time period to 30 minutes and the frequency to 1 hour. If the query runs at 00:00, it returns records between 23:30 and 00:00. The next run, at 01:00, returns records between 00:30 and 01:00. Any records created between 00:00 and 00:30 would never be evaluated.
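
The arithmetic behind this example can be checked with a small Python sketch that lists the intervals no run ever covers. The uncovered_gaps function and its parameters are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta

def uncovered_gaps(first_run, runs, frequency, time_range):
    """Return (start, end) intervals between consecutive runs that no query window covers."""
    gaps = []
    for i in range(1, runs):
        previous_end = first_run + (i - 1) * frequency          # end of the previous window
        window_start = first_run + i * frequency - time_range   # start of the current window
        if window_start > previous_end:
            gaps.append((previous_end, window_start))
    return gaps

first = datetime(2021, 1, 1, 0, 0)
for start, end in uncovered_gaps(first, runs=2,
                                 frequency=timedelta(hours=1),
                                 time_range=timedelta(minutes=30)):
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
# 00:00 to 00:30
```

With a time range of one hour or more and the same frequency, the function returns an empty list, which is the condition the rule above asks for.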

Number of violations to trigger alert

You can specify the alert evaluation period and the number of failures needed to trigger an alert. This lets you better define an impact time that triggers an alert.

For example, if your rule's Aggregation granularity is defined as '5 minutes', you can trigger an alert only if three failures (15 minutes) occurred in the last hour. This setting is defined by your application's business policy.
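
As a rough Python model of this example, the sketch below treats the last hour as twelve 5-minute intervals and triggers only when at least three of them violated the threshold. The should_trigger name and the list-of-booleans input are assumptions, and the sketch counts total violations in the period rather than consecutive ones.

```python
def should_trigger(interval_violations, required_violations=3, evaluation_intervals=12):
    """interval_violations: booleans, oldest first, one per aggregation interval."""
    recent = interval_violations[-evaluation_intervals:]
    return sum(recent) >= required_violations

# Twelve 5-minute intervals cover the last hour.
last_hour = [False] * 9 + [True, False, True]
print(should_trigger(last_hour))           # False (only two violations)
print(should_trigger(last_hour + [True]))  # True (three violations in the last 12 intervals)
```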

State and resolving alerts

Log alerts are stateless. Alerts fire each time the condition is met, even if fired previously. Fired alerts don't resolve. You can mark the alert as closed. You can also mute actions to prevent them from triggering for a period after an alert rule fired.

In workspaces and Application Insights, it's called Suppress Alerts. In all other resource types, it's called Mute Actions.

See this alert evaluation example:

Time | Log condition evaluation | Result
00:05 | FALSE | Alert doesn't fire. No actions called.
00:10 | TRUE | Alert fires and action groups called. New alert state ACTIVE.
00:15 | TRUE | Alert fires and action groups called. New alert state ACTIVE.
00:20 | FALSE | Alert doesn't fire. No actions called. Previous alert state remains ACTIVE.
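
The stateless behavior in this evaluation example can be reproduced with a minimal Python sketch. It is a model only: the evaluate function is hypothetical, and it ignores muting and manually closing alerts.

```python
def evaluate(timeline):
    """timeline: list of (time, condition_met). Returns (time, fired, alert_state) rows."""
    state = None  # no alert has fired yet
    rows = []
    for time, condition_met in timeline:
        fired = condition_met  # stateless: fires on every evaluation that meets the condition
        if fired:
            state = "ACTIVE"
        # A FALSE evaluation never resolves an earlier alert, so state is left unchanged.
        rows.append((time, fired, state))
    return rows

for row in evaluate([("00:05", False), ("00:10", True), ("00:15", True), ("00:20", False)]):
    print(row)
# ('00:05', False, None)
# ('00:10', True, 'ACTIVE')
# ('00:15', True, 'ACTIVE')
# ('00:20', False, 'ACTIVE')
```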

Pricing and billing of log alerts

Pricing information is located on the Azure Monitor pricing page. Log alerts are listed under the resource provider microsoft.insights/scheduledqueryrules as follows:

  • Log alerts on Application Insights are shown with the exact resource name along with resource group and alert properties.
  • Log alerts on Log Analytics are shown with the exact resource name along with resource group and alert properties when created using the scheduledQueryRules API.
  • Log alerts created from the legacy Log Analytics API aren't tracked as Azure resources and don't have enforced unique resource names. These alerts are still created on microsoft.insights/scheduledqueryrules as hidden resources, which have this resource naming structure: <WorkspaceName>|<savedSearchId>|<scheduleId>|<ActionId>. Log alerts on the legacy API are shown with this hidden resource name along with resource group and alert properties.

Note

Unsupported resource characters such as <, >, %, &, \, ?, / are replaced with _ in the hidden resource names, and this is also reflected in the billing information.
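
To help match billing entries to legacy alerts, the naming structure and character replacement can be sketched in Python. This is an illustration only: the helper names are hypothetical, the character set is limited to the examples listed above (the full set may be larger), and applying the replacement to each component separately is an assumption.

```python
UNSUPPORTED = '<>%&\\?/'  # only the characters named in the note above

def sanitize(part):
    """Replace each unsupported character with an underscore."""
    return "".join("_" if ch in UNSUPPORTED else ch for ch in part)

def hidden_resource_name(workspace, saved_search_id, schedule_id, action_id):
    """Build <WorkspaceName>|<savedSearchId>|<scheduleId>|<ActionId>."""
    parts = (workspace, saved_search_id, schedule_id, action_id)
    return "|".join(sanitize(p) for p in parts)

print(hidden_resource_name("my-workspace", "cpu>90%", "sched/1", "action?1"))
# my-workspace|cpu_90_|sched_1|action_1
```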

Note

Log alerts for Log Analytics used to be managed using the legacy Log Analytics Alert API and legacy templates of Log Analytics saved searches and alerts. Learn more about switching to the current ScheduledQueryRules API. Until you decide to switch, manage alert rules using the legacy Log Analytics API; you can't use the hidden resources for this.

Next steps