Optimizing log alert queries

This article describes how to write and convert log alert queries for optimal performance. Optimized queries reduce the latency and load of alerts, which run frequently.

How to start writing an alert log query

An alert query starts as a query over the log data in Log Analytics that indicates the issue. You can use the alert query examples topic to understand what you can discover. You can also get started writing your own query.

Queries that indicate the issue and not the alert

The alert flow was built to transform the results that indicate the issue into an alert. For example, consider a query like this:

SecurityEvent
| where EventID == 4624

If the user's intent is to alert when this event type occurs, the alerting logic appends count to the query. The query that runs is:

SecurityEvent
| where EventID == 4624
| count

There's no need to add alerting logic to the query, and doing so may even cause issues. In the example above, if you include count in your query, the result is always the value 1, because the alert service takes the count of count.
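To make the count of count problem concrete, here is a sketch of the query that would effectively run if count were already part of the query text. It assumes the alert service appends count literally to the end of the query, as described above; the exact query the service composes is not shown in this article.

```kusto
// The user's query already ends with count, so it returns a single row.
SecurityEvent
| where EventID == 4624
| count
// Assumed: the alert service appends its own count, which counts that single row.
| count
```

The final result is 1 no matter how many events matched, so the alert can no longer reflect how often the issue occurred.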

Avoid limit and take operators

Using limit and take in queries can increase the latency and load of alerts, because the results aren't consistent over time. Use these operators only if needed.
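When a bounded result set is genuinely required, one option is an operator such as top, which sorts before truncating, whereas take and limit return an arbitrary set of rows. The following is a minimal sketch rather than a recommendation from this article: the Perf table and the % Free Space counter are borrowed from the examples later in this article, and the row count of 10 is an arbitrary illustration.

```kusto
// Returns the 10 records with the lowest free space value.
// Unlike take or limit, the sort order determines which rows come back.
Perf
| where CounterName == '% Free Space'
| top 10 by CounterValue asc
```

Note that top still has to sort the filtered data, so it carries its own cost; it only addresses the consistency of the results.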

Log query constraints

Log queries in Azure Monitor start with either a table, a search operator, or a union operator.

Queries for log alert rules should always start with a table to define a clear scope, which improves both query performance and the relevance of the results. Queries in alert rules run frequently, so using search and union can result in excessive overhead that adds latency to the alert, because these operators require scanning across multiple tables. These operators also reduce the ability of the alerting service to optimize the query.

We don't support creating or modifying log alert rules that use search or union operators, except for cross-resource queries.

For example, the following alerting query is scoped to the SecurityEvent table and searches for a specific event ID. It's the only table that the query must process.

SecurityEvent
| where EventID == 4624

Log alert rules that use cross-resource queries are not affected by this restriction, because cross-resource queries use a type of union that limits the query scope to specific resources. The following example is a valid log alert query:

union
app('Contoso-app1').requests,
app('Contoso-app2').requests,
workspace('Contoso-workspace1').Perf 

Note

Cross-resource queries are supported in the new scheduledQueryRules API. If you still use the legacy Log Analytics Alert API for creating log alerts, see the documentation about switching to the new API.

Examples

The following examples include log queries that use search and union, and provide steps you can use to modify these queries for use in alert rules.

Example 1

You want to create a log alert rule using the following query, which retrieves performance information using search:

search *
| where Type == 'Perf' and CounterName == '% Free Space'
| where CounterValue < 30

To modify this query, start by using the following query to identify the table that the properties belong to:

search *
| where CounterName == '% Free Space'
| summarize by $table

The result of this query would show that the CounterName property came from the Perf table.

You can use this result to create the following query for the alert rule:

Perf
| where CounterName == '% Free Space'
| where CounterValue < 30

Example 2

You want to create a log alert rule using the following query, which retrieves performance information using search:

search ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| summarize Avg_Memory_Usage = avg(CounterValue) by Computer
| where Avg_Memory_Usage between (90 .. 95)

To modify this query, start by using the following query to identify the table that the properties belong to:

search ObjectName=="Memory" and CounterName=="% Committed Bytes In Use"
| summarize by $table

The result of this query would show that the ObjectName and CounterName properties came from the Perf table.

You can use this result to create the following query for the alert rule:

Perf
| where ObjectName =="Memory" and CounterName=="% Committed Bytes In Use"
| summarize Avg_Memory_Usage=avg(CounterValue) by Computer
| where Avg_Memory_Usage between(90 .. 95)

Example 3

You want to create a log alert rule using the following query, which uses both search and union to retrieve performance information:

search (ObjectName == "Processor" and CounterName == "% Idle Time" and InstanceName == "_Total")
| where Computer !in (
    union *
    | where CounterName == "% Processor Utility"
    | summarize by Computer)
| summarize Avg_Idle_Time = avg(CounterValue) by Computer

To modify this query, start by using the following query to identify the table that the properties in the first part of the query belong to:

search (ObjectName == "Processor" and CounterName == "% Idle Time" and InstanceName == "_Total")
| summarize by $table

The result of this query would show that all these properties came from the Perf table.

Now use union with the withsource argument to identify which source table contributed each row:

union withsource=table *
| where CounterName == "% Processor Utility"
| summarize by table

The result of this query would show that this property also came from the Perf table.

You can use these results to create the following query for the alert rule:

Perf
| where ObjectName == "Processor" and CounterName == "% Idle Time" and InstanceName == "_Total"
| where Computer !in (
    (Perf
    | where CounterName == "% Processor Utility"
    | summarize by Computer))
| summarize Avg_Idle_Time = avg(CounterValue) by Computer

Example 4

You want to create a log alert rule using the following query, which joins the results of two search queries:

search Type == 'SecurityEvent' and EventID == '4625'
| summarize by Computer, Hour = bin(TimeGenerated, 1h)
| join kind = leftouter (
    search in (Heartbeat) OSType == 'Windows'
    | summarize arg_max(TimeGenerated, Computer) by Computer , Hour = bin(TimeGenerated, 1h)
    | project Hour , Computer
) on Hour

To modify this query, start by using the following query to identify the table that contains the properties in the left side of the join:

search Type == 'SecurityEvent' and EventID == '4625'
| summarize by $table

The result indicates that the properties in the left side of the join belong to the SecurityEvent table.

Now use the following query to identify the table that contains the properties in the right side of the join:

search in (Heartbeat) OSType == 'Windows'
| summarize by $table

The result indicates that the properties in the right side of the join belong to the Heartbeat table.

You can use these results to create the following query for the alert rule:

SecurityEvent
| where EventID == 4625
| summarize by Computer, Hour = bin(TimeGenerated, 1h)
| join kind = leftouter (
    Heartbeat
    | where OSType == 'Windows'
    | summarize arg_max(TimeGenerated, Computer) by Computer , Hour = bin(TimeGenerated, 1h)
    | project Hour , Computer
) on Hour

Next steps