您现在访问的是微软AZURE全球版技术文档网站,若需要访问由世纪互联运营的MICROSOFT AZURE中国区技术文档网站,请访问 https://docs.azure.cn.

Azure 数据资源管理器的编写查询Write queries for Azure Data Explorer

在本文中,你将了解如何在 Azure 数据资源管理器中使用查询语言通过最常见的运算符来执行基本查询。In this article, you learn how to use the query language in Azure Data Explorer to perform basic queries with the most common operators. 此外,还可以接触到一些更高级的语言功能。You also get exposure to some of the more advanced features of the language.

先决条件Prerequisites

可以通过以下两种方式之一运行本文中的查询:You can run the queries in this article in one of two ways:

查询语言概述Overview of the query language

Azure 数据资源管理器中的查询是处理数据并返回结果的只读请求。A query in Azure Data Explorer is a read-only request to process data and return results. 该请求用纯文本形式表示,使用的数据流模型旨在使语法更易于阅读、创作和自动执行。The request is stated in plain text, using a data-flow model designed to make the syntax easy to read, author, and automate. 该查询使用以类似于 SQL 的层次结构组织的架构实体:数据库、表和列。The query uses schema entities that are organized in a hierarchy similar to SQL: databases, tables, and columns.

该查询包含一系列由分号 (;) 分隔的查询语句,至少有一个语句是表格表达式语句,该语句可以生成按列和行的表式网格排列的数据。The query consists of a sequence of query statements, delimited by a semicolon (;), with at least one statement being a tabular expression statement, which is a statement that produces data arranged in a table-like mesh of columns and rows. 查询的表格表达式语句生成查询结果。The query's tabular expression statements produce the results of the query.

表格表达语句的语法具有从一个表格查询运算符到另一个表格查询运算符的表格数据流,从数据源(例如,数据库中的一个表,或产生数据的运算符)开始,然后流经一组数据转换运算符,这些运算符通过使用竖线 (|) 分隔符绑定在一起。The syntax of the tabular expression statement has tabular data flow from one tabular query operator to another, starting with data source (for example, a table in a database, or an operator that produces data) and then flowing through a set of data transformation operators that are bound together through the use of the pipe (|) delimiter.

例如,下面的查询有一个单独的语句,它是一个表格表达式语句。For example, the following query has a single statement, which is a tabular expression statement. 该语句从对名为 StormEvents 的表引用开始(宿主此表的数据库在此是隐式的,并且是连接信息的一部分)。The statement starts with a reference to a table called StormEvents (the database that host this table is implicit here, and part of the connection information). 然后,该表的数据(行)依次由 StartTime 列的值和 State 列的值进行筛选。The data (rows) for that table are then filtered by the value of the StartTime column, and then filtered by the value of the State column. 然后查询返回“幸存”的行计数。The query then returns the count of "surviving" rows.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where StartTime >= datetime(2007-11-01) and StartTime < datetime(2007-12-01)
| where State == "FLORIDA"  
| count

在此情况下,结果为:In this case, the result is:

计数Count
2323

有关详细信息,请参阅查询语言参考For more information see the Query language reference.

最常见的运算符Most common operators

本部分介绍的运算符是在 Azure 数据资源管理器中了解查询的构建基块。The operators covered in this section are the building blocks to understanding queries in Azure Data Explorer. 你编写的大多数查询将包含其中一些运算符。Most queries you write will include several of these operators.

若要在帮助群集上运行查询:选择每个查询上方的“单击以运行查询”。To run queries on the help cluster: select Click to run query above each query.

若要在自己的群集上运行查询:To run queries on your own cluster:

  1. 将每个查询复制到基于 Web 的查询应用程序,然后选择查询,或将光标置于查询中。Copy each query into the web-based query application, and then either select the query or place your cursor in the query.

  2. 在应用程序顶部,选择“运行”。At the top of the application, select Run.

countcount

count:返回表中的行数。count: Returns the count of rows in the table.

以下查询返回 StormEvents 表中的行数。The following query returns the count of rows in the StormEvents table.

[ 单击以运行查询 ][Click to run query]

StormEvents | count

taketake

take:返回最多指定数量的数据行。take: Returns up to the specified number of rows of data.

以下查询从 StormEvents 表返回 5 行。The following query returns five rows from the StormEvents table. 关键字“limit”是“take”的别名。The keyword limit is an alias for take.

[ 单击以运行查询 ][Click to run query]

StormEvents | take 5

提示

除非对源数据进行排序,否则无法保证返回的记录。There is no guarantee which records are returned unless the source data is sorted.

projectproject

project:选择列的子集。project: Selects a subset of columns.

以下查询将返回一组特定的列。The following query returns a specific set of columns.

[ 单击以运行查询 ][Click to run query]

StormEvents
| take 5
| project StartTime, EndTime, State, EventType, DamageProperty, EpisodeNarrative

其中where

where:筛选表,获取满足谓词的行子集。where: Filters a table to the subset of rows that satisfy a predicate.

以下查询依据 EventTypeState 来筛选数据。The following query filters the data by EventType and State.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where EventType == 'Flood' and State == 'WASHINGTON'
| take 5
| project StartTime, EndTime, State, EventType, DamageProperty, EpisodeNarrative

sortsort

sort:按照一个或多个列的顺序对输入表的行排序。sort: Sort the rows of the input table into order by one or more columns.

以下查询依据 DamageProperty 按降序对数据进行排序。The following query sorts the data in descending order by DamageProperty.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where EventType == 'Flood' and State == 'WASHINGTON'
| sort by DamageProperty desc
| take 5
| project StartTime, EndTime, State, EventType, DamageProperty, EpisodeNarrative

备注

运算顺序非常重要。The order of operations is important. 请尝试将 take 5 置于 sort by 前面。Try putting take 5 before sort by. 是否获取了不同的结果?Do you get different results?

toptop

top:返回按指定列排序的前 N 列。top: Returns the first N records sorted by the specified columns.

以下查询返回与上面相同的结果,并减少了一个运算符。The following query returns the same results as above with one less operator.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where EventType == 'Flood' and State == 'WASHINGTON'
| top 5 by DamageProperty desc
| project StartTime, EndTime, State, EventType, DamageProperty, EpisodeNarrative

extendextend

extend:计算派生的列。extend: Computes derived columns.

以下查询通过计算每行中的值来创建新列。The following query creates a new column by computing a value in every row.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where EventType == 'Flood' and State == 'WASHINGTON'
| top 5 by DamageProperty desc
| extend Duration = EndTime - StartTime
| project StartTime, EndTime, Duration, State, EventType, DamageProperty, EpisodeNarrative

表达式可包含所有常用运算符(+、-、*、/、%),且含有一系列可调用的有用函数。Expressions can include all the usual operators (+, -, *, /, %), and there's a range of useful functions that you can call.

summarizesummarize

summarize:聚合行组。summarize: Aggregates groups of rows.

以下查询返回按 State 统计的事件计数。The following query returns the count of events by State.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize event_count = count() by State

summarize 运算符将在 by 子句中拥有相同值的行组合在一起,然后使用聚合函数(如 count)将每个组合并到单个行。The summarize operator groups together rows that have the same values in the by clause, and then uses the aggregation function (such as count) to combine each group into a single row. 因此,在此情况下,每个州都有相应的行,还有一个列,用来表示该州的行计数。So, in this case, there's a row for each state, and a column for the count of rows in that state.

有一系列聚合函数,并且可以在一个 summarize 运算符中使用其中多个函数,以生成多个计算列。There's a range of aggregation functions, and you can use several of them in one summarize operator to produce several computed columns. 例如,可以获取每个州的风暴计数,以及每个州的唯一风暴数,然后使用 top 来获取受风暴影响最大的州。For example, you could get the count of storms in each state and the unique number of storms per state, then use top to get the most storm-affected states.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize StormCount = count(), TypeOfStorms = dcount(EventType) by State
| top 5 by StormCount desc

summarize 运算的结果是:The result of a summarize operation has:

  • 以 by 命名的各列Each column named in by

  • 每个计算表达式相应的列A column for each computed expression

  • 每个 by 值组合相应的行A row for each combination of by values

renderrender

render:以图形输出的形式呈现结果。render: Renders results as a graphical output.

以下查询显示柱形图。The following query displays a column chart.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize event_count=count(), mid = avg(BeginLat) by State
| sort by mid
| where event_count > 1800
| project State, event_count
| render columnchart

以下查询显示简单的时间表。The following query displays a simple time chart.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize event_count=count() by bin(StartTime, 1d)
| render timechart

以下查询根据一天的时间取模(量化到小时)对事件进行计数,并显示一个时间图表。The following query counts events by the time modulo one day, binned into hours, and displays a time chart.

[ 单击以运行查询 ][Click to run query]

StormEvents
| extend hour = floor(StartTime % 1d , 1h)
| summarize event_count=count() by hour
| sort by hour asc
| render timechart

以下查询比较时间图表上的多个每日系列。The following query compares multiple daily series on a time chart.

[ 单击以运行查询 ][Click to run query]

StormEvents
| extend hour= floor( StartTime % 1d , 1h)
| where State in ("GULF OF MEXICO","MAINE","VIRGINIA","WISCONSIN","NORTH DAKOTA","NEW JERSEY","OREGON")
| summarize event_count=count() by hour, State
| render timechart

备注

render 运算符为客户端功能,而不是引擎的一部分。The render operator is a client-side feature rather than part of the engine. 它被集成到语言以方便使用。It's integrated into the language for ease of use. Web 应用程序支持以下选项:条形图、柱形图、饼图、时间表和线形图。The web application supports the following options: barchart, columnchart, piechart, timechart, and linechart.

标量运算符Scalar operators

本部分介绍了一些最重要的标量运算符。This section covers some of the most important scalar operators.

bin()bin()

bin() :将值向下舍入到给定 bin 大小的整数倍。bin(): Rounds values down to an integer multiple of a given bin size.

以下查询计算一天存储桶大小的计数。The following query calculates the count with a bucket size of one day.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where StartTime > datetime(2007-02-14) and StartTime < datetime(2007-02-21)
| summarize event_count = count() by bin(StartTime, 1d)

case()case()

case():计算谓词列表,并返回第一个满足其谓词的结果表达式,或最终的 else 表达式。case(): Evaluates a list of predicates, and returns the first result expression whose predicate is satisfied, or the final else expression. 此运算符可用于分类或分组数据:You can use this operator to categorize or group data:

以下查询将返回一个新列 deaths_bucket,并按编号对死亡进行分组。The following query returns a new column deaths_bucket and groups the deaths by number.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize deaths = sum(DeathsDirect) by State
| extend deaths_bucket = case (
    deaths > 50, "large",
    deaths > 10, "medium",
    deaths > 0, "small",
    "N/A")
| sort by State asc

extract()extract()

extract() :从文本字符串中获取正则表达式的匹配项。extract(): Gets a match for a regular expression from a text string.

以下查询从跟踪中提取特定属性值。The following query extracts specific attribute values from a trace.

[ 单击以运行查询 ][Click to run query]

let MyData = datatable (Trace: string) ["A=1, B=2, Duration=123.45,...", "A=1, B=5, Duration=55.256, ..."];
MyData
| extend Duration = extract("Duration=([0-9.]+)", 1, Trace, typeof(real)) * time(1s)

此查询使用 let 语句,将名称(在此情况下为 MyData)绑定到表达式。This query uses a let statement, which binds a name (in this case MyData) to an expression. 对于范围中的其余部分,其中出现 let 语句(全局范围或在函数正文范围内),可以使用名称来引用其绑定值。For the rest of the scope, in which the let statement appears (global scope or in a function body scope), the name can be used to refer to its bound value.

parse_json()parse_json()

parse_json() :将字符串解释为 JSON 值并以动态方式返回值。parse_json(): Interprets a string as a JSON value, and returns the value as dynamic. 当需要提取 JSON 复合对象的多个元素时,使用它比使用 extractjson() 函数更好。It is superior to using the extractjson() function when you need to extract more than one element of a compound JSON object.

以下查询从数组中提取 JSON 元素。The following query extracts the JSON elements from an array.

[ 单击以运行查询 ][Click to run query]

let MyData = datatable (Trace: string)
['{"duration":[{"value":118.0,"valcount":5.0,"min":100.0,"max":150.0,"stdDev":0.0}]}'];
MyData
| extend NewCol = parse_json(Trace)
| project NewCol.duration[0].value, NewCol.duration[0].valcount, NewCol.duration[0].min, NewCol.duration[0].max, NewCol.duration[0].stdDev

以下查询提取 JSON 元素。The following query extracts the JSON elements.

[ 单击以运行查询 ][Click to run query]

let MyData = datatable (Trace: string) ['{"value":118.0,"valcount":5.0,"min":100.0,"max":150.0,"stdDev":0.0}'];
MyData
| extend NewCol = parse_json(Trace)
| project NewCol.value, NewCol.valcount, NewCol.min, NewCol.max, NewCol.stdDev

以下查询提取带有动态数据类型的 JSON 元素。The following query extracts the JSON elements with a dynamic data type.

[ 单击以运行查询 ][Click to run query]

let MyData = datatable (Trace: dynamic)
[dynamic({"value":118.0,"counter":5.0,"min":100.0,"max":150.0,"stdDev":0.0})];
MyData
| project Trace.value, Trace.counter, Trace.min, Trace.max, Trace.stdDev

ago()ago()

ago() :从当前 UTC 时钟时间减去给定时间跨度。ago(): Subtracts the given timespan from the current UTC clock time.

以下查询返回最近 12 个小时的数据。The following query returns data for the last 12 hours.

[ 单击以运行查询 ][Click to run query]

//The first two lines generate sample data, and the last line uses
//the ago() operator to get records for last 12 hours.
print TimeStamp= range(now(-5d), now(), 1h), SomeCounter = range(1,121)
| mv-expand TimeStamp, SomeCounter
| where TimeStamp > ago(12h)

startofweek()startofweek()

startofweek() :返回包含日期的一周的起点,根据偏移量移动(如提供)startofweek(): Returns the start of the week containing the date, shifted by an offset, if provided

以下查询将返回带不同偏移量的一周的起点。The following query returns the start of the week with different offsets.

[ 单击以运行查询 ][Click to run query]

range offset from -1 to 1 step 1
| project weekStart = startofweek(now(), offset),offset

此查询使用 range 运算符,将生成单列值表。This query uses the range operator, which generates a single-column table of values. 另请参阅:startofday()startofweek()startofyear())、startofmonth()endofday() endofweek() endofmonth() endofyear() See also: startofday(), startofweek(), startofyear()), startofmonth(), endofday(), endofweek(), endofmonth(), and endofyear().

between()between()

between() :匹配包含范围内的输入。between(): Matches the input that is inside the inclusive range.

以下查询依据给定日期范围来筛选数据。The following query filters the data by a given date range.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where StartTime between (datetime(2007-07-27) .. datetime(2007-07-30))
| count

以下查询根据给定日期范围对数据进行筛选,与起始日期存在三天 (3d) 的略微偏差。The following query filters the data by a given date range, with the slight variation of three days (3d) from the start date.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where StartTime between (datetime(2007-07-27) .. 3d)
| count

表格运算符Tabular operators

Kusto 有多个表格运算符,本文的其他部分介绍了其中一些运算符。Kusto has many tabular operators, some of which are covered in other sections of this article. 此处我们将重点介绍 parse。Here we'll focus on parse.

parseparse

parse:计算字符串表达式并将其值分析为一个或多个计算列。parse: Evaluates a string expression and parses its value into one or more calculated columns. 有三种分析方法:简单分析(默认)、正则表达式分析和宽松分析。There are three ways to parse: simple (the default), regex, and relaxed.

以下查询使用默认的简单分析来分析跟踪并提取相关值。The following query parses a trace and extracts the relevant values, using a default of simple parsing. 表达式(也称为“StringConstant”)是常规字符串值,并且有严格的匹配要求:扩展列必须匹配所需的类型。The expression (referred to as StringConstant) is a regular string value and the match is strict: extended columns must match the required types.

[ 单击以运行查询 ][Click to run query]

let MyTrace = datatable (EventTrace:string)
[
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01Z, releaseTime=02/17/2016 08:40:01Z, previousLockTime=02/17/2016 08:39:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00Z, releaseTime=02/17/2016 08:40:00Z, previousLockTime=02/17/2016 08:39:00Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01Z, releaseTime=02/17/2016 08:40:01Z, previousLockTime=02/17/2016 08:39:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01Z, releaseTime=02/17/2016 08:41:00Z, previousLockTime=02/17/2016 08:40:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00Z, releaseTime=02/17/2016 08:41:00Z, previousLockTime=02/17/2016 08:40:00Z)'
];
MyTrace
| parse EventTrace with * "resourceName=" resourceName ", totalSlices=" totalSlices:long * "sliceNumber=" sliceNumber:long * "lockTime=" lockTime ", releaseTime=" releaseTime:date "," * "previousLockTime=" previouLockTime:date ")" *  
| project resourceName ,totalSlices , sliceNumber , lockTime , releaseTime , previouLockTime

以下查询使用 kind = regex 来分析跟踪并提取相关值。The following query parses a trace and extracts the relevant values, using kind = regex. StringConstant 可以是正则表达式。The StringConstant can be a regular expression.

[ 单击以运行查询 ][Click to run query]

let MyTrace = datatable (EventTrace:string)
[
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01Z, releaseTime=02/17/2016 08:40:01Z, previousLockTime=02/17/2016 08:39:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00Z, releaseTime=02/17/2016 08:40:00Z, previousLockTime=02/17/2016 08:39:00Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01Z, releaseTime=02/17/2016 08:40:01Z, previousLockTime=02/17/2016 08:39:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01Z, releaseTime=02/17/2016 08:41:00Z, previousLockTime=02/17/2016 08:40:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00Z, releaseTime=02/17/2016 08:41:00Z, previousLockTime=02/17/2016 08:40:00Z)'
];
MyTrace
| parse kind = regex EventTrace with "(.*?)[a-zA-Z]*=" resourceName @", totalSlices=\s*\d+\s*.*?sliceNumber=" sliceNumber:long  ".*?(previous)?lockTime=" lockTime ".*?releaseTime=" releaseTime ".*?previousLockTime=" previousLockTime:date "\\)"  
| project resourceName , sliceNumber , lockTime , releaseTime , previousLockTime

以下查询使用 kind = relaxed 来分析跟踪并提取相关值。The following query parses a trace and extracts the relevant values, using kind = relaxed. StringConstant 是常规字符串值,其匹配要求较为宽松:扩展列可部分匹配所需的类型。The StringConstant is a regular string value and the match is relaxed: extended columns can partially match the required types.

[ 单击以运行查询 ][Click to run query]

let MyTrace = datatable (EventTrace:string)
[
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01Z, releaseTime=02/17/2016 08:40:01Z, previousLockTime=02/17/2016 08:39:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00Z, releaseTime=02/17/2016 08:40:00Z, previousLockTime=02/17/2016 08:39:00Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01Z, releaseTime=02/17/2016 08:40:01Z, previousLockTime=02/17/2016 08:39:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01Z, releaseTime=02/17/2016 08:41:00Z, previousLockTime=02/17/2016 08:40:01Z)',
'Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00Z, releaseTime=02/17/2016 08:41:00Z, previousLockTime=02/17/2016 08:40:00Z)'
];
MyTrace
| parse kind=relaxed "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=NULL, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=NULL, previousLockTime=02/17/2016 08:39:01)" with * "resourceName=" resourceName ", totalSlices=" totalSlices:long * "sliceNumber=" sliceNumber:long * "lockTime=" lockTime ", releaseTime=" releaseTime:date "," * "previousLockTime=" previousLockTime:date ")" *  
| project resourceName ,totalSlices , sliceNumber , lockTime , releaseTime , previousLockTime

时序分析Time series analysis

make-seriesmake-series

make-series:将 summarize 之类的行组聚合在一起,但每个 by 值组合会生成一个(时间)序列向量。make-series: aggregates together groups of rows like summarize, but generates a (time) series vector per each combination of by values.

以下查询将返回一组有关每日风暴事件计数的时间序列。The following query returns a set of time series for the count of storm events per day. 该查询涵盖了每个州三个月的时段,用常数 0 填充缺少的量化:The query covers a three-month period for each state, filling missing bins with the constant 0:

[ 单击以运行查询 ][Click to run query]

StormEvents
| make-series n=count() default=0 on StartTime in range(datetime(2007-01-01), datetime(2007-03-31), 1d) by State

在创建一组(时间)序列后,即可应用序列函数来检测异常形状、季节性模式等等。Once you create a set of (time) series, you can apply series functions to detect anomalous shapes, seasonal patterns, and a lot more.

以下查询提取具体某天事件最多的前三个州:The following query extracts the top three states that had the most events in specific day:

[ 单击以运行查询 ][Click to run query]

StormEvents
| make-series n=count() default=0 on StartTime in range(datetime(2007-01-01), datetime(2007-03-31), 1d) by State
| extend series_stats(n)
| top 3 by series_stats_n_max desc
| render timechart

有关详细信息,请查看完整的序列函数列表。For more information, review the full list of series functions.

高级聚合Advanced aggregations

本文前面部分介绍了基本聚合,如 count 和 summarize。We covered basic aggregations, like count and summarize, earlier in this article. 本部分将介绍更高级的选项。This section introduces more advanced options.

top-nestedtop-nested

top-nested:生成分层式最佳结果,其中每个级别都是对上一级别值的深化。top-nested: Produces hierarchical top results, where each level is a drill-down based on previous level values.

此操作符对于仪表板可视化场景或者需要回答以下问题时非常有用:“找出 K1 的前 N 个值(使用一些聚合);对于上述每一个值,找出 K2 的前 M 个值(使用另一个聚合);……”This operator is useful for dashboard visualization scenarios, or when it is necessary to answer a question like the following: "Find the top-N values of K1 (using some aggregation); for each of them, find what are the top-M values of K2 (using another aggregation); ..."

以下查询将返回一个分层式的表,最高级别为 State,后跟 SourcesThe following query returns a hierarchical table with State at the top level, followed by Sources.

[ 单击以运行查询 ][Click to run query]

StormEvents
| top-nested 2 of State by sum(BeginLat),
top-nested 3 of Source by sum(BeginLat),
top-nested 1 of EndLocation by sum(BeginLat)

pivot() 插件pivot() plugin

pivot() plugin:通过将输入表某列中的唯一值转换成输出表中的多个列来旋转表。pivot() plugin: Rotates a table by turning the unique values from one column in the input table into multiple columns in the output table. 该运算符将在最终输出中的任何剩余列值上执行聚合。The operator performs aggregations where they are required on any remaining column values in the final output.

以下查询应用筛选器,并将行旋转成列。The following query applies a filter and pivots the rows into columns.

[ 单击以运行查询 ][Click to run query]

StormEvents
| project State, EventType
| where State startswith "AL"
| where EventType has "Wind"
| evaluate pivot(State)

dcount()dcount()

dcount() :返回对组中某个表达式的非重复值数的估计值。dcount(): Returns an estimate of the number of distinct values of an expression in the group. 使用 count() 来计数所有值。Use count() to count all values.

以下查询根据 State 来计数非重复 SourceThe following query counts distinct Source by State.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize Sources = dcount(Source) by State

dcountif()dcountif()

dcountif() :返回谓词计算结果为 true 的行表达式非重复值数量的估计值。dcountif(): Returns an estimate of the number of distinct values of the expression for rows for which the predicate evaluates to true.

以下查询将计数其中 DamageProperty < 5000Source 的非重复值。The following query counts the distinct values of Source where DamageProperty < 5000.

[ 单击以运行查询 ][Click to run query]

StormEvents 
| take 100
| summarize Sources = dcountif(Source, DamageProperty < 5000) by State

dcount_hll()dcount_hll()

dcount_hll() :从 HyperLogLog 结果计算 dcount(由 hll or hll_merge 生成)。dcount_hll(): Calculates the dcount from HyperLogLog results (generated by hll or hll_merge.

以下查询使用 HLL 算法来生成计数。The following query uses the HLL algorithm to generate the count.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
| project dcount_hll(hllMerged)

arg_max()arg_max()

arg_max() :在最大化表达式的组中查找行,并返回另一个表达式的值(或使用 * 返回整个行)。arg_max(): Finds a row in the group that maximizes an expression, and returns the value of another expression (or * to return the entire row).

以下查询返回每个州上次洪水报告的时间。The following query returns the time of the last flood report in each state.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where EventType == "Flood"
| summarize arg_max(StartTime, *) by State
| project State, StartTime, EndTime, EventType

makeset()makeset()

makeset() :返回表达式在组中所获取非重复值集的动态 (JSON) 数组。makeset(): Returns a dynamic (JSON) array of the set of distinct values that an expression takes in the group.

以下查询返回每个州报告洪水的所有时间,并从非重复值集中创建一个数组。The following query returns all the times when a flood was reported by each state and creates an array from the set of distinct values.

[ 单击以运行查询 ][Click to run query]

StormEvents
| where EventType == "Flood"
| summarize FloodReports = makeset(StartTime) by State
| project State, FloodReports

mv-expandmv-expand

mv-expand:从动态类型的列扩展多值集合,以便集合中的每个值都获得一个单独的行。mv-expand: Expands multi-value collection(s) from a dynamic-typed column so that each value in the collection gets a separate row. 将复制扩展行中的所有其他列。All the other columns in an expanded row are duplicated. 它与 makelist 相反。It's the opposite of makelist.

以下查询通过创建一组数据,然后用其演示 mv-expand 功能来生成示例数据。The following query generates sample data by creating a set and then using it to demonstrate the mv-expand capabilities.

[ 单击以运行查询 ][Click to run query]

let FloodDataSet = StormEvents
| where EventType == "Flood"
| summarize FloodReports = makeset(StartTime) by State
| project State, FloodReports;
FloodDataSet
| mv-expand FloodReports

percentiles()percentiles()

percentiles() :返回表达式定义的人口中指定最近排名百分位数的估计值。percentiles(): Returns an estimate for the specified nearest-rank percentile of the population defined by an expression. 准确性取决于百分位区域中的填充密度。The accuracy depends on the density of population in the region of the percentile. 只能在 summarize 内的聚合上下文中使用。Can be used only in the context of aggregation inside summarize.

以下查询计算风暴持续时间的百分位数。The following query calculates percentiles for storm duration.

[ 单击以运行查询 ][Click to run query]

StormEvents
| extend duration = EndTime - StartTime
| where duration > 0s
| where duration < 3h
| summarize percentiles(duration, 5, 20, 50, 80, 95)

以下查询按州计算风暴持续时间的百分位数,并按照 5 分钟量化 (5m) 来规范化数据。The following query calculates percentiles for storm duration by state and normalizes the data by five-minute bins (5m).

[ 单击以运行查询 ][Click to run query]

StormEvents
| extend duration = EndTime - StartTime
| where duration > 0s
| where duration < 3h
| summarize event_count = count() by bin(duration, 5m), State
| summarize percentiles(duration, 5, 20, 50, 80, 95) by State

跨数据集Cross Dataset

本部分介绍的一些元素,可让你创建更复杂的查询,跨表联接数据,以及跨数据库和群集查询。This section covers elements that enable you to create more complex queries, join data across tables, and query across databases and clusters.

letlet

let:改进模块化和重复使用。let: Improves modularity and reuse. let 语句可以将潜在的复杂表达式拆分为多个部分,每个部分绑定到一个名称,然后将这些部分组合在一起。The let statement allows you to break a potentially complex expression into multiple parts, each bound to a name, and compose those parts together. 此外,let 语句还可用于创建用户定义的函数和视图(表的结果类似于新表的表达式)。A let statement can also be used to create user-defined functions and views (expressions over tables whose results look like a new table). 通过 let 语句绑定的表达式可以是标量类型、表格类型或用户定义的函数 (lambda)。Expressions bound by a let statement can be of scalar type, of tabular type, or user-defined function (lambdas).

以下示例创建一个表格类型变量,并在后续表达式中使用。The following example creates a tabular type variable and uses it in a subsequent expression.

[ 单击以运行查询 ][Click to run query]

let LightningStorms =
StormEvents
| where EventType == "Lightning";
let AvalancheStorms =
StormEvents
| where EventType == "Avalanche";
LightningStorms
| join (AvalancheStorms) on State
| distinct State

joinjoin

join:通过匹配每个表中指定列的值,合并两个表的行以组成新表。join: Merge the rows of two tables to form a new table by matching values of the specified column(s) from each table. Kusto 支持一系列完整的联接类型:fullouter、inner、innerunique、leftanti、leftantisemi、leftouter、leftsemi、rightanti、rightantisemi、rightouter 、rightsemi 。Kusto supports a full range of join types: fullouter, inner, innerunique, leftanti, leftantisemi, leftouter, leftsemi, rightanti, rightantisemi, rightouter, rightsemi.

以下示例使用内部联接来联接两个表。The following example joins two tables with an inner join.

[ 单击以运行查询 ][Click to run query]

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X 
| join kind=inner Y on Key

提示

在联接前,使用 where 和 project 运算符可以减少输入表中的行数和列数。Use where and project operators to reduce the numbers of rows and columns in the input tables, before the join. 如果某个表始终小于另一个表,则将其用作 join 的左侧(通过管道传送)。If one table is always smaller than the other, use it as the left (piped) side of the join. join 匹配的列必须具有相同名称。The columns for the join match must have the same name. 如有必要,请使用 project 运算符重命名其中一个表中的一列。Use the project operator if necessary to rename a column in one of the tables.

serializeserialize

serialize:序列化行集,以便可以使用需要序列化数据的函数,如 row_number()。serialize: Serializes the row set so you can use functions that require serialized data, like row_number().

以下查询将成功执行,因为已序列化数据。The following query succeeds because the data is serialized.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize count() by State
| serialize
| extend row_number = row_number()

如果行集是以下运算的结果,则同样视为已序列化:sort、top 或 range 运算符,后可跟 project、project-away、extend、where、parse、mv-expand 或 take 运算符。The row set is also considered as serialized if it's a result of: sort, top, or range operators, optionally followed by project, project-away, extend, where, parse, mv-expand, or take operators.

[ 单击以运行查询 ][Click to run query]

StormEvents
| summarize count() by State
| sort by State asc
| extend row_number = row_number()

跨数据库和跨群集查询Cross-database and cross-cluster queries

跨数据库和跨群集查询:可以通过将同一群集引用为 database("MyDatabase").MyTable 来查询其中的数据库。Cross-database and cross-cluster queries: You can query a database on the same cluster by referring it as database("MyDatabase").MyTable. 你可以通过将远程群集引用为 cluster("MyCluster").database("MyDatabase").MyTable 来查询其中的数据库。You can query a database on a remote cluster by referring to it as cluster("MyCluster").database("MyDatabase").MyTable.

以下查询从一个群集中调用,并查询 MyCluster 群集中的数据。The following query is called from one cluster and queries data from MyCluster cluster. 若要运行此查询,使用你自己的群集名称和数据库名称。To run this query, use your own cluster name and database name.

cluster("MyCluster").database("Wiki").PageViews
| where Views < 2000
| take 1000;

用户分析User Analytics

本部分包含了一些元素和查询,用于演示在 Kusto 中执行用户行为分析的容易程度。This section includes elements and queries that demonstrate how easy it is to perform analysis of user behaviors in Kusto.

activity_counts_metrics 插件activity_counts_metrics plugin

activity_counts_metrics plugin:计算有用的活动指标(总计数值、非重复计数值、新值的非重复计数和聚合非重复计数)。activity_counts_metrics plugin: Calculates useful activity metrics (total count values, distinct count values, distinct count of new values, and aggregated distinct count). 计算每个时间窗口的指标,然后将它们进行比较,并与之前的所有时间窗口进行聚合。Metrics are calculated for each time window, then they are compared, and aggregated to and with all previous time windows.

以下查询通过计算每日活动计数来分析用户采用。The following query analyzes user adoption by calculating daily activity counts.

[ 单击以运行查询 ][Click to run query]

let start=datetime(2017-08-01);
let end=datetime(2017-08-04);
let window=1d;
let T = datatable(UserId:string, Timestamp:datetime)
[
'A', datetime(2017-08-01),
'D', datetime(2017-08-01),
'J', datetime(2017-08-01),
'B', datetime(2017-08-01),
'C', datetime(2017-08-02),
'T', datetime(2017-08-02),
'J', datetime(2017-08-02),
'H', datetime(2017-08-03),
'T', datetime(2017-08-03),
'T', datetime(2017-08-03),
'J', datetime(2017-08-03),
'B', datetime(2017-08-03),
'S', datetime(2017-08-03),
'S', datetime(2017-08-04),
];
T
| evaluate activity_counts_metrics(UserId, Timestamp, start, end,
window)

activity_engagement 插件activity_engagement plugin

activity_engagement plugin:在滑动时间线窗口中,根据 ID 列计算活动参与比率。activity_engagement plugin: Calculates activity engagement ratio based on ID column over a sliding timeline window. activity_engagement 插件可用于计算 DAU、WAU 和 MAU(每日、每周和每月的活动用户)。activity_engagement plugin can be used for calculating DAU, WAU, and MAU (daily, weekly, and monthly active users).

以下查询在一个移动的七天窗口中,返回使用每日应用程序的不同用户总数,与使用每周应用程序的不同用户总数之间的比率。The following query returns the ratio of total distinct users using an application daily compared to total distinct users using the application weekly, on a moving seven-day window.

[ 单击以运行查询 ][Click to run query]

// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-01-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/WAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 7d)
| project _day, Dau_Wau=activity_ratio*100
| render timechart

提示

在计算 DAU/MAU 时,更改结束数据和移动窗口期 (OuterActivityWindow)。When calculating DAU/MAU, change the end data and the moving window period (OuterActivityWindow).

activity_metrics 插件activity_metrics plugin

activity_metrics plugin:根据当前阶段窗口和前一阶段窗口计算有用的活动指标(非重复计数值、新值的不同计数、保留率和改动率)。activity_metrics plugin: Calculates useful activity metrics (distinct count values, distinct count of new values, retention rate, and churn rate) based on the current period window vs. the previous period window.

以下查询计算给定数据集的改动率和保留率。The following query calculates the churn and retention rate for a given dataset.

[ 单击以运行查询 ][Click to run query]

// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
| where _day > datetime(2017-01-02)
| project _day, id
// Calculate weekly retention rate
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, retention_rate*100, churn_rate*100
| render timechart

new_activity_metrics 插件new_activity_metrics plugin

new_activity_metrics plugin:为新用户队列计算有用的活动指标(非重复计数值、非重复计数的新值、保留率和改动率)。new_activity_metrics plugin: Calculates useful activity metrics (distinct count values, distinct count of new values, retention rate, and churn rate) for the cohort of new users. 此插件的概念类似于 activity_metrics 插件,但关注的是新用户。The concept of this plugin is similar to activity_metrics plugin, but focuses on new users.

以下查询计算新用户队列(第一周到达的用户)在周间窗口的保留率和改动率。The following query calculates a retention and churn rate with a week-over-week window for the new users cohort (users that arrived on the first week).

[ 单击以运行查询 ][Click to run query]

// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Take only the first week cohort (last parameter)
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d, _start)
| project from_Day, to_Day, retention_rate, churn_rate

session_count 插件session_count plugin

session_count plugin:基于某个时间线的 ID 列计算会话计数。session_count plugin: Calculates the count of sessions based on ID column over a timeline.

以下查询返回会话计数。The following query returns the count of sessions. 如果用户 ID 在 100 个时间段的时间范围内出现至少一次,而会话回查窗口为 41 个时段,会话将被视为处于活动状态。A session is considered active if a user ID appears at least once at a timeframe of 100-time slots, while the session look-back window is 41-time slots.

[ 单击以运行查询 ][Click to run query]

let _data = range Timeline from 1 to 9999 step 1
| extend __key = 1
| join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
| where Timeline % Id == 0
| project Timeline, Id;
// End of data definition
_data
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41)
| render linechart

funnel_sequence 插件funnel_sequence plugin

funnel_sequence plugin:计算已获取州序列的用户的非重复计数;显示序列产生或遵循的前一个和下一个州的分布。funnel_sequence plugin: Calculates the distinct count of users who have taken a sequence of states; shows the distribution of previous and next states that have led to or were followed by the sequence.

以下查询显示在 2007 年的所有龙卷风事件之前和之后发生了哪些事件。The following query shows what event happens before and after all Tornado events in 2007.

[ 单击以运行查询 ][Click to run query]

// Looking on StormEvents statistics:
// Q1: What happens before Tornado event?
// Q2: What happens after Tornado event?
StormEvents
| evaluate funnel_sequence(EpisodeId, StartTime, datetime(2007-01-01), datetime(2008-01-01), 1d,365d, EventType, dynamic(['Tornado']))

funnel_sequence_completion 插件funnel_sequence_completion plugin

funnel_sequence_completion plugin:计算不同时间段内已完成序列步骤的漏斗图。funnel_sequence_completion plugin: Calculates the funnel of completed sequence steps within different time periods.

以下查询检查序列中的完成漏斗图:在一小时、四小时和一天 ([1h, 4h, 1d]) 的“总体”时间中的 Hail -> Tornado -> Thunderstorm -> WindThe following query checks the completion funnel of the sequence: Hail -> Tornado -> Thunderstorm -> Wind in "overall" times of one hour, four hours, and one day ([1h, 4h, 1d]).

[ 单击以运行查询 ][Click to run query]

let _start = datetime(2007-01-01);
let _end = datetime(2008-01-01);
let _windowSize = 365d;
let _sequence = dynamic(['Hail', 'Tornado', 'Thunderstorm', 'Wind']);
let _periods = dynamic([1h, 4h, 1d]);
StormEvents
| evaluate funnel_sequence_completion(EpisodeId, StartTime, _start, _end, _windowSize, EventType, _sequence, _periods)

函数Functions

本部分介绍函数:存储在服务器上的可重用查询。This section covers functions: reusable queries that are stored on the server. 函数可由查询和其他函数调用(不支持递归函数)。Functions can be invoked by queries and other functions (recursive functions are not supported).

备注

你不能在只读的帮助群集上创建函数。You cannot create functions on the help cluster, which is read-only. 在此部分使用你自己的测试群集。Use your own test cluster for this part.

以下示例创建一个函数,采用州名称 (MyState) 作为参数。The following example creates a function that takes a state name (MyState) as an argument.

.create function with (folder="Demo")
MyFunction (MyState: string)
{
StormEvents
| where State =~ MyState
}

以下示例调用一个函数,该函数获取德克萨斯州的数据。The following example calls a function, which gets data for the state of Texas.

MyFunction ("Texas")
| summarize count()

以下示例删除在第一步中创建的函数。The following example deletes the function that was created in the first step.

.drop function MyFunction

后续步骤Next steps

Kusto 查询语言参考Kusto Query Language reference