Advanced aggregations in Azure Monitor log queries

Note

You should complete Aggregations in Azure Monitor queries before completing this lesson.

Note

You can work through this exercise in your own Log Analytics environment, or you can use our Demo environment, which includes plenty of sample data.

This article describes some of the more advanced aggregation options available to Azure Monitor queries.

Generating lists and sets

You can use makelist to pivot data by the order of values in a particular column. For example, you may want to explore the most common order events take place on your machines. You can essentially pivot the data by the order of EventIDs on each machine.

Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makelist(EventID) by Computer
Computer list_EventID
computer1 [704,701,1501,1500,1085,704,704,701]
computer2 [326,105,302,301,300,102]
... ...

makelist generates a list in the order that data was passed into it. To sort events from oldest to newest, use asc in the order statement instead of desc.

It is also useful to create a list of just distinct values. This is called a Set and can be generated with makeset:

Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makeset(EventID) by Computer
Computer list_EventID
computer1 [704,701,1501,1500,1085]
computer2 [326,105,302,301,300,102]
... ...

Like makelist, makeset also works with ordered data and will generate the arrays based on the order of the rows that are passed into it.

Expanding lists

The inverse operation of makelist or makeset is mvexpand, which expands a list of values to separate rows. It can expand across any number of dynamic columns, both JSON and array. For example, you could check the Heartbeat table for solutions sending data from computers that sent a heartbeat in the last hour:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, Solutions
Computer Solutions
computer1 "security", "updates", "changeTracking"
computer2 "security", "updates"
computer3 "antiMalware", "changeTracking"
... ...

Use mvexpand to show each value in a separate row instead of a comma-separated list:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mvexpand Solutions
Computer Solutions
computer1 "security"
computer1 "updates"
computer1 "changeTracking"
computer2 "security"
computer2 "updates"
computer3 "antiMalware"
computer3 "changeTracking"
... ...

You could then use makelist again to group items together, and this time see the list of computers per solution:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mvexpand Solutions
| summarize makelist(Computer) by tostring(Solutions) 
Solutions list_Computer
"security" ["computer1", "computer2"]
"updates" ["computer1", "computer2"]
"changeTracking" ["computer1", "computer3"]
"antiMalware" ["computer3"]
... ...

Handling missing bins

A useful application of mvexpand is the need to fill default values in for missing bins. For example, suppose you're looking for the uptime of a particular machine by exploring its heartbeat. You also want to see the source of the heartbeat which is in the category column. Normally, we would use a simple summarize statement as follows:

Heartbeat
| where TimeGenerated > ago(12h)
| summarize count() by Category, bin(TimeGenerated, 1h)
Category TimeGenerated count_
Direct Agent 2017-06-06T17:00:00Z 15
Direct Agent 2017-06-06T18:00:00Z 60
Direct Agent 2017-06-06T20:00:00Z 55
Direct Agent 2017-06-06T21:00:00Z 57
Direct Agent 2017-06-06T22:00:00Z 60
... ... ...

In these results though the bucket associated with "2017-06-06T19:00:00Z" is missing because there isn't any heartbeat data for that hour. Use the make-series function to assign a default value to empty buckets. This will generate a row for each category with two extra array columns, one for values, and one for matching time buckets:

Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category 
Category count_ TimeGenerated
Direct Agent [15,60,0,55,60,57,60,...] ["2017-06-06T17:00:00.0000000Z","2017-06-06T18:00:00.0000000Z","2017-06-06T19:00:00.0000000Z","2017-06-06T20:00:00.0000000Z","2017-06-06T21:00:00.0000000Z",...]
... ... ...

The third element of the count_ array is a 0 as expected, and there is a matching timestamp of "2017-06-06T19:00:00.0000000Z" in the TimeGenerated array. This array format is difficult to read though. Use mvexpand to expand the arrays and produce the same format output as generated by summarize:

Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category 
| mvexpand TimeGenerated, count_
| project Category, TimeGenerated, count_
Category TimeGenerated count_
Direct Agent 2017-06-06T17:00:00Z 15
Direct Agent 2017-06-06T18:00:00Z 60
Direct Agent 2017-06-06T19:00:00Z 0
Direct Agent 2017-06-06T20:00:00Z 55
Direct Agent 2017-06-06T21:00:00Z 57
Direct Agent 2017-06-06T22:00:00Z 60
... ... ...

Narrowing results to a set of elements: let, makeset, toscalar, in

A common scenario is to select the names of some specific entities based on a set of criteria and then filter a different data set down to that set of entities. For example you might find computers that are known to have missing updates and identify IPs that these computers called out to:

let ComputersNeedingUpdate = toscalar(
    Update
    | summarize makeset(Computer)
    | project set_Computer
);
WindowsFirewall
| where Computer in (ComputersNeedingUpdate)

Next steps

See other lessons for using the Kusto query language with Azure Monitor log data: