top-nested operator

Article
01/08/2024

The top-nested operator performs hierarchical aggregation and value selection.

Imagine you have a table with sales information like regions, salespeople, and amounts sold. The top-nested operator can help you answer complex questions, such as "What are the top five regions by sales, and who are the top three salespeople in each of those regions?"

The source data is partitioned based on the criteria set in the first top-nested clause, such as region. Next, the operator picks the top records in each partition using an aggregation, such as adding sales amounts. Each subsequent top-nested clause refines the partitions created by the previous clause, creating a hierarchy of more precise groups.

The result is a table with two columns per clause. One column holds the partitioning values, such as region, while the other column holds the outcomes of the aggregation calculation, like the sum of sales.

Syntax

T | top-nested [ N ] of Expr [with others = ConstExpr] by Aggregation [asc | desc] [,
top-nested ... ]

Learn more about syntax conventions.

Parameters

Name	Type	Required	Description
T	`string`	✔️	The input tabular expression.
N	`int`		The number of top values to be returned for this hierarchy level. If omitted, all distinct values are returned.
Expr	`string`	✔️	An expression over the input record indicating which value to return for this hierarchy level. Typically, it refers to a column from T or involves a calculation like bin() on a column. Optionally, set an output column name as Name `=` Expr.
ConstExpr	`string`		If specified, for each hierarchy level, one record is added with the value that is the aggregation over all records that didn't make it to the top.
Aggregation	`string`		The aggregation function applied to records with the same Expr value. The result determines the top records. See Supported aggregation functions. Optionally, set an output column name as Name `=` Aggregation.

Supported aggregation functions

The following aggregation functions are supported:

Note

Any algebraic combination of the aggregations is also supported.

Returns

A table with two columns for each clause. One column contains unique values computed using Expr, and the other column shows the results obtained from the Aggregation calculation.

Include data from other columns

Only columns specified as a top-nested clause Expr are displayed in the output table.

To include all values of a column at a specific level:

Don't specify the value of N.
Use the column name as the value of Expr.
Use Ignore=max(1) as the value of Aggregation.
Remove the unnecessary Ignore column with project-away.

For an example, see Get the most recent events per state with additional data from other columns.

Performance considerations

The number of records may grow exponentially with the number of top-nested clauses, and record growth is even faster if no N limit is specified. This operator may consume a considerable amount of resources.

If the distribution of the aggregation is considerably non-uniform, limit the number of distinct values to return by specifying N. Then, use the with others = ConstExpr specification to get an indication for the weight of all other cases.

Examples

Get started with the `top-nested` operator

The following query partitions the StormEvents table by the State column and calculates the total latitude for each state. The query selects the top two states with the highest latitude sum. Within these top two states, the query groups the data by Source and selects the top three sources with the highest latitude sum. For each of the top three sources in the top two states, the query groups the data by EndLocation and selects the EndLocation with the highest latitude sum.

Run the query

StormEvents                                        // Data source.
| top-nested 2 of State       by sum(BeginLat),    // Top 2 States by total latitude.
  top-nested 3 of Source      by sum(BeginLat),    // Top 3 Sources by total latitude in each State.
  top-nested 1 of EndLocation by sum(BeginLat)     // Top 1 EndLocation by total latitude in each Source and State.

Output

State	aggregated_State	Source	aggregated_Source	EndLocation	aggregated_EndLocation
KANSAS	87771.2355000001	Law Enforcement	18744.823	FT SCOTT	264.858
KANSAS	87771.2355000001	Public	22855.6206	BUCKLIN	488.2457
KANSAS	87771.2355000001	Trained Spotter	21279.7083	SHARON SPGS	388.7404
TEXAS	123400.5101	Public	13650.9079	AMARILLO	246.2598
TEXAS	123400.5101	Law Enforcement	37228.5966	PERRYTON	289.3178
TEXAS	123400.5101	Trained Spotter	13997.7124	CLAUDE	421.44

Enhance top-nested results with data from another column

The following query builds upon the previous example by introducing an extra top-nested clause. In this new clause, the absence of a numeric specification results in the extraction of all distinct values of EventType across the partitions. The max(1) aggregation function is merely a placeholder, rendering its outcome irrelevant, so the project-away operator removes the Ignore column. The result shows all event types associated with the previously aggregated data.

Run the query

StormEvents
| top-nested 2 of State       by sum(BeginLat),
  top-nested 3 of Source      by sum(BeginLat),
  top-nested 1 of EndLocation by sum(BeginLat),
  top-nested   of EventType   by Ignore = max(1)
| project-away Ignore

Output

State	aggregated_State	Source	aggregated_Source	EndLocation	aggregated_EndLocation	EventType
TEXAS	123400.51009999994	Public	13650.907900000002	AMARILLO	246.25979999999998	Hail
TEXAS	123400.51009999994	Public	13650.907900000002	AMARILLO	246.25979999999998	Thunderstorm Wind
KANSAS	87771.235500000068	Public	22855.6206	BUCKLIN	488.2457	Flood
KANSAS	87771.235500000068	Public	22855.6206	BUCKLIN	488.2457	Thunderstorm Wind
KANSAS	87771.235500000068	Public	22855.6206	BUCKLIN	488.2457	Hail
TEXAS	123400.51009999994	Trained Spotter	13997.712400000009	CLAUDE	421.44	Hail
KANSAS	87771.235500000068	Law Enforcement	18744.823000000004	FT SCOTT	264.858	Flash Flood
KANSAS	87771.235500000068	Law Enforcement	18744.823000000004	FT SCOTT	264.858	Thunderstorm Wind
KANSAS	87771.235500000068	Law Enforcement	18744.823000000004	FT SCOTT	264.858	Flood
TEXAS	123400.51009999994	Law Enforcement	37228.596599999961	PERRYTON	289.3178	Hail
...	...	...	...	...	...

Use `with` `others` to explore excluded data

When incorporated within a top-nested clause, the with others specification introduces an extra record that aggregates data excluded from the top results. In the following query, an extra record is created in the State and aggregated_State columns, representing the collective latitude of all states except Kansas and Texas. Moreover, the EndLocation and aggregated_EndLocation column have an extra nine records. These records show the combined latitude of end locations not qualifying as the top location within each state and source.

Run the query

StormEvents
| top-nested 2 of State with others = "All Other States" by sum(BeginLat),
  top-nested 3 of Source by sum(BeginLat),
  top-nested 1 of EndLocation with others = "All Other End Locations" by sum(BeginLat)

Output

State	aggregated_State	Source	aggregated_Source	EndLocation	aggregated_EndLocation
KANSAS	87771.2355000001	Law Enforcement	18744.823	FT SCOTT	264.858
KANSAS	87771.2355000001	Public	22855.6206	BUCKLIN	488.2457
KANSAS	87771.2355000001	Trained Spotter	21279.7083	SHARON SPGS	388.7404
TEXAS	123400.5101	Public	13650.9079	AMARILLO	246.2598
TEXAS	123400.5101	Law Enforcement	37228.5966	PERRYTON	289.3178
TEXAS	123400.5101	Trained Spotter	13997.7124	CLAUDE	421.44
KANSAS	87771.2355000001	Law Enforcement	18744.823	All Other End Locations	18479.965
KANSAS	87771.2355000001	Public	22855.6206	All Other End Locations	22367.3749
KANSAS	87771.2355000001	Trained Spotter	21279.7083	All Other End Locations	20890.9679
TEXAS	123400.5101	Public	13650.9079	All Other End Locations	13404.6481
TEXAS	123400.5101	Law Enforcement	37228.5966	All Other End Locations	36939.2788
TEXAS	123400.5101	Trained Spotter	13997.7124	All Other End Locations	13576.2724
KANSAS	87771.2355000001			All Other End Locations	24891.0836
TEXAS	123400.5101			All Other End Locations	58523.2932000001
All Other States	1149279.5923			All Other End Locations	1149279.5923

The following query shows the same results for the first level used in the previous example.

Run the query

StormEvents
| where State !in ('TEXAS', 'KANSAS')
| summarize sum(BeginLat)

Output

sum_BeginLat
1149279.5923

Sort hierarchical results

To achieve a comprehensive sort order, the following query uses index-based sorting for each value within the current hierarchy level, per group. This sorting is geared towards arranging the result according to the ultimate nested level, in this case the EndLocation.

Run the query

StormEvents
| top-nested 2 of State by sum(BeginLat),
  top-nested 2 of Source by sum(BeginLat),
  top-nested 4 of EndLocation by sum(BeginLat)
| sort by State, Source, aggregated_EndLocation
| summarize
    EndLocations = make_list(EndLocation, 10000),
    endLocationSums = make_list(aggregated_EndLocation, 10000)
    by State, Source
| extend indicies = range(0, array_length(EndLocations) - 1, 1)
| mv-expand EndLocations, endLocationSums, indicies

Output

State	Source	EndLocations	endLocationSums	indices
TEXAS	Trained Spotter	CLAUDE	421.44	0
TEXAS	Trained Spotter	AMARILLO	316.8892	1
TEXAS	Trained Spotter	DALHART	252.6186	2
TEXAS	Trained Spotter	PERRYTON	216.7826	3
TEXAS	Law Enforcement	PERRYTON	289.3178	0
TEXAS	Law Enforcement	LEAKEY	267.9825	1
TEXAS	Law Enforcement	BRACKETTVILLE	264.3483	2
TEXAS	Law Enforcement	GILMER	261.9068	3
KANSAS	Trained Spotter	SHARON SPGS	388.7404	0
KANSAS	Trained Spotter	ATWOOD	358.6136	1
KANSAS	Trained Spotter	LENORA	317.0718	2
KANSAS	Trained Spotter	SCOTT CITY	307.84	3
KANSAS	Public	BUCKLIN	488.2457	0
KANSAS	Public	ASHLAND	446.4218	1
KANSAS	Public	PROTECTION	446.11	2
KANSAS	Public	MEADE STATE PARK	371.1	3

Get the most recent events per state with additional data from other columns

The following query demonstrates how to retrieve the two most recent events for each US state along with relevant event details. Notice the use of max(1) within certain columns, identified by Ignore*, which aids in propagating data through the query without imposing any selection logic.

Run the query

StormEvents
| top-nested of State by Ignore0=max(1),                  // Partition the data by each unique value of state.
  top-nested 2 of StartTime by Ignore1=max(StartTime),    // Get the 2 most recent events in each state.
  top-nested of EndTime by Ignore2=max(1),                // Append the EndTime for each event.
  top-nested of EpisodeId by Ignore3=max(1)               // Append the EpisodeId for each event.
| project-away Ignore*                                    // Remove the unnecessary aggregation columns.
| order by State asc, StartTime desc                      // Sort results alphabetically and chronologically.

Get the latest records per identity with additional data from other columns

The following query showcases how to extract the latest records per identity and builds on the concepts introduced in the previous example. The first top-nested clause partitions the data by distinct values of id. The subsequent clause identifies the two most recent records based on the timestamp for each id. Other information is appended using a top-nested operator alongside an unspecified count and the arbitrary max(1) aggregation. Finally, unnecessary aggregation columns are removed using the project-away operator.

Run the query

datatable(id: string, timestamp: datetime, otherInformation: string) // Create a source datatable.
[
    "Barak", datetime(2015-01-01), "1",
    "Barak", datetime(2016-01-01), "2",
    "Barak", datetime(2017-01-20), "3",
    "Donald", datetime(2017-01-20), "4",
    "Donald", datetime(2017-01-18), "5",
    "Donald", datetime(2017-01-19), "6"
]
| top-nested of id by Ignore0=max(1),                     // Partition the data by each unique value of id.
  top-nested 2 of timestamp by Ignore1=max(timestamp),    // Get the 2 most recent events for each state.
  top-nested of otherInformation by Ignore2=max(1)        // Append otherInformation for each event.
| project-away Ignore0, Ignore1, Ignore2                  // Remove the unnecessary aggregation columns.

Output

id	timestamp	otherInformation
Barak	2016-01-01T00:00:00Z	2
Donald	2017-01-19T00:00:00Z	6
Barak	2017-01-20T00:00:00Z	3
Donald	2017-01-20T00:00:00Z	4

partition operator

top-nested operator

Syntax

Parameters

Supported aggregation functions

Returns

Include data from other columns

Performance considerations

Examples

Get started with the `top-nested` operator

Enhance top-nested results with data from another column

Use `with` `others` to explore excluded data

Sort hierarchical results

Get the most recent events per state with additional data from other columns

Get the latest records per identity with additional data from other columns

Feedback

Feedback

Additional resources

top-nested operator

Syntax

Parameters

Supported aggregation functions

Returns

Include data from other columns

Performance considerations

Examples

Get started with the top-nested operator

Enhance top-nested results with data from another column

Use with others to explore excluded data

Sort hierarchical results

Get the most recent events per state with additional data from other columns

Get the latest records per identity with additional data from other columns

Related content

Feedback

Feedback

Additional resources

Get started with the `top-nested` operator

Use `with` `others` to explore excluded data