top-nested operator
The top-nested
operator performs hierarchical aggregation and value selection.
Imagine you have a table with sales information like regions, salespeople, and amounts sold. The top-nested
operator can help you answer complex questions, such as "What are the top five regions by sales, and who are the top three salespeople in each of those regions?"
The source data is partitioned based on the criteria set in the first top-nested
clause, such as region. Next, the operator picks the top records in each partition using an aggregation, such as adding sales amounts. Each subsequent top-nested
clause refines the partitions created by the previous clause, creating a hierarchy of more precise groups.
The result is a table with two columns per clause. One column holds the partitioning values, such as region, while the other column holds the outcomes of the aggregation calculation, like the sum of sales.
Syntax
T |
top-nested
[ N ] of
Expr [with
others
=
ConstExpr] by
Aggregation [asc
| desc
] [,
top-nested
... ]
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string |
✔️ | The input tabular expression. |
N | int |
The number of top values to be returned for this hierarchy level. If omitted, all distinct values are returned. | |
Expr | string |
✔️ | An expression over the input record indicating which value to return for this hierarchy level. Typically, it refers to a column from T or involves a calculation like bin() on a column. Optionally, set an output column name as Name = Expr. |
ConstExpr | string |
If specified, for each hierarchy level, one record is added with the value that is the aggregation over all records that didn't make it to the top. | |
Aggregation | string |
The aggregation function applied to records with the same Expr value. The result determines the top records. See Supported aggregation functions. Optionally, set an output column name as Name = Aggregation. |
Supported aggregation functions
The following aggregation functions are supported:
Note
Any algebraic combination of the aggregations is also supported.
Returns
A table with two columns for each clause. One column contains unique values computed using Expr, and the other column shows the results obtained from the Aggregation calculation.
Include data from other columns
Only columns specified as a top-nested
clause Expr are displayed in the output table.
To include all values of a column at a specific level:
- Don't specify the value of N.
- Use the column name as the value of Expr.
- Use
Ignore=max(1)
as the value of Aggregation. - Remove the unnecessary
Ignore
column with project-away.
For an example, see Get the most recent events per state with additional data from other columns.
Performance considerations
The number of records may grow exponentially with the number of top-nested
clauses, and record growth is even faster if no N limit is specified. This operator may consume a considerable amount of resources.
If the distribution of the aggregation is considerably non-uniform, limit the number of distinct values to return by specifying N. Then, use the with
others
=
ConstExpr specification to get an indication for the weight of all other cases.
Examples
Get started with the top-nested
operator
The following query partitions the StormEvents
table by the State
column and calculates the total latitude for each state. The query selects the top two states with the highest latitude sum. Within these top two states, the query groups the data by Source
and selects the top three sources with the highest latitude sum. For each of the top three sources in the top two states, the query groups the data by EndLocation
and selects the EndLocation
with the highest latitude sum.
StormEvents // Data source.
| top-nested 2 of State by sum(BeginLat), // Top 2 States by total latitude.
top-nested 3 of Source by sum(BeginLat), // Top 3 Sources by total latitude in each State.
top-nested 1 of EndLocation by sum(BeginLat) // Top 1 EndLocation by total latitude in each Source and State.
Output
State | aggregated_State | Source | aggregated_Source | EndLocation | aggregated_EndLocation |
---|---|---|---|---|---|
KANSAS | 87771.2355000001 | Law Enforcement | 18744.823 | FT SCOTT | 264.858 |
KANSAS | 87771.2355000001 | Public | 22855.6206 | BUCKLIN | 488.2457 |
KANSAS | 87771.2355000001 | Trained Spotter | 21279.7083 | SHARON SPGS | 388.7404 |
TEXAS | 123400.5101 | Public | 13650.9079 | AMARILLO | 246.2598 |
TEXAS | 123400.5101 | Law Enforcement | 37228.5966 | PERRYTON | 289.3178 |
TEXAS | 123400.5101 | Trained Spotter | 13997.7124 | CLAUDE | 421.44 |
Enhance top-nested results with data from another column
The following query builds upon the previous example by introducing an extra top-nested
clause. In this new clause, the absence of a numeric specification results in the extraction of all distinct values of EventType
across the partitions. The max(1)
aggregation function is merely a placeholder, rendering its outcome irrelevant, so the project-away operator removes the Ignore
column. The result shows all event types associated with the previously aggregated data.
StormEvents
| top-nested 2 of State by sum(BeginLat),
top-nested 3 of Source by sum(BeginLat),
top-nested 1 of EndLocation by sum(BeginLat),
top-nested of EventType by Ignore = max(1)
| project-away Ignore
Output
State | aggregated_State | Source | aggregated_Source | EndLocation | aggregated_EndLocation | EventType |
---|---|---|---|---|---|---|
TEXAS | 123400.51009999994 | Public | 13650.907900000002 | AMARILLO | 246.25979999999998 | Hail |
TEXAS | 123400.51009999994 | Public | 13650.907900000002 | AMARILLO | 246.25979999999998 | Thunderstorm Wind |
KANSAS | 87771.235500000068 | Public | 22855.6206 | BUCKLIN | 488.2457 | Flood |
KANSAS | 87771.235500000068 | Public | 22855.6206 | BUCKLIN | 488.2457 | Thunderstorm Wind |
KANSAS | 87771.235500000068 | Public | 22855.6206 | BUCKLIN | 488.2457 | Hail |
TEXAS | 123400.51009999994 | Trained Spotter | 13997.712400000009 | CLAUDE | 421.44 | Hail |
KANSAS | 87771.235500000068 | Law Enforcement | 18744.823000000004 | FT SCOTT | 264.858 | Flash Flood |
KANSAS | 87771.235500000068 | Law Enforcement | 18744.823000000004 | FT SCOTT | 264.858 | Thunderstorm Wind |
KANSAS | 87771.235500000068 | Law Enforcement | 18744.823000000004 | FT SCOTT | 264.858 | Flood |
TEXAS | 123400.51009999994 | Law Enforcement | 37228.596599999961 | PERRYTON | 289.3178 | Hail |
... | ... | ... | ... | ... | ... |
Use with
others
to explore excluded data
When incorporated within a top-nested
clause, the with
others
specification introduces an extra record that aggregates data excluded from the top results. In the following query, an extra record is created in the State
and aggregated_State
columns, representing the collective latitude of all states except Kansas and Texas. Moreover, the EndLocation
and aggregated_EndLocation
column have an extra nine records. These records show the combined latitude of end locations not qualifying as the top location within each state and source.
StormEvents
| top-nested 2 of State with others = "All Other States" by sum(BeginLat),
top-nested 3 of Source by sum(BeginLat),
top-nested 1 of EndLocation with others = "All Other End Locations" by sum(BeginLat)
Output
State | aggregated_State | Source | aggregated_Source | EndLocation | aggregated_EndLocation |
---|---|---|---|---|---|
KANSAS | 87771.2355000001 | Law Enforcement | 18744.823 | FT SCOTT | 264.858 |
KANSAS | 87771.2355000001 | Public | 22855.6206 | BUCKLIN | 488.2457 |
KANSAS | 87771.2355000001 | Trained Spotter | 21279.7083 | SHARON SPGS | 388.7404 |
TEXAS | 123400.5101 | Public | 13650.9079 | AMARILLO | 246.2598 |
TEXAS | 123400.5101 | Law Enforcement | 37228.5966 | PERRYTON | 289.3178 |
TEXAS | 123400.5101 | Trained Spotter | 13997.7124 | CLAUDE | 421.44 |
KANSAS | 87771.2355000001 | Law Enforcement | 18744.823 | All Other End Locations | 18479.965 |
KANSAS | 87771.2355000001 | Public | 22855.6206 | All Other End Locations | 22367.3749 |
KANSAS | 87771.2355000001 | Trained Spotter | 21279.7083 | All Other End Locations | 20890.9679 |
TEXAS | 123400.5101 | Public | 13650.9079 | All Other End Locations | 13404.6481 |
TEXAS | 123400.5101 | Law Enforcement | 37228.5966 | All Other End Locations | 36939.2788 |
TEXAS | 123400.5101 | Trained Spotter | 13997.7124 | All Other End Locations | 13576.2724 |
KANSAS | 87771.2355000001 | All Other End Locations | 24891.0836 | ||
TEXAS | 123400.5101 | All Other End Locations | 58523.2932000001 | ||
All Other States | 1149279.5923 | All Other End Locations | 1149279.5923 |
The following query shows the same results for the first level used in the previous example.
StormEvents
| where State !in ('TEXAS', 'KANSAS')
| summarize sum(BeginLat)
Output
sum_BeginLat |
---|
1149279.5923 |
Sort hierarchical results
To achieve a comprehensive sort order, the following query uses index-based sorting for each value within the current hierarchy level, per group. This sorting is geared towards arranging the result according to the ultimate nested level, in this case the EndLocation
.
StormEvents
| top-nested 2 of State by sum(BeginLat),
top-nested 2 of Source by sum(BeginLat),
top-nested 4 of EndLocation by sum(BeginLat)
| sort by State, Source, aggregated_EndLocation
| summarize
EndLocations = make_list(EndLocation, 10000),
endLocationSums = make_list(aggregated_EndLocation, 10000)
by State, Source
| extend indicies = range(0, array_length(EndLocations) - 1, 1)
| mv-expand EndLocations, endLocationSums, indicies
Output
State | Source | EndLocations | endLocationSums | indices |
---|---|---|---|---|
TEXAS | Trained Spotter | CLAUDE | 421.44 | 0 |
TEXAS | Trained Spotter | AMARILLO | 316.8892 | 1 |
TEXAS | Trained Spotter | DALHART | 252.6186 | 2 |
TEXAS | Trained Spotter | PERRYTON | 216.7826 | 3 |
TEXAS | Law Enforcement | PERRYTON | 289.3178 | 0 |
TEXAS | Law Enforcement | LEAKEY | 267.9825 | 1 |
TEXAS | Law Enforcement | BRACKETTVILLE | 264.3483 | 2 |
TEXAS | Law Enforcement | GILMER | 261.9068 | 3 |
KANSAS | Trained Spotter | SHARON SPGS | 388.7404 | 0 |
KANSAS | Trained Spotter | ATWOOD | 358.6136 | 1 |
KANSAS | Trained Spotter | LENORA | 317.0718 | 2 |
KANSAS | Trained Spotter | SCOTT CITY | 307.84 | 3 |
KANSAS | Public | BUCKLIN | 488.2457 | 0 |
KANSAS | Public | ASHLAND | 446.4218 | 1 |
KANSAS | Public | PROTECTION | 446.11 | 2 |
KANSAS | Public | MEADE STATE PARK | 371.1 | 3 |
Get the most recent events per state with additional data from other columns
The following query demonstrates how to retrieve the two most recent events for each US state along with relevant event details. Notice the use of max(1)
within certain columns, identified by Ignore*
, which aids in propagating data through the query without imposing any selection logic.
StormEvents
| top-nested of State by Ignore0=max(1), // Partition the data by each unique value of state.
top-nested 2 of StartTime by Ignore1=max(StartTime), // Get the 2 most recent events in each state.
top-nested of EndTime by Ignore2=max(1), // Append the EndTime for each event.
top-nested of EpisodeId by Ignore3=max(1) // Append the EpisodeId for each event.
| project-away Ignore* // Remove the unnecessary aggregation columns.
| order by State asc, StartTime desc // Sort results alphabetically and chronologically.
Get the latest records per identity with additional data from other columns
The following query showcases how to extract the latest records per identity and builds on the concepts introduced in the previous example. The first top-nested
clause partitions the data by distinct values of id
. The subsequent clause identifies the two most recent records based on the timestamp
for each id
. Other information is appended using a top-nested
operator alongside an unspecified count and the arbitrary max(1)
aggregation. Finally, unnecessary aggregation columns are removed using the project-away
operator.
datatable(id: string, timestamp: datetime, otherInformation: string) // Create a source datatable.
[
"Barak", datetime(2015-01-01), "1",
"Barak", datetime(2016-01-01), "2",
"Barak", datetime(2017-01-20), "3",
"Donald", datetime(2017-01-20), "4",
"Donald", datetime(2017-01-18), "5",
"Donald", datetime(2017-01-19), "6"
]
| top-nested of id by Ignore0=max(1), // Partition the data by each unique value of id.
top-nested 2 of timestamp by Ignore1=max(timestamp), // Get the 2 most recent events for each state.
top-nested of otherInformation by Ignore2=max(1) // Append otherInformation for each event.
| project-away Ignore0, Ignore1, Ignore2 // Remove the unnecessary aggregation columns.
Output
id | timestamp | otherInformation |
---|---|---|
Barak | 2016-01-01T00:00:00Z | 2 |
Donald | 2017-01-19T00:00:00Z | 6 |
Barak | 2017-01-20T00:00:00Z | 3 |
Donald | 2017-01-20T00:00:00Z | 4 |
Related content
Feedback
https://aka.ms/ContentUserFeedback.
Coming soon: Throughout 2024 we will be phasing out GitHub Issues as the feedback mechanism for content and replacing it with a new feedback system. For more information see:Submit and view feedback for