summarize operator
Produces a table that aggregates the content of the input table.
T | summarize count(), avg(price) by fruit, supplier
A table that shows the number and average price of each fruit from each supplier. There's a row in the output for each distinct combination of fruit and supplier. The output columns show the count, average price, fruit and supplier. All other input columns are ignored.
T | summarize count() by price_range=bin(price, 10.0)
A table that shows how many items have prices in each interval [0,10.0], [10.0,20.0], and so on. This example has a column for the count and one for the price range. All other input columns are ignored.
Syntax
T | summarize
[[Column =
] Aggregation [,
...]]
[by
[Column =
] GroupExpression [,
...]]
Arguments
Column: Optional name for a result column. Defaults to a name derived from the expression.
Aggregation: A call to an aggregation function such as
count()
oravg()
, with column names as arguments. See the list of aggregation functions.GroupExpression: An expression over the columns, that provides a set of distinct values. Typically it's either a column name that already provides a restricted set of values, or
bin()
with a numeric or time column as argument.If you don't provide a GroupExpression, the whole table is summarized in a single output row.
Returns
The input rows are arranged into groups having the same values of the by
expressions. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by
columns and also at least one column for each computed aggregate. (Some aggregation functions return multiple columns.)
The result has as many rows as there are distinct combinations of by
values. If you want to summarize over ranges of numeric values, use bin()
to reduce ranges to discrete values.
Notes
Although you can provide arbitrary expressions for both the aggregation and grouping expressions, it's more efficient to use simple column names, or apply bin()
to a numeric column.
The automatic hourly bins for datetime columns is no longer supported. Use explicit binning instead - for example summarize by bin(timestamp, 1h)
.
List of aggregation functions
Function | Description |
---|---|
any() | Returns random non-empty value for the group |
arg_max() | Returns one or more expressions when argument is maximized |
arg_min() | Returns one or more expressions when argument is minimized |
avg() | Returns average value across the group |
buildschema() | Returns the minimal schema that admits all values of the dynamic input |
count() | Returns count of the group |
countif() | Returns count with the predicate of the group |
dcount() | Returns approximate distinct count of the group elements |
make_bag() | Returns a property bag of dynamic values within the group |
make_list() | Returns a list of all the values within the group |
make_set() | Returns a set of distinct values within the group |
max() | Returns the maximum value across the group |
min() | Returns the minimum value across the group |
percentiles() | Returns the percentile approximate of the group |
stdev() | Returns the standard deviation across the group |
sum() | Returns the sum of the elements withing the group |
variance() | Returns the variance across the group |
Aggregates default values
The following table summarizes the default values of aggregations
Operator | Default value |
---|---|
count() , countif() , dcount() , dcountif() |
0 |
make_set() , make_list() |
empty dynamic array ([]) |
any() , arg_max() . arg_min() , avg() , buildschema() , hll() , max() , min() , percentiles() , stdev() , sum() , sumif() , tdigest() , variance() |
null |
In addition, when using these aggregates over entities which includes null values, the null values will be ignored and won't participate in the calculation (See examples below).
Examples
Example
Determine what unique combinations of
ActivityType
and CompletionStatus
there are in a table. Note that
there are no aggregation functions, just group-by keys. The output will just show the columns for those results:
Activities | summarize by ActivityType, completionStatus
ActivityType |
completionStatus |
---|---|
dancing |
started |
singing |
started |
dancing |
abandoned |
singing |
completed |
Example
Finds the minimum and maximum timestamp of all records in the Activities table. There is no group-by clause, so there is just one row in the output:
Activities | summarize Min = min(Timestamp), Max = max(Timestamp)
Min |
Max |
---|---|
1975-06-09 09:21:45 |
2015-12-24 23:45:00 |
Example
Create a row for each continent, showing a count of the cities in which activities occur. Because there are few values for "continent", no grouping function is needed in the 'by' clause:
Activities | summarize cities=dcount(city) by continent
cities |
continent |
---|---|
4290 |
Asia |
3267 |
Europe |
2673 |
North America |
Example
The following example calculates a histogram for each activity
type. Because Duration
has many values, we use bin
to group its values into 10-minute intervals:
Activities | summarize count() by ActivityType, length=bin(Duration, 10m)
count_ |
ActivityType |
length |
---|---|---|
354 |
dancing |
0:00:00.000 |
23 |
singing |
0:00:00.000 |
2717 |
dancing |
0:10:00.000 |
341 |
singing |
0:10:00.000 |
725 |
dancing |
0:20:00.000 |
2876 |
singing |
0:20:00.000 |
... |
Examples for the aggregates default values
When the input of summarize operator that has at least one group-by key is empty then it's result is empty too.
When the input of summarize operator that doesn't have any group-by key is empty, then the result is the default values of the aggregates used in the summarize:
range x from 1 to 10 step 1
| where 1 == 2
| summarize any(x), argmax(x, x), argmin(x, x), avg(x), buildschema(todynamic(tostring(x))), max(x), min(x), percentile(x, 55), hll(x) ,stdev(x), sum(x), sumif(x, x > 0), tdigest(x), variance(x)
any_x | max_x | max_x_x | min_x | min_x_x | avg_x | schema_x | max_x1 | min_x1 | percentile_x_55 | hll_x | stdev_x | sum_x | sumif_x | tdigest_x | variance_x |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
range x from 1 to 10 step 1
| where 1 == 2
| summarize count(x), countif(x > 0) , dcount(x), dcountif(x, x > 0)
count_x | countif_ | dcount_x | dcountif_x |
---|---|---|---|
0 | 0 | 0 | 0 |
range x from 1 to 10 step 1
| where 1 == 2
| summarize make_set(x), make_list(x)
set_x | list_x |
---|---|
[] | [] |
The aggregate avg sums all the non-nulls and counts only those which participated in the calculation (will not take nulls into account).
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize sum(y), avg(y)
sum_y | avg_y |
---|---|
5 | 5 |
The regular count will count nulls:
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize count(y)
count_y |
---|
2 |
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize make_set(y), make_set(y)
set_y | set_y1 |
---|---|
[5.0] | [5.0] |