summarize operator

Produces a table that aggregates the content of the input table.

T | summarize count(), avg(price) by fruit, supplier

A table that shows the number and average price of each fruit from each supplier. There's a row in the output for each distinct combination of fruit and supplier. The output columns show the count, average price, fruit and supplier. All other input columns are ignored.

T | summarize count() by price_range=bin(price, 10.0)

A table that shows how many items have prices in each interval [0,10.0], [10.0,20.0], and so on. This example has a column for the count and one for the price range. All other input columns are ignored.

Syntax

T | summarize [[Column =] Aggregation [, ...]] [by [Column =] GroupExpression [, ...]]

Arguments

  • Column: Optional name for a result column. Defaults to a name derived from the expression.
  • Aggregation: A call to an aggregation function such as count() or avg(), with column names as arguments. See the list of aggregation functions.
  • GroupExpression: An expression over the columns, that provides a set of distinct values. Typically it's either a column name that already provides a restricted set of values, or bin() with a numeric or time column as argument.

Note

When the input table is empty, the output depends on whether GroupExpression is used:

  • If GroupExpression is not provided, the output will be a single (empty) row.
  • If GroupExpression is provided, the output will have no rows.

Returns

The input rows are arranged into groups having the same values of the by expressions. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by columns and also at least one column for each computed aggregate. (Some aggregation functions return multiple columns.)

The result has as many rows as there are distinct combinations of by values (which may be zero). If there are no group keys provided, the result has a single record.

To summarize over ranges of numeric values, use bin() to reduce ranges to discrete values.

Note

  • Although you can provide arbitrary expressions for both the aggregation and grouping expressions, it's more efficient to use simple column names, or apply bin() to a numeric column.
  • The automatic hourly bins for datetime columns is no longer supported. Use explicit binning instead. For example, summarize by bin(timestamp, 1h).

List of aggregation functions

Function Description
any() Returns a random non-empty value for the group
anyif() Returns a random non-empty value for the group (with predicate)
arg_max() Returns one or more expressions when the argument is maximized
arg_min() Returns one or more expressions when the argument is minimized
avg() Returns an average value across the group
avgif() Returns an average value across the group (with predicate)
binary_all_and Returns aggregated value using the binary AND of the group
binary_all_or Returns aggregated value using the binary OR of the group
binary_all_xor Returns aggregated value using the binary XOR of the group
buildschema() Returns the minimal schema that admits all values of the dynamic input
count() Returns a count of the group
countif() Returns a count with the predicate of the group
dcount() Returns an approximate distinct count of the group elements
dcountif() Returns an approximate distinct count of the group elements (with predicate)
make_bag() Returns a property bag of dynamic values within the group
make_bag_if() Returns a property bag of dynamic values within the group (with predicate)
make_list() Returns a list of all the values within the group
make_list_if() Returns a list of all the values within the group (with predicate)
make_list_with_nulls() Returns a list of all the values within the group, including null values
make_set() Returns a set of distinct values within the group
make_set_if() Returns a set of distinct values within the group (with predicate)
max() Returns the maximum value across the group
maxif() Returns the maximum value across the group (with predicate)
min() Returns the minimum value across the group
minif() Returns the minimum value across the group (with predicate)
percentiles() Returns the percentile approximate of the group
percentiles_array() Returns the percentiles approximates of the group
percentilesw() Returns the weighted percentile approximate of the group
percentilesw_array() Returns the weighted percentiles approximates of the group
stdev() Returns the standard deviation across the group
stdevif() Returns the standard deviation across the group (with predicate)
sum() Returns the sum of the elements withing the group
sumif() Returns the sum of the elements withing the group (with predicate)
variance() Returns the variance across the group
varianceif() Returns the variance across the group (with predicate)

Aggregates default values

The following table summarizes the default values of aggregations:

Operator Default value
count(), countif(), dcount(), dcountif() 0
make_bag(), make_bag_if(), make_list(), make_list_if(), make_set(), make_set_if() empty dynamic array ([])
All others null

When using these aggregates over entities which includes null values, the null values will be ignored and won't participate in the calculation (see examples below).

Examples

Summarize price by fruit and supplier

Example

Determine what unique combinations of ActivityType and CompletionStatus there are in a table. There are no aggregation functions, just group-by keys. The output will just show the columns for those results:

Activities | summarize by ActivityType, completionStatus
ActivityType completionStatus
dancing started
singing started
dancing abandoned
singing completed

Example

Finds the minimum and maximum timestamp of all records in the Activities table. There is no group-by clause, so there is just one row in the output:

Activities | summarize Min = min(Timestamp), Max = max(Timestamp)
Min Max
1975-06-09 09:21:45 2015-12-24 23:45:00

Example

Create a row for each continent, showing a count of the cities in which activities occur. Because there are few values for "continent", no grouping function is needed in the 'by' clause:

Activities | summarize cities=dcount(city) by continent
cities continent
4290 Asia
3267 Europe
2673 North America

Example

The following example calculates a histogram for each activity type. Because Duration has many values, use bin to group its values into 10-minute intervals:

Activities | summarize count() by ActivityType, length=bin(Duration, 10m)
count_ ActivityType length
354 dancing 0:00:00.000
23 singing 0:00:00.000
2717 dancing 0:10:00.000
341 singing 0:10:00.000
725 dancing 0:20:00.000
2876 singing 0:20:00.000
...

Example for the aggregates default values

When the input of summarize operator has at least one empty group-by key, it's result is empty, too.

When the input of summarize operator doesn't have an empty group-by key, the result is the default values of the aggregates used in the summarize:

datatable(x:long)[]
| summarize any(x), arg_max(x, x), arg_min(x, x), avg(x), buildschema(todynamic(tostring(x))), max(x), min(x), percentile(x, 55), hll(x) ,stdev(x), sum(x), sumif(x, x > 0), tdigest(x), variance(x)
any_x max_x max_x_x min_x min_x_x avg_x schema_x max_x1 min_x1 percentile_x_55 hll_x stdev_x sum_x sumif_x tdigest_x variance_x
datatable(x:long)[]
| summarize  count(x), countif(x > 0) , dcount(x), dcountif(x, x > 0)
count_x countif_ dcount_x dcountif_x
0 0 0 0
datatable(x:long)[]
| summarize  make_set(x), make_list(x)
set_x list_x
[] []

The aggregate avg sums all the non-nulls and counts only those which participated in the calculation (will not take nulls into account).

range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize sum(y), avg(y)
sum_y avg_y
5 5

The regular count will count nulls:

range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize count(y)
count_y
2
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize make_set(y), make_set(y)
set_y set_y1
[5.0] [5.0]