# dcount() (aggregation function)

Returns an estimate for the number of distinct values that are taken by a scalar expression in the summary group.

Note

The `dcount()` aggregation function is primarily useful for estimating the cardinality of huge sets. It trades accuracy for performance, and may return a result that varies between executions. The order of inputs may have an effect on its output.

## Syntax

... `|` `summarize` `dcount(`*Expr* [`,` *Accuracy*]`)` ...

## Arguments

- *Expr*: A scalar expression whose distinct values are to be counted.
- *Accuracy*: An optional `int` literal that defines the requested estimation accuracy. See Estimation accuracy for supported values. If unspecified, the default value `1` is used.
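As a minimal sketch of the optional *Accuracy* argument, the query below computes the same estimate twice, once with the default level and once with the highest supported level. It reuses the `PageViewLog` table and its `country` and `continent` columns from the example on this page; the output column names are illustrative only.

```
PageViewLog
| summarize
    countries_default = dcount(country),     // Accuracy omitted: default level 1
    countries_precise = dcount(country, 4)   // Accuracy 4: smallest listed error, largest entry count
    by continent
```

On small groups the two columns will usually agree, since the differences between accuracy levels matter mainly for large cardinalities.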

## Returns

Returns an estimate of the number of distinct values of *Expr* in the group.

## Example

```
PageViewLog | summarize countries=dcount(country) by continent
```

Get an exact count of distinct values of `V` grouped by `G`.

```
T | summarize by V, G | summarize count() by G
```

This calculation requires a great amount of internal memory, since the number of distinct values of `V` is multiplied by the number of distinct values of `G`. It may result in memory errors or long execution times. `dcount()` provides a fast and reliable alternative:

```
T | summarize dcount(V) by G
```
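To compare the exact and estimated approaches side by side, the following sketch runs both on synthetic data. The `range`-generated table, the modulo values, and the column names `exact`, `estimate`, and `error_pct` are assumptions made for illustration, not part of the original examples.

```
// Synthetic input: 1,000,000 rows, 4 groups, 62,500 distinct values of V per group.
let T = range x from 1 to 1000000 step 1
    | project V = x % 250000, G = x % 4;
T
| summarize by V, G                      // exact: materialize every distinct (V, G) pair
| summarize exact = count() by G
| join kind=inner (
    T | summarize estimate = dcount(V) by G   // estimate: HLL-based, default accuracy
  ) on G
| project G, exact, estimate,
          error_pct = round(100.0 * (estimate - exact) / exact, 2)
```

The `error_pct` column is the relative error of the estimate in each group. Because the estimate may vary between executions, the values are not guaranteed to repeat.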

## Estimation accuracy

The `dcount()` aggregate function uses a variant of the HyperLogLog (HLL) algorithm, which does a stochastic estimation of set cardinality. The algorithm provides a "knob" that can be used to balance accuracy against execution time and memory size:

Accuracy | Error (%) | Entry count |
---|---|---|
0 | 1.6 | 2^12 |
1 | 0.8 | 2^14 |
2 | 0.4 | 2^16 |
3 | 0.28 | 2^17 |
4 | 0.2 | 2^18 |

Note

The "entry count" column is the number of 1-byte counters in the HLL implementation.
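The errors in the table are close to the textbook HyperLogLog standard error of about 1.04 / sqrt(m), where m is the entry count. For example, 1.04 / sqrt(2^12) = 1.04 / 64 ≈ 1.6%. Whether the `dcount()` variant follows this formula exactly is an assumption; the sketch below only recomputes the approximation so it can be compared with the table. The column names are illustrative.

```
datatable(Accuracy:int, EntryCountLog2:int)
[
    0, 12,
    1, 14,
    2, 16,
    3, 17,
    4, 18
]
| extend EntryCount = pow(2, EntryCountLog2)                   // number of 1-byte counters
| extend ApproxErrorPct = round(104.0 / sqrt(EntryCount), 2)   // 1.04 / sqrt(m), as a percentage
```

Each step up in accuracy buys a smaller error at the cost of more counters, which is the trade-off the "knob" controls.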

The algorithm includes some provisions for doing a perfect count (zero error), if the set cardinality is small enough:

- When the accuracy level is `1`, the count is exact for sets of up to 1000 distinct values
- When the accuracy level is `2`, the count is exact for sets of up to 8000 distinct values

The error bound is probabilistic, not a theoretical bound. The value is the standard deviation of error distribution (the sigma), and 99.7% of the estimations will have a relative error of under 3 x sigma.
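As a worked illustration of the 3 x sigma rule: at accuracy level `1`, sigma is 0.8%, so about 99.7% of estimates fall within 2.4% of the true value. For a true cardinality of 1,000,000, that is the range 976,000 to 1,024,000. The sketch below computes the same range for every accuracy level; `TrueCount` is a hypothetical cardinality chosen for the example, and the sigma values are copied from the table above.

```
datatable(Accuracy:int, SigmaPct:real)
[
    0, 1.6,
    1, 0.8,
    2, 0.4,
    3, 0.28,
    4, 0.2
]
| extend ThreeSigmaPct = 3 * SigmaPct          // relative error that covers ~99.7% of estimates
| extend TrueCount = 1000000                   // hypothetical true cardinality
| extend LowEstimate  = TrueCount * (1 - ThreeSigmaPct / 100),
         HighEstimate = TrueCount * (1 + ThreeSigmaPct / 100)
```

These ranges describe typical behavior only; because the bound is probabilistic, an individual estimate can still fall outside them.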

The following image shows the probability distribution function of the relative estimation error, in percentages, for all supported accuracy settings: