hll_sketch_estimate function

Applies to: Databricks SQL, Databricks Runtime 13.3 LTS and above

This function uses the HyperLogLog algorithm to return a probabilistic approximation of the number of unique values in a given column. It consumes a binary representation, known as a sketch buffer, previously generated by the hll_sketch_agg function, and returns the result as a BIGINT.

The hll_union and hll_union_agg functions combine sketches by consuming and merging these buffers as inputs.

The implementation uses the Apache DataSketches library. See HLL for more information.

Syntax

hll_sketch_estimate ( expr )

Arguments

  • expr: A BINARY expression holding a sketch generated by hll_sketch_agg.

Returns

A BIGINT value that is the approximate distinct count represented by the input sketch.

Examples

> SELECT hll_sketch_estimate(hll_sketch_agg(col, 12))
    FROM VALUES (1), (1), (2), (2), (3) tab(col);
  3

> SELECT hll_sketch_estimate(hll_sketch_agg(col))
    FROM VALUES (1), (1), (2), (2), (3) tab(col);
  3
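
The merged sketches produced by hll_union and hll_union_agg can be estimated in the same way. The following is a minimal sketch of that pattern, not an authoritative recipe: the column names col1 and col2, the alias sketch, and the inline VALUES data are made up for illustration, and the counts shown assume inputs this small, where the estimate is expected to match the exact distinct count.

```sql
-- Merge two sketches built from different columns, then estimate.
-- col1 holds {1, 2, 3} and col2 holds {4, 5, 6}, so the union has 6 distinct values.
> SELECT hll_sketch_estimate(
           hll_union(hll_sketch_agg(col1), hll_sketch_agg(col2)))
    FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) AS tab(col1, col2);
  6

-- Merge a column of sketches (one per input row), then estimate.
-- The two sketches cover {1, 2} and {2, 3}; the overlap is counted once.
> SELECT hll_sketch_estimate(hll_union_agg(sketch))
    FROM (SELECT hll_sketch_agg(col) AS sketch
            FROM VALUES (1), (2) AS tab(col)
          UNION ALL
          SELECT hll_sketch_agg(col) AS sketch
            FROM VALUES (2), (3) AS tab(col));
  3
```

Storing sketches and merging them later avoids rescanning the raw data when distinct counts are needed across partitions or time windows.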