hll_union_agg function

Applies to: Databricks SQL, Databricks Runtime 13.3 LTS and above

This aggregate function uses the HyperLogLog algorithm to merge a group of sketches into a single sketch.

Queries can use the resulting buffers to compute approximate unique counts with the hll_sketch_estimate function.
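As an illustration of that workflow, the following sketch builds one sketch per day with hll_sketch_agg, merges the per-day sketches with hll_union_agg, and then estimates the overall number of distinct users. The events data, the day and user_id column names, and the daily_sketches name are hypothetical and exist only for this example.

```sql
-- Hypothetical inline data: one row per (day, user_id) event.
WITH daily_sketches AS (
  -- One HyperLogLog sketch per day, built from the raw user IDs.
  SELECT day, hll_sketch_agg(user_id) AS sketch
    FROM VALUES ('2024-01-01', 'a'),
                ('2024-01-01', 'b'),
                ('2024-01-02', 'b'),
                ('2024-01-02', 'c') AS events(day, user_id)
   GROUP BY day
)
-- Merge the per-day sketches, then estimate the distinct count across all days.
SELECT hll_sketch_estimate(hll_union_agg(sketch)) AS approx_unique_users
  FROM daily_sketches;
```

Because the merge operates on the sketches rather than on the raw rows, the per-day sketches could equally be stored in a table and merged later without rescanning the original events.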

The implementation uses the Apache DataSketches library. See the DataSketches HLL documentation for more information.

Syntax

hll_union_agg ( expr [, allowDifferentLgConfigK ] )

This function can also be invoked as a window function using the OVER clause.
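A minimal sketch of the window form, assuming per-day sketches are already available: the query below computes a running approximate count of distinct users by merging every sketch up to and including the current day. The visits data, the column names, and the explicit ROWS frame are choices made for this example only.

```sql
-- Hypothetical inline data: one row per (day, user_id) visit.
WITH daily AS (
  -- One sketch per day.
  SELECT day, hll_sketch_agg(user_id) AS sketch
    FROM VALUES (1, 10), (1, 11), (2, 11), (2, 12), (3, 13) AS visits(day, user_id)
   GROUP BY day
)
SELECT day,
       -- Merge all sketches from the first day through the current day,
       -- then estimate the distinct count of the merged sketch.
       hll_sketch_estimate(
         hll_union_agg(sketch) OVER (ORDER BY day
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       ) AS running_unique_users
  FROM daily
 ORDER BY day;
```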

Arguments

  • expr: A BINARY expression holding a sketch generated by hll_sketch_agg.
  • allowDifferentLgConfigK: An optional BOOLEAN constant expression controlling whether to allow merging sketches with different lgConfigK values. The default value is false.

Returns

A BINARY buffer containing the HyperLogLog sketch computed as a result of combining the input expressions of the same group.

When the allowDifferentLgConfigK parameter is true and the input sketches have different lgConfigK values, the result sketch uses the smallest of those values.

Examples

> SELECT hll_sketch_estimate(hll_union_agg(sketch, true))
    FROM (SELECT hll_sketch_agg(col) as sketch
            FROM VALUES (1) AS tab(col)
          UNION ALL
          SELECT hll_sketch_agg(col, 20) as sketch
            FROM VALUES (1) AS tab(col));
  1

> SELECT hll_sketch_estimate(hll_union_agg(sketch, false))
    FROM (SELECT hll_sketch_agg(col) as sketch
            FROM VALUES (1) AS tab(col)
          UNION ALL
          SELECT hll_sketch_agg(col, 20) as sketch
            FROM VALUES (1) AS tab(col));
  Error: HLL_UNION_DIFFERENT_LG_K