bitmap_or_agg aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.3 LTS and above

Returns the bitwise OR of all BINARY input values in the group.

The function is commonly used to count the number of distinct integral numbers in combination with the bitmap_bucket_number(), bitmap_count(), bitmap_bit_position(), and bitmap_construct_agg() functions.

To aggregate bitmaps in form of integral numerics, use the bit_or() aggregate function. To aggregate bit positions into a BINARY bitmap, use bitmap_construct_agg().

Syntax

bitmap_or_agg(expr) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr: A BINARY of length <= 4096 representing a bitmap. Azure Databricks truncates longer values to 4096.
  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

A BINARY of length 4096.

Examples

> SELECT hex(trim(TRAILING X'00' FROM bitmap_or_agg(val))) FROM VALUES(x'10'), (x'02') AS T(val);
 12

-- Count the number of distinct values across two tables
> SELECT sum(num_distinct) AS num_distinct
    FROM (SELECT bucket, bitmap_count(bitmap_or_agg(num_distinct)) AS num_distinct
            FROM ((SELECT bitmap_bucket_number(val) AS bucket,
                          bitmap_construct_agg(bitmap_bit_position(val)) AS num_distinct
                     FROM VALUES(1), (2), (1), (-1), (5), (0), (5) AS t(val)
                     GROUP BY ALL)
                  UNION ALL
                  (SELECT bitmap_bucket_number(val) AS bucket,
                          bitmap_construct_agg(bitmap_bit_position(val)) AS num_distinct
                     FROM VALUES(3), (1), (-1), (6), (5), (1), (5), (8) AS t(val)
                     GROUP BY ALL))
            GROUP BY ALL);
  8