corr aggregate function

Applies to: Databricks SQL, Databricks Runtime

Returns the Pearson correlation coefficient for a group of number pairs.
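For reference, the Pearson coefficient is conventionally defined as below. This is the standard textbook formula, not a description of how the engine evaluates it internally. The notation is an assumption introduced here: x_i and y_i stand for the values of expr1 and expr2 in the i-th pair, and the bars denote their means over the group.

```latex
% Notation (assumed for illustration): x_i = expr1, y_i = expr2 for pair i of n
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value lies between -1 and 1, and it is symmetric in its two arguments, so swapping expr1 and expr2 does not change the result.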

Syntax

corr ( [ALL | DISTINCT] expr1, expr2 ) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr1: An expression that evaluates to a numeric.
  • expr2: An expression that evaluates to a numeric.
  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

A DOUBLE.

If DISTINCT is specified, the function operates only on the unique set of (expr1, expr2) pairs.

Examples

> SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
 0.816496580927726

> SELECT corr(DISTINCT c1, c2) FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
 0.8660254037844387
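The two results above differ because DISTINCT collapses the duplicate pair (3, 3) before the coefficient is computed. The hand calculation below is a sketch using the textbook Pearson definition (sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations). The symbols S_xy, S_xx and S_yy are shorthand introduced here.

```latex
\begin{align*}
% All four pairs: (3,2), (3,3), (3,3), (6,4)
\bar{x} &= 3.75, \quad \bar{y} = 3, \quad
S_{xy} = 3, \quad S_{xx} = 6.75, \quad S_{yy} = 2 \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} = \frac{3}{\sqrt{13.5}} \approx 0.8165 \\[4pt]
% DISTINCT pairs: (3,2), (3,3), (6,4)
\bar{x} &= 4, \quad \bar{y} = 3, \quad
S_{xy} = 3, \quad S_{xx} = 6, \quad S_{yy} = 2 \\
r &= \frac{3}{\sqrt{12}} \approx 0.8660
\end{align*}
```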

> SELECT corr(DISTINCT c1, c2) FILTER(WHERE c1 != c2)
    FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
 1.0
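The window-function form mentioned under Syntax can be sketched as follows. The inline data and the names grp, c1, c2 and r are hypothetical and chosen only for illustration. No output is shown, because the exact floating-point digits are not asserted here.

```sql
-- Hypothetical data: two groups, one rising together and one moving in opposite directions.
SELECT grp, c1, c2,
       -- One coefficient per partition, repeated on every row of that partition.
       corr(c1, c2) OVER (PARTITION BY grp) AS r
  FROM VALUES ('a', 1, 2), ('a', 2, 4), ('a', 3, 6),
              ('b', 1, 3), ('b', 2, 2), ('b', 3, 1) AS tab(grp, c1, c2);
```

Under these assumptions, r should come out at about 1 for the rows of group 'a' and about -1 for the rows of group 'b', since each group lies on a straight line.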