The PERCENTILE_DISC analytic function computes a specific percentile for sorted values in the specified window based on a discrete distribution of the column values. The result is equal to a specific value in the column.
For a given percentile value P, PERCENTILE_DISC returns the value with the smallest CUME_DIST value for the same sorted list of values that is greater than or equal to P.
For example, PERCENTILE_DISC(0.5) will compute the 50th percentile (that is, the median) of an expression.
PERCENTILE_DISC can only be used in the context of a windowing expression.
PERCENTILE_DISC_Expression := 'PERCENTILE_DISC' '(' double_literal ')' 'WITHIN' 'GROUP' '(' Simple_Order_By_Clause ')'.
The percentile to compute. The value must range between 0.0 and 1.0 and be of type double.
Simple_Order_By_Clause := 'ORDER' 'BY' Sort_Item_Expression.
Sort_Item_Expression := expression [Sort_Direction].
Sort_Direction := 'ASC' | 'DESC'.
Only one ORDER BY expression is allowed. The type of the expression has to be comparable. The default sort order is ascending.
The return type is the type of the ORDER BY expression.
Usage in Windowing Expression
This analytic function can be used in a windowing expression with the following restrictions:
- Only the PARTITION BY clause can be specified with the OVER operator.
- The ORDER BY clause in the OVER operator is not allowed.
- The ROWS clause in the OVER operator is not allowed.
The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
The example below is based on the dataset defined below. Ensure your execution includes the rowset variable.
@employees = SELECT * FROM ( VALUES (1, "Noah", "Engineering", 100, 10000), (2, "Sophia", "Engineering", 100, 20000), (3, "Liam", "Engineering", 100, 30000), (4, "Amy", "Engineering", 100, 35000), (5, "Emma", "HR", 200, 8000), (6, "Jacob", "HR", 200, 8000), (7, "Olivia", "HR", 200, 8000), (8, "Mason", "Executive", 300, 50000), (9, "Ava", "Marketing", 400, 15000), (10, "Ethan", "Marketing", 400, 9000) ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);
The following example uses
PERCENTILE_DISC to find the median employee salary in each department.
@result = SELECT DISTINCT DeptName, PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Salary) OVER(PARTITION BY DeptName) AS MedianDisc FROM @employees; OUTPUT @result TO "/Output/ReferenceGuide/percentile_disc/example.csv" USING Outputters.Csv();