RANK (U-SQL)

Article
03/10/2017

Summary

The RANK ranking function returns the rank of each row within the window. The rank of a row is one plus the number of ranks that come before the row in question.

If two or more rows tie for a rank, each tied rows receives the same rank and the next highest row will receive the rank as if the previous rows were not tied. For example, if the three top salespeople have the same SalesYTD value, they are all ranked one. The salesperson with the next highest SalesYTD is ranked number four, because there are three rows that are ranked higher. Therefore, the RANK function does not always return consecutive integers.

The sort order that is used for the whole query determines the order in which the rows appear in a result set.

RANK can only be used in the context of a windowing expression.

Syntax

RANK_Expression :=                                                                                       
     'RANK' '(' ')'.

Return Type

The return type is long?.

Usage in Windowing Expression

This ranking function can be used in a windowing expression with the following restrictions:

The ORDER BY clause in the OVER operator is required.
The ROWS clause in the OVER operator is not allowed.

Examples

The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.

The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.

  @employees = 
      SELECT * FROM 
          ( VALUES
          (1, "Noah",   "Engineering", 100, 10000),
          (2, "Sophia", "Engineering", 100, 15000),
          (3, "Liam",   "Engineering", 100, 30000),
          (4, "Amy",    "Engineering", 100, 35000),
          (5, "Justin", "Engineering", 100, 15000),
          (6, "Emma",   "HR",          200, 8000),
          (7, "Jacob",  "HR",          200, 8000),
          (8, "Olivia", "HR",          200, 8000),
          (9, "Mason",  "Executive",   300, 50000),
          (10, "Ava",   "Marketing",   400, 15000),
          (11, "Ethan", "Marketing",   400, 9000) 
          ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);

A. Ranking all rows in a result set
The following example returns all employees ranked by his/her salary. Because a PARTITION BY clause was not specified, the RANK function was applied to all rows in the result set.

@result =
    SELECT *,
           RANK() OVER(ORDER BY Salary DESC) AS SalaryRank
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/rank/exampleA.csv"
USING Outputters.Csv();

B. Ranking rows within a partition
The following example ranks the salaries within each department. The result set is partitioned by DeptID and logically ordered by Salary.

@result =
    SELECT *,
           RANK() OVER(PARTITION BY DeptID ORDER BY Salary DESC) AS SalaryRankByDept
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/rank/exampleB.csv"
USING Outputters.Csv();