Uniquely grouping with Row_Number and Dense_Rank issue

Scott Kent-Collins 21 Reputation points
2022-04-01T05:18:58.783+00:00

The following query:

create table #GroupingTemp
(
    MyID int, 
    MyStatus Varchar(50), 
    StartTime DateTime, 
    EndDate DatetIme 
)

INSERT INTO #GroupingTemp (MyID, MyStatus, StartTime, EndDate)

VALUES
(12345, 'Started', '2021-01-22 11:42:09.263', '2021-01-25 09:04:05.010'),
(12345, 'Started', '2021-01-25 09:04:05.010', '2021-01-25 09:04:44.747'),
(12345, 'Started', '2021-01-25 09:04:44.747', '2021-02-09 11:49:22.967'),
(12345, 'Complete Details', '2021-02-09 11:49:22.967', '2021-02-10 10:02:38.360'),
(12345, 'Complete Details', '2021-02-10 10:02:38.360', '2021-02-10 10:27:45.443'),
(12345, 'Started', '2021-02-10 10:27:45.443', '2021-02-10 10:28:10.270'),
(12345, 'Started', '2021-02-10 10:28:10.270', '2021-02-12 11:29:52.197'),
(12345, 'Started', '2021-02-12 11:29:52.197', '2021-02-12 14:03:48.243'),
(12345, 'Complete Details', '2021-02-12 14:03:48.243', '2021-02-12 14:10:55.933'),
(12345, 'Complete Details', '2021-02-12 14:10:55.933', '2021-02-15 09:16:29.357'),
(12345, 'Started', '2021-02-15 09:16:29.357', '2021-02-15 09:18:08.340'),
(12345, 'Started', '2021-02-15 09:18:08.340', '2021-02-15 14:11:00.007'),
(12345, 'Started', '2021-02-15 14:11:00.007', '2021-02-22 08:47:36.570'),
(12345, 'Complete Details', '2021-02-22 08:47:36.570', '2021-02-22 08:56:14.997'),
(12345, 'Complete Details', '2021-02-22 08:56:14.997', '2021-02-22 08:56:32.897'),
(12345, 'Complete Details', '2021-02-22 08:56:32.897', '2021-02-22 09:08:51.117'),
(12345, 'Complete Details', '2021-02-22 09:08:51.117', '2021-02-22 09:41:50.297'),
(12345, 'Started', '2021-02-22 09:41:50.297', '2021-02-22 09:42:34.573'),
(12345, 'Started', '2021-02-22 09:42:34.573', '2021-02-25 10:36:28.530'),
(12345, 'Started', '2021-02-25 10:36:28.530', '2021-02-26 13:55:22.577'),
(12345, 'Complete Details', '2021-02-26 13:55:22.577', '2021-02-26 13:56:09.527'),
(12345, 'Complete Details', '2021-02-26 13:56:09.527', '2021-03-01 09:18:33.723'),
(12345, 'Started', '2021-03-01 09:18:33.723', '2021-03-01 09:19:07.363'),
(12345, 'Started', '2021-03-01 09:19:07.363', '2021-03-04 10:44:51.393'),
(12345, 'Started', '2021-03-04 10:44:51.393', '2021-03-09 15:23:01.263'),
(12345, 'Complete Details', '2021-03-09 15:23:01.263', '2021-03-09 15:29:18.090'),
(12345, 'Complete Details', '2021-03-09 15:29:18.090', '2021-03-10 08:19:55.360'),
(12345, 'On Hold', '2021-03-10 08:19:55.360', '2021-03-12 09:26:39.860'),
(12345, 'On Hold', '2021-03-12 09:26:39.860', '2021-06-25 10:44:25.910'),
(12345, 'On Hold', '2021-06-25 10:44:25.910', '2021-06-25 12:07:03.347'),
(12345, 'Started', '2021-06-25 12:07:03.347', '2021-06-25 12:46:41.557'),
(12345, 'Started', '2021-06-25 12:46:41.557', '2021-07-01 10:35:12.447'),
(12345, 'Started', '2021-07-01 10:35:12.447', '2021-07-06 12:45:03.613'),
(12345, 'Started', '2021-07-06 12:45:03.613', '2021-07-07 08:48:12.387'),
(12345, 'Started', '2021-07-07 08:48:12.387', '2021-07-07 09:20:01.850'),
(12345, 'Started', '2021-07-07 09:20:01.850', '2021-07-07 09:33:14.467'),
(12345, 'Complete Details', '2021-07-07 09:33:14.467', '2021-07-07 09:33:52.493'),
(12345, 'Complete Details', '2021-07-07 09:33:52.493', '2021-07-07 09:42:03.823'),
(12345, 'Proceed', '2021-07-07 09:42:03.823', '2021-07-07 11:12:26.103'),
(12345, 'Proceed', '2021-07-07 11:12:26.103', '2021-07-07 11:49:53.833'),
(12345, 'Proceed', '2021-07-07 11:49:53.833', '2021-07-07 11:56:50.940'),
(12345, 'Delivery', '2021-07-07 11:56:50.940', '2021-07-08 08:28:19.423'),
(12345, 'Delivery', '2021-07-08 08:28:19.423', '2021-07-08 08:36:09.220');


SELECT 
MyID,
MyStatus,
StartTime,
ROW_NUMBER() OVER (ORDER BY MyID, StartTime) as RowNum,
DENSE_RANK() OVER (PARTITION BY MyID, MyStatus ORDER BY MyID, StartTime) as DenseRank,
ROW_NUMBER() OVER (ORDER BY MyID, StartTime) - DENSE_RANK() OVER (PARTITION BY MyID, MyStatus ORDER BY MyID, StartTime) as GroupOrder

FROM 
#GroupingTemp

ORDER BY StartTime

Which generates the following dataset:

MyID    MyStatus    StartTime   RowNum  DenseRank   GroupOrder
12345   Started 22/01/2021 11:42:09 1   1   0
12345   Started 25/01/2021 09:04:05 2   2   0
12345   Started 25/01/2021 09:04:45 3   3   0
12345   Complete Details    09/02/2021 11:49:23 4   1   3
12345   Complete Details    10/02/2021 10:02:38 5   2   3
12345   Started 10/02/2021 10:27:45 6   4   2
12345   Started 10/02/2021 10:28:10 7   5   2
12345   Started 12/02/2021 11:29:52 8   6   2
12345   Complete Details    12/02/2021 14:03:48 9   3   6
12345   Complete Details    12/02/2021 14:10:56 10  4   6
12345   Started 15/02/2021 09:16:29 11  7   4
12345   Started 15/02/2021 09:18:08 12  8   4
12345   Started 15/02/2021 14:11:00 13  9   4
12345   Complete Details    22/02/2021 08:47:37 14  5   9
12345   Complete Details    22/02/2021 08:56:15 15  6   9
12345   Complete Details    22/02/2021 08:56:33 16  7   9
12345   Complete Details    22/02/2021 09:08:51 17  8   9
12345   Started 22/02/2021 09:41:50 18  10  8
12345   Started 22/02/2021 09:42:35 19  11  8
12345   Started 25/02/2021 10:36:29 20  12  8
12345   Complete Details    26/02/2021 13:55:23 21  9   12
12345   Complete Details    26/02/2021 13:56:10 22  10  12
12345   Started 01/03/2021 09:18:34 23  13  10
12345   Started 01/03/2021 09:19:07 24  14  10
12345   Started 04/03/2021 10:44:51 25  15  10
12345   Complete Details    09/03/2021 15:23:01 26  11  15
12345   Complete Details    09/03/2021 15:29:18 27  12  15
12345   On Hold 10/03/2021 08:19:55 28  1   27
12345   On Hold 12/03/2021 09:26:40 29  2   27
12345   On Hold 25/06/2021 10:44:26 30  3   27
12345   Started 25/06/2021 12:07:03 31  16  15
12345   Started 25/06/2021 12:46:42 32  17  15
12345   Started 01/07/2021 10:35:12 33  18  15
12345   Started 06/07/2021 12:45:04 34  19  15
12345   Started 07/07/2021 08:48:12 35  20  15
12345   Started 07/07/2021 09:20:02 36  21  15
12345   Complete Details    07/07/2021 09:33:14 37  13  24
12345   Complete Details    07/07/2021 09:33:52 38  14  24
12345   Proceed 07/07/2021 09:42:04 39  1   38
12345   Proceed 07/07/2021 11:12:26 40  2   38
12345   Proceed 07/07/2021 11:49:54 41  3   38
12345   Delivery    07/07/2021 11:56:51 42  1   41
12345   Delivery    08/07/2021 08:28:19 43  2   41

What I'm trying to achieve is treat a series of the "MyStatus" entries as a single group, so I can track "MyID" in each "MyStatus" before it moves to a different "MyStatus".

Some months ago I saw something similar online and they recommended using a combination of Row_Number and Dense_Rank to create a "GroupOrder" which is suppose to be unique for each series of "MyStatus" records. This has mostly worked when I tested various sample sets... however, the set above has an issue:

- Rows 27 and 28 are assigned a "GroupOder" number of 15... but Rrows 32-37 are also being assigned a "GroupOrder" number of 15

What am I doing wrong? Is there are way to ensure the "GroupOrder" value is unique for each series of "MyStatus" records?

Thanks in advance

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,536 questions
0 comments No comments
{count} votes

Accepted answer
  1. Viorel 110.7K Reputation points
    2022-04-01T10:02:41.387+00:00

    Check if this query gives unique values:

    select 
        MyID,
        MyStatus,
        StartTime,
        row_number() over (partition by MyID order by StartTime) as RowNum,
        dense_rank() over (partition by MyID order by MyStatus, StartTime) as DenseRank,
        row_number() over (partition by MyID order by StartTime) - dense_rank() over (partition by MyID order by MyStatus, StartTime) as GroupOrder
    from #GroupingTemp
    order by MyID, StartTime
    
    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. LiHong-MSFT 10,041 Reputation points
    2022-04-04T04:16:55.877+00:00

    Hi @Scott Kent-Collins
    In this sitution ,you need to change DENSE_RANK() OVER (PARTITION BY MyID, MyStatus ORDER BY MyID, StartTime) to DENSE_RANK() OVER (PARTITION BY MyID ORDER BY MyStatus , StartTime)
    Also,you can refer to below method which I usually used to set unique value for each series of "MyStatus" records.

    SELECT MyID,MyStatus,StartTime,SUM(TO_SUM)OVER(ORDER BY MyID, StartTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)AS GroupOrder  
    FROM(SELECT MyID,MyStatus,StartTime,CASE WHEN LAG(MyStatus,1,1)OVER(PARTITION BY MyID ORDER BY MyID, StartTime)<>MyStatus THEN 1 ELSE 0 END AS TO_SUM  
         FROM #GroupingTemp)T  
    ORDER BY StartTime  
    

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments