question

ScottKentCollins-2526 avatar image
0 Votes"
ScottKentCollins-2526 asked LiHongMSFT-3908 answered

Uniquely grouping with Row_Number and Dense_Rank issue

The following query:

 create table #GroupingTemp
 (
     MyID int, 
     MyStatus Varchar(50), 
     StartTime DateTime, 
     EndDate DatetIme 
 )
    
 INSERT INTO #GroupingTemp (MyID, MyStatus, StartTime, EndDate)
        
 VALUES
 (12345, 'Started', '2021-01-22 11:42:09.263', '2021-01-25 09:04:05.010'),
 (12345, 'Started', '2021-01-25 09:04:05.010', '2021-01-25 09:04:44.747'),
 (12345, 'Started', '2021-01-25 09:04:44.747', '2021-02-09 11:49:22.967'),
 (12345, 'Complete Details', '2021-02-09 11:49:22.967', '2021-02-10 10:02:38.360'),
 (12345, 'Complete Details', '2021-02-10 10:02:38.360', '2021-02-10 10:27:45.443'),
 (12345, 'Started', '2021-02-10 10:27:45.443', '2021-02-10 10:28:10.270'),
 (12345, 'Started', '2021-02-10 10:28:10.270', '2021-02-12 11:29:52.197'),
 (12345, 'Started', '2021-02-12 11:29:52.197', '2021-02-12 14:03:48.243'),
 (12345, 'Complete Details', '2021-02-12 14:03:48.243', '2021-02-12 14:10:55.933'),
 (12345, 'Complete Details', '2021-02-12 14:10:55.933', '2021-02-15 09:16:29.357'),
 (12345, 'Started', '2021-02-15 09:16:29.357', '2021-02-15 09:18:08.340'),
 (12345, 'Started', '2021-02-15 09:18:08.340', '2021-02-15 14:11:00.007'),
 (12345, 'Started', '2021-02-15 14:11:00.007', '2021-02-22 08:47:36.570'),
 (12345, 'Complete Details', '2021-02-22 08:47:36.570', '2021-02-22 08:56:14.997'),
 (12345, 'Complete Details', '2021-02-22 08:56:14.997', '2021-02-22 08:56:32.897'),
 (12345, 'Complete Details', '2021-02-22 08:56:32.897', '2021-02-22 09:08:51.117'),
 (12345, 'Complete Details', '2021-02-22 09:08:51.117', '2021-02-22 09:41:50.297'),
 (12345, 'Started', '2021-02-22 09:41:50.297', '2021-02-22 09:42:34.573'),
 (12345, 'Started', '2021-02-22 09:42:34.573', '2021-02-25 10:36:28.530'),
 (12345, 'Started', '2021-02-25 10:36:28.530', '2021-02-26 13:55:22.577'),
 (12345, 'Complete Details', '2021-02-26 13:55:22.577', '2021-02-26 13:56:09.527'),
 (12345, 'Complete Details', '2021-02-26 13:56:09.527', '2021-03-01 09:18:33.723'),
 (12345, 'Started', '2021-03-01 09:18:33.723', '2021-03-01 09:19:07.363'),
 (12345, 'Started', '2021-03-01 09:19:07.363', '2021-03-04 10:44:51.393'),
 (12345, 'Started', '2021-03-04 10:44:51.393', '2021-03-09 15:23:01.263'),
 (12345, 'Complete Details', '2021-03-09 15:23:01.263', '2021-03-09 15:29:18.090'),
 (12345, 'Complete Details', '2021-03-09 15:29:18.090', '2021-03-10 08:19:55.360'),
 (12345, 'On Hold', '2021-03-10 08:19:55.360', '2021-03-12 09:26:39.860'),
 (12345, 'On Hold', '2021-03-12 09:26:39.860', '2021-06-25 10:44:25.910'),
 (12345, 'On Hold', '2021-06-25 10:44:25.910', '2021-06-25 12:07:03.347'),
 (12345, 'Started', '2021-06-25 12:07:03.347', '2021-06-25 12:46:41.557'),
 (12345, 'Started', '2021-06-25 12:46:41.557', '2021-07-01 10:35:12.447'),
 (12345, 'Started', '2021-07-01 10:35:12.447', '2021-07-06 12:45:03.613'),
 (12345, 'Started', '2021-07-06 12:45:03.613', '2021-07-07 08:48:12.387'),
 (12345, 'Started', '2021-07-07 08:48:12.387', '2021-07-07 09:20:01.850'),
 (12345, 'Started', '2021-07-07 09:20:01.850', '2021-07-07 09:33:14.467'),
 (12345, 'Complete Details', '2021-07-07 09:33:14.467', '2021-07-07 09:33:52.493'),
 (12345, 'Complete Details', '2021-07-07 09:33:52.493', '2021-07-07 09:42:03.823'),
 (12345, 'Proceed', '2021-07-07 09:42:03.823', '2021-07-07 11:12:26.103'),
 (12345, 'Proceed', '2021-07-07 11:12:26.103', '2021-07-07 11:49:53.833'),
 (12345, 'Proceed', '2021-07-07 11:49:53.833', '2021-07-07 11:56:50.940'),
 (12345, 'Delivery', '2021-07-07 11:56:50.940', '2021-07-08 08:28:19.423'),
 (12345, 'Delivery', '2021-07-08 08:28:19.423', '2021-07-08 08:36:09.220');
    
    
 SELECT 
 MyID,
 MyStatus,
 StartTime,
 ROW_NUMBER() OVER (ORDER BY MyID, StartTime) as RowNum,
 DENSE_RANK() OVER (PARTITION BY MyID, MyStatus ORDER BY MyID, StartTime) as DenseRank,
 ROW_NUMBER() OVER (ORDER BY MyID, StartTime) - DENSE_RANK() OVER (PARTITION BY MyID, MyStatus ORDER BY MyID, StartTime) as GroupOrder
    
 FROM 
 #GroupingTemp
    
 ORDER BY StartTime

Which generates the following dataset:

 MyID    MyStatus    StartTime    RowNum    DenseRank    GroupOrder
 12345    Started    22/01/2021 11:42:09    1    1    0
 12345    Started    25/01/2021 09:04:05    2    2    0
 12345    Started    25/01/2021 09:04:45    3    3    0
 12345    Complete Details    09/02/2021 11:49:23    4    1    3
 12345    Complete Details    10/02/2021 10:02:38    5    2    3
 12345    Started    10/02/2021 10:27:45    6    4    2
 12345    Started    10/02/2021 10:28:10    7    5    2
 12345    Started    12/02/2021 11:29:52    8    6    2
 12345    Complete Details    12/02/2021 14:03:48    9    3    6
 12345    Complete Details    12/02/2021 14:10:56    10    4    6
 12345    Started    15/02/2021 09:16:29    11    7    4
 12345    Started    15/02/2021 09:18:08    12    8    4
 12345    Started    15/02/2021 14:11:00    13    9    4
 12345    Complete Details    22/02/2021 08:47:37    14    5    9
 12345    Complete Details    22/02/2021 08:56:15    15    6    9
 12345    Complete Details    22/02/2021 08:56:33    16    7    9
 12345    Complete Details    22/02/2021 09:08:51    17    8    9
 12345    Started    22/02/2021 09:41:50    18    10    8
 12345    Started    22/02/2021 09:42:35    19    11    8
 12345    Started    25/02/2021 10:36:29    20    12    8
 12345    Complete Details    26/02/2021 13:55:23    21    9    12
 12345    Complete Details    26/02/2021 13:56:10    22    10    12
 12345    Started    01/03/2021 09:18:34    23    13    10
 12345    Started    01/03/2021 09:19:07    24    14    10
 12345    Started    04/03/2021 10:44:51    25    15    10
 12345    Complete Details    09/03/2021 15:23:01    26    11    15
 12345    Complete Details    09/03/2021 15:29:18    27    12    15
 12345    On Hold    10/03/2021 08:19:55    28    1    27
 12345    On Hold    12/03/2021 09:26:40    29    2    27
 12345    On Hold    25/06/2021 10:44:26    30    3    27
 12345    Started    25/06/2021 12:07:03    31    16    15
 12345    Started    25/06/2021 12:46:42    32    17    15
 12345    Started    01/07/2021 10:35:12    33    18    15
 12345    Started    06/07/2021 12:45:04    34    19    15
 12345    Started    07/07/2021 08:48:12    35    20    15
 12345    Started    07/07/2021 09:20:02    36    21    15
 12345    Complete Details    07/07/2021 09:33:14    37    13    24
 12345    Complete Details    07/07/2021 09:33:52    38    14    24
 12345    Proceed    07/07/2021 09:42:04    39    1    38
 12345    Proceed    07/07/2021 11:12:26    40    2    38
 12345    Proceed    07/07/2021 11:49:54    41    3    38
 12345    Delivery    07/07/2021 11:56:51    42    1    41
 12345    Delivery    08/07/2021 08:28:19    43    2    41

What I'm trying to achieve is treat a series of the "MyStatus" entries as a single group, so I can track "MyID" in each "MyStatus" before it moves to a different "MyStatus".

Some months ago I saw something similar online and they recommended using a combination of Row_Number and Dense_Rank to create a "GroupOrder" which is suppose to be unique for each series of "MyStatus" records. This has mostly worked when I tested various sample sets... however, the set above has an issue:

- Rows 27 and 28 are assigned a "GroupOder" number of 15... but Rrows 32-37 are also being assigned a "GroupOrder" number of 15

What am I doing wrong? Is there are way to ensure the "GroupOrder" value is unique for each series of "MyStatus" records?

Thanks in advance

sql-server-transact-sql
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Viorel-1 avatar image
0 Votes"
Viorel-1 answered Viorel-1 rolled back

Check if this query gives unique values:

 select 
     MyID,
     MyStatus,
     StartTime,
     row_number() over (partition by MyID order by StartTime) as RowNum,
     dense_rank() over (partition by MyID order by MyStatus, StartTime) as DenseRank,
     row_number() over (partition by MyID order by StartTime) - dense_rank() over (partition by MyID order by MyStatus, StartTime) as GroupOrder
 from #GroupingTemp
 order by MyID, StartTime

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

LiHongMSFT-3908 avatar image
0 Votes"
LiHongMSFT-3908 answered

Hi @ScottKentCollins-2526
In this sitution ,you need to change DENSE_RANK() OVER (PARTITION BY MyID, MyStatus ORDER BY MyID, StartTime) to DENSE_RANK() OVER (PARTITION BY MyID ORDER BY MyStatus , StartTime)
Also,you can refer to below method which I usually used to set unique value for each series of "MyStatus" records.

 SELECT MyID,MyStatus,StartTime,SUM(TO_SUM)OVER(ORDER BY MyID, StartTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)AS GroupOrder
 FROM(SELECT MyID,MyStatus,StartTime,CASE WHEN LAG(MyStatus,1,1)OVER(PARTITION BY MyID ORDER BY MyID, StartTime)<>MyStatus THEN 1 ELSE 0 END AS TO_SUM
      FROM #GroupingTemp)T
 ORDER BY StartTime

Best regards,
LiHong


If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.