question

NandanHegde-7720 avatar image
0 Votes"
NandanHegde-7720 asked NandanHegde-7720 edited

Handling missing PK-FK relations in historical data analysis

Hello All,
I have a scenario wherein we have 2 tables with below format

125824-image.png

125854-image.png


Since there is no PK-FK relationship between the 2 tables, finding the state as of that period becomes tedious.

My requirement is as below:

Need to charge data from table A for those IDs whose Status=Z during that time period.

So 2nd row ( with Id=1 and Time=31st July ) should be filtered out as during that time, the status of that ID=A.

So what is the best SQL method to achieve this scenario?


Sample Data:

 Create table #Test
 (
 TRTime datetime,
 ID Int,
 Charge int
 )
    
 insert into #Test values('2021-06-30',1,5)
 insert into #Test values('2021-07-15',1,5)
 insert into #Test values('2021-07-31',1,6)
 insert into #Test values('2021-07-15',2,8)
 insert into #Test values('2021-07-31',2,6)
    
    
 create table #Status
 (
 Id int,
 Status varchar(5),
 RowStartDt datetime,
 RowEndDt datetime,
 RowIsCurrent int
 )
 insert into #Status values(1,'F','2021-01-01','2021-07-14',0)
 insert into #Status values(1,'P','2021-07-14','2021-07-15',0)
 insert into #Status values(1,'Z','2021-07-15','2021-07-16',0)
 insert into #Status values(1,'A','2021-07-16',Null,1)
 insert into #Status values(2,'Z','2021-01-01',Null,1)


My Query :

Select * from #Test T
Left join #Status s on t.id=s.id and (T.TRTime should match to the nearest time for that ID in RowStartDtColumn)
where Status='Z'

Needed output :

125858-image.png

So we need to filter out records from Test table when for the corresponding time period the status of that ID is 'Z' in Status table.



sql-server-generalsql-server-transact-sql
image.png (7.3 KiB)
image.png (11.6 KiB)
image.png (5.9 KiB)
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

EchoLiu-msft avatar image
0 Votes"
EchoLiu-msft answered NandanHegde-7720 commented

Please check:

  ;WITH cte
  as(SELECT ID, Status,RowStartDt,CASE WHEN RowEndDt IS NULL
  THEN '9999-12-31 23:59:59.997'
  ELSE RowEndDt END RowEndDt ,
  RowIsCurrent
  FROM #Status)
    
  SELECT t.* FROM #Test t
  LEFT JOIN cte c
  ON t.TRTime BETWEEN c.RowStartDt AND c.RowEndDt AND t.id=c.id
  WHERE Status='Z'

Or:

 ;WITH cte
 as(Select t.*,s.Status,s.RowStartDt,ABS(DATEDIFF(day,t.TRTime,s.RowStartDt)) diff
 from #Test t
 Left join #Status s on t.id=s.id)
 ,cte2 as(SELECT *,ROW_NUMBER() OVER(PARTITION BY id,TRTime ORDER BY diff) rr
 FROM cte)
    
 SELECT TRTime,ID,Charge FROM cte2
 WHERE rr=1 AND Status='Z'

Output:
125928-image.png


If you have any question, please feel free to let me know.


Regards
Echo


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.




image.png (4.1 KiB)
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hey @EchoLiu-msft ,

Thank you for your reply :)
So that's what I was also thinking of using 'Between' function.

But what is there is a below scenario :

Please insert this additional row into the table :
insert into #Status values(1,'Z','2021-07-15','2021-07-15',0)

wherein there are 2 rows of same ID with same RowstartDt as well.
So in that scenario, there would be duplicate joins. So should we use partition as well?

0 Votes 0 ·
pituach avatar image
0 Votes"
pituach answered pituach commented

Good day @NandanHegde-7720

You can use the same query using JOIN as you would use if there were primary key and foreign key. The main reason to use primary and foreign keys is to enforce data consistency (PK enforces uniqueness of values, FK enforces consistency of data so it will be exists in a table which we point to).

Note! If you need help in the query itself then please provide queries to create the tables and insert the sample data + explain what is the expected result. Images are fine for visual explanation but if you provide the queries to create the tables then we can create the tables in our server and provide a solution which we tested

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hey @pituach ,
I have updated the query with sample data and output result .
The ask is w.r.t making this statement in join condition :
((T.TRTime should match to the nearest time for that ID in RowStartDtColumn)

So what is the best way to scan for the rows in between the time periods in JOIN conditions.
Should we use Between?

1 Vote 1 ·

Well done on providing the missing information.

I noticed it only now and I see that you already got answer and marked it. Glad to see your issue was solve, and welcome to the forum

1 Vote 1 ·
PriyaJha-3992 avatar image
0 Votes"
PriyaJha-3992 answered NandanHegde-7720 edited

Hi @pituach , @EchoLiu-msft

We have a similar scenario with an additional case, wherein to the above data, please add two additional row

 insert into #Status values(1,'Z','2021-07-15','2021-07-15',0)
 insert into #Status values(1,'X','2021-07-15','2021-07-16',0)

Now for the same id there are 3 rows in the #Status table, where the date '2021-07-15' would be between RowStartDt and RowEndDt

How to handle this duplicate scenario in the below query which leverages between clause:

 WITH cte as(SELECT ID, Status,RowStartDt,CASE WHEN RowEndDt IS NULL THEN '9999-12-31 23:59:59.997' ELSE RowEndDt END RowEndDt ,RowIsCurrent FROM #Status)SELECT t.* FROM #Test t LEFT JOIN cte c ON t.TRTime BETWEEN c.RowStartDt AND c.RowEndDt AND t.id=c.idWHERE Status='Z'

Note: We have an additional audit column of type Identity, so in the above scenario we need to take 1st instance of the row


· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

The below query should work

  Select * from 
  (
  Select *,ROW_NUMBER() OVER(PARTITION BY id,TRTime ORDER BY diff) rr  from 
  (Select t.*,s.Status,s.RowStartDt,ABS(DATEDIFF(day,t.TRTime,s.RowStartDt)) diff
  from #Test t
  Left join #Status s on t.id=s.id and t.TRTime BETWEEN RowStartDt AND ISNULL(RowEndDt,'9999-12-31'))  a) b
    
  where rr=1 and status='z'


0 Votes 0 ·