question

vsslasd-3371 avatar image
0 Votes"
vsslasd-3371 asked HowardPaggerton-1455 answered

Prep Data formula for a Bell Curve

I have some data such as this:

 Declare @D table (ID varchar(7), ProjDate Date, ProjAmt numeric(19,5))
    
 Insert Into @D(ID, ProjDate, ProjAmt)
    
 VALUES
    
 ('C2147',  '2021-01-01', 6.60869565217391),
 ('C2147',  '2021-01-04', 6.60869565217391),
 ('C2147',  '2021-01-11', 6.60869565217391),
 ('C2147',  '2021-01-18', 6.60869565217391),
 ('C2147',  '2021-01-25', 6.60869565217391),
 ('C2147',  '2021-02-01', 6.60869565217391),
 ('C2147',  '2021-02-08', 6.60869565217391),
 ('C2147',  '2021-02-15', 6.60869565217391),
 ('C2147',  '2021-02-22', 6.60869565217391),
 ('C2147',  '2021-03-01', 6.60869565217391),
 ('C2147',  '2021-03-08', 6.60869565217391),
 ('C2147',  '2021-03-15', 6.60869565217391),
 ('C2147',  '2021-03-22', 6.60869565217391),
 ('C2147',  '2021-03-29', 6.60869565217391),
 ('C2147',  '2021-04-05', 6.60869565217391),
 ('C2147',  '2021-04-12', 6.60869565217391),
 ('C2147',  '2021-04-19', 6.60869565217391),
 ('C2147',  '2021-04-26', 6.60869565217391),
 ('C2147',  '2021-05-03', 6.60869565217391),
 ('C2147',  '2021-05-10', 6.60869565217391),
 ('C2147',  '2021-05-17', 6.60869565217391),
 ('C2147',  '2021-05-24', 6.60869565217391),
 ('C2147',  '2021-05-31', 6.60869565217391)
    
 Select * from @D


I'd like to prep this data for a bell curve, and not certain about the formula, the Bell Curve weight should peak at 75% of the way during the timeline, not the exact center.

This needs to be calculated based across the Id field, and the ProjAmt Field should start and end at zero, and have the predominate "bell" project amount around the 4/5/21 timeframe (the approximate 75% timeline factor).

What would be the formula in TSQL to convert this data appropriately ?

I'm trying to convert the flat line data noted above so that the bulk of the data appears 75% of the way into the timeline and the starting and ending points are zero, similar to the example here in this spreadsheet. I believe the formula would pertain to something similar to a Bell Curve formula the exact results aren't in green - it is the concept: weight most of the period amounts into the 75% portion of the timeline. How would I do this ?

67082-bell.png


sql-server-transact-sql
bell.png (8.9 KiB)
· 5
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @vsslasd-3371,

Sorry, I don’t understand your needs very well. Could you please provide a visualized expected output?

Regards
Echo

0 Votes 0 ·

Hello EchoLiu

I'm trying to convert the flat line data noted above so that the bulk of the data appears 75% of the way into the timeline and the starting and ending points are zero, similar to the example here in this spreadsheet. I believe the formula would pertain to something similar to a Bell Curve formula the exact results aren't in green - it is the concept: weight most of the period amounts into the 75% portion of the timeline. How would I do this ?
67072-bell.png


67006-bell.png


0 Votes 0 ·
bell.png (8.9 KiB)

Hello EchoLiu

I'm trying to convert the flat line data noted above so that the bulk of the data appears 75% of the way into the timeline and the starting and ending points are zero, similar to the example here in this spreadsheet. I believe the formula would pertain to something similar to a Bell Curve formula the exact results aren't in green - it is the concept: weight most of the period amounts into the 75% portion of the timeline. How would I do this ?
67072-bell.png


67066-bell.png


0 Votes 0 ·
bell.png (8.9 KiB)

To fit the amount of something into a bell curve, you need a probability distribution of it over a further variable. I only see timeline data with a constant.

So please rephrase your question.

0 Votes 0 ·

Thank you, Stefan.
I appreciate your help. To answer your question, we want to peak the data Amount at 80% when the data timeline is 75% of the way through start to end.

To put this more in layperson's terms, We start a project on 2/1 but work doesn't actually start on this particular phase until we get to week period which is 75% into the project. We need to ramp up work gradually until that week and then we want 80% of the work to be performed within those few weeks which would constitute 30-35% of the project timeline.

0 Votes 0 ·
Viorel-1 avatar image
0 Votes"
Viorel-1 answered Viorel-1 published

In case of multiple groups, try this script:

 declare @D table (ID varchar(7), ProjDate Date, ProjAmt numeric(19,5))
        
 insert Into @D(ID, ProjDate, ProjAmt) values
 ('C2147',  '2021-01-01', 6.60869565217391),
 ('C2147',  '2021-01-04', 6.60869565217391),
 ('C2147',  '2021-01-11', 6.60869565217391),
 ('C2147',  '2021-01-18', 6.60869565217391),
 ('C2147',  '2021-01-25', 6.60869565217391),
 ('C2147',  '2021-02-01', 6.60869565217391),
 ('C2147',  '2021-02-08', 6.60869565217391),
 ('C2147',  '2021-02-15', 6.60869565217391),
 ('C2147',  '2021-02-22', 6.60869565217391),
 ('C2147',  '2021-03-01', 6.60869565217391),
 ('C2147',  '2021-03-08', 6.60869565217391),
 ('C2147',  '2021-03-15', 6.60869565217391),
 ('C2147',  '2021-03-22', 6.60869565217391),
 ('C2147',  '2021-03-29', 6.60869565217391),
 ('C2147',  '2021-04-05', 6.60869565217391),
 ('C2147',  '2021-04-12', 6.60869565217391),
 ('C2147',  '2021-04-19', 6.60869565217391),
 ('C2147',  '2021-04-26', 6.60869565217391),
 ('C2147',  '2021-05-03', 6.60869565217391),
 ('C2147',  '2021-05-10', 6.60869565217391),
 ('C2147',  '2021-05-17', 6.60869565217391),
 ('C2147',  '2021-05-24', 6.60869565217391),
 ('C2147',  '2021-05-31', 6.60869565217391),
    
 ('Z0000',  '2020-01-01', 9.8),
 ('Z0000',  '2020-01-04', 9.8),
 ('Z0000',  '2020-01-11', 9.8),
 ('Z0000',  '2020-01-18', 9.8),
 ('Z0000',  '2020-01-25', 9.8),
 ('Z0000',  '2020-02-01', 9.8),
 ('Z0000',  '2020-02-08', 9.8),
 ('Z0000',  '2020-02-15', 9.8),
 ('Z0000',  '2020-02-22', 9.8),
 ('Z0000',  '2020-03-01', 9.8),
 ('Z0000',  '2020-03-08', 9.8),
 ('Z0000',  '2020-03-15', 9.8),
 ('Z0000',  '2020-03-22', 9.8),
 ('Z0000',  '2020-03-29', 9.8),
 ('Z0000',  '2020-04-05', 9.8),
 ('Z0000',  '2020-04-12', 9.8),
 ('Z0000',  '2020-04-19', 9.8),
 ('Z0000',  '2020-04-26', 9.8),
 ('Z0000',  '2020-05-03', 9.8),
 ('Z0000',  '2020-05-10', 9.8),
 ('Z0000',  '2020-05-17', 9.8),
 ('Z0000',  '2020-05-24', 9.8),
 ('Z0000',  '2020-05-31', 9.8)
    
 ---    
    
 declare @mindate as date = (select min(ProjDate) from @d)
    
 ;
 with P as
 (
     select ID, min(ProjDate) as mindate, sum(ProjAmt) as sumamt
     from @D
     group by ID
 ),
 X as
 (
     select d.*, datediff(day, mindate, ProjDate) x
     from @D d 
     inner join P on P.ID = d.ID
 ),
 AC as
 (
     select ID, cast(min(x) as float) as a, cast(max(x) as float) as c
     from X
     group by ID
 ),
 B as
 (
     select distinct ID, cast(percentile_cont( 0.75 ) within group (order by x) over (partition by ID) as float) as b
     from X
 ),
 R as
 (
     select X.ID, X.ProjDate, X.ProjAmt,
         power(x - a, alpha - 1 ) * power(c - x, beta - 1) / 
                 (dbo.BetaApprox(alpha, beta) * power(c - a, alpha + beta - 1)) as val
     from X
     inner join AC on AC.ID = X.ID
     inner join B on B.ID = X.ID
     cross apply (values (1 + 4 * (b - a) / (c - a), 1 + 4 * (c - b) / (c - a))) t1(alpha, beta)
 ),
 S as
 (
     select ID, sum(val) as sumval
     from R
     group by ID
 ),
 F as
 (
     select R.ID, R.ProjDate, R.ProjAmt, val * sumamt / sumval as NewProjAmt, sumamt
     from R
     inner join P on P.ID = R.ID
     inner join S on S.ID = R.ID
 )
 select F.ID, F.ProjDate, F.ProjAmt as OldProjAmt, NewProjAmt, replicate( '*', NewProjAmt), sumamt as OldSum, sum(NewProjAmt) over (partition by ID) as NewSum
 from F
 order by ID, ProjDate

This experiment uses PERCENTILE_CONT. Maybe you can also use ‘b = a + (c - a) * 0.75’ instead.




5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Viorel-1 avatar image
0 Votes"
Viorel-1 answered Viorel-1 commented

Check some experimental scripts:

 declare @D table (ID varchar(7), ProjDate Date, ProjAmt numeric(19,5))
        
 insert Into @D(ID, ProjDate, ProjAmt) values
 ('C2147',  '2021-01-01', 6.60869565217391),
 ('C2147',  '2021-01-04', 6.60869565217391),
 ('C2147',  '2021-01-11', 6.60869565217391),
 ('C2147',  '2021-01-18', 6.60869565217391),
 ('C2147',  '2021-01-25', 6.60869565217391),
 ('C2147',  '2021-02-01', 6.60869565217391),
 ('C2147',  '2021-02-08', 6.60869565217391),
 ('C2147',  '2021-02-15', 6.60869565217391),
 ('C2147',  '2021-02-22', 6.60869565217391),
 ('C2147',  '2021-03-01', 6.60869565217391),
 ('C2147',  '2021-03-08', 6.60869565217391),
 ('C2147',  '2021-03-15', 6.60869565217391),
 ('C2147',  '2021-03-22', 6.60869565217391),
 ('C2147',  '2021-03-29', 6.60869565217391),
 ('C2147',  '2021-04-05', 6.60869565217391),
 ('C2147',  '2021-04-12', 6.60869565217391),
 ('C2147',  '2021-04-19', 6.60869565217391),
 ('C2147',  '2021-04-26', 6.60869565217391),
 ('C2147',  '2021-05-03', 6.60869565217391),
 ('C2147',  '2021-05-10', 6.60869565217391),
 ('C2147',  '2021-05-17', 6.60869565217391),
 ('C2147',  '2021-05-24', 6.60869565217391),
 ('C2147',  '2021-05-31', 6.60869565217391)
    
 ---    
    
 declare @mindate as date = (select min(ProjDate) from @d)
    
 ;
 with E1 as
 (
     select *
     from @D
     cross apply (values (datediff(day, @mindate, ProjDate))) t1(x)
 ),
 E2 as
 (
     select cast(min(x) as float) as a, cast(max(x) as float) as c
     from E1
 ),
 E3 as
 (
     select distinct E2.*, cast(percentile_disc( 0.75 ) within group (order by x) over () as float) as b
     from E1, E2
 ),
 E4 as
 (
     select *,
         power(x - a, alpha - 1 ) * power(c - x, beta - 1) / 
             (dbo.BetaApprox(alpha, beta) * power(c - a, alpha + beta - 1)) as val
     from E1, E3
     cross apply (values (1 + 4 * (b - a) / (c - a), 1 + 4 * (c - b) / (c - a))) t1(alpha, beta)
 )
 select ID, ProjDate, val * 100 / max(val) over (partition by null) as ProjAmt, replicate('*', val * 1000)
 from E4
 order by ProjDate

Also add a helper function:

 create or alter function dbo.BetaApprox( @x float, @y float ) 
 returns float
 as begin
     return sqrt(2 * pi()) * power(@x, @x - 0.5) * power(@y, @y - 0.5) / 
         power(@x + @y, @x + @y - 0.5)
 end

It generates a distribution using values between 0 and 100 that looks like a Bell Curve. The last column illustrates it.


 /*
 ID      ProjDate   ProjAmt                
 -----------------------------------------------------------------------------
 C2147   2021-01-01 0                      
 C2147   2021-01-04 0,00531370950663901    
 C2147   2021-01-11 0,203758600080973      
 C2147   2021-01-18 0,988632019136461      
 C2147   2021-01-25 2,70638824348182       
 C2147   2021-02-01 5,62448556581808       
 C2147   2021-02-08 9,92360872580418       *
 C2147   2021-02-15 15,6924770761584       **
 C2147   2021-02-22 22,9236818630141       ***
 C2147   2021-03-01 31,5099647059236       ****
 C2147   2021-03-08 41,2406198529949       ******
 C2147   2021-03-15 51,7977898587582       *******
 C2147   2021-03-22 62,7524344869001       *********
 C2147   2021-03-29 73,5597089766708       ***********
 C2147   2021-04-05 83,5533802415433       ************
 C2147   2021-04-12 91,9386951773832       *************
 C2147   2021-04-19 97,7826874920967       **************
 C2147   2021-04-26 100                    ***************
 C2147   2021-05-03 97,330157206678        **************
 C2147   2021-05-10 88,2963805916995       *************
 C2147   2021-05-17 71,1162094744645       **********
 C2147   2021-05-24 43,4361976275036       ******
 C2147   2021-05-31 0                      
 */


· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thank you, I think this is close....

A couple of questions first please:
I see where the 75% marker for the timeline can be dialed up or down in your code. But what part of your code pertains to the density of the bell curve (how steep or gradual) it is ?

Also: this works great for one Id where Id=C2147, but I have many different similar Ids, and need this curve to execute over each Id in the same fashion based upon it's own start/end date and amount. Where in the code would we account for this ? Partition ?

There is an issue with the total: The Sum of the Amount column must be equal to that which is in the table: 152
Which is 22* 6.60869565217391=152

Can you please advise ?
Thank you so very much !

0 Votes 0 ·

The script that processes multiple groups of data will be sent separately.

Regarding the density, I am not sure I can answer the question. Probably you should consider more approaches. Maybe R language, available in SQL, can be used too.

0 Votes 0 ·
vsslasd-3371 avatar image
0 Votes"
vsslasd-3371 answered

Thank you - the totals all add up which is great...

I will open another question on this topic , which pertains to the density or steepness of the Bell curve - so that can be adjusted too.

Thank you so very much !

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

HowardPaggerton-1455 avatar image
2 Votes"
HowardPaggerton-1455 answered

I just wanted to add a minor amendment to this part of code (works well - thanks!):

  declare @mindate as date = (select min(ProjDate) from @d)
        
  ;
  with E1 as
  (
      select *
      from @D
      cross apply (values (datediff(day, @mindate, ProjDate))) t1(x)
  ),
  E2 as
  (
      select cast(min(x) as float) as a, cast(max(x) as float) as c
      from E1
  ),
  E3 as
  (
      select distinct E2.*, cast(percentile_disc( 0.75 ) within group (order by x) over () as float) as b
      from E1, E2
  ),
  E4 as
  (
      select *,
          power(x - a, alpha - 1 ) * power(c - x, beta - 1) / 
              (dbo.BetaApprox(alpha, beta) * power(c - a, alpha + beta - 1)) as val
      from E1, E3
      cross apply (values (1 + 4 * (b - a) / (c - a), 1 + 4 * (c - b) / (c - a))) t1(alpha, beta)
  )
  select ID, ProjDate, val * 100 / max(val) over (partition by null) as ProjAmt, replicate('*', val * 1000)
  from E4
  order by ProjDate

BTW it's a bit strange how it coincides with some numpy sqrt errors in Python...


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.