question

SudipBhatt-9737 avatar image
0 Votes"
SudipBhatt-9737 asked ErlandSommarskog commented

How to calculate Median Value

Suppose i have table and one column has numeric value. now how could i calculate median value ?

i found one code from this url https://stackoverflow.com/questions/1342898/function-to-calculate-median-in-sql-server
but their approach is not clear

 DECLARE @table AS TABLE
 (
     Number int not null
 );
    
 insert into @table select 2;
 insert into @table select 4;
 insert into @table select 9;
 insert into @table select 15;
 insert into @table select 22;
 insert into @table select 26;
 insert into @table select 37;
 insert into @table select 49;
    
 DECLARE @Count AS INT
 SELECT @Count = COUNT(*) FROM @table;
    
 WITH MyResults(RowNo, Number) AS
 (
     SELECT RowNo, Number FROM
         (SELECT ROW_NUMBER() OVER (ORDER BY Number) AS RowNo, Number FROM @table) AS Foo
 )
 SELECT AVG(Number) FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)

1) RowNo = (@Count+1)/2 not clear
2) RowNo = ((@Count+1)%2) * ((@Count+2)/2) not clear

what they are doing?

any easy approach exist for median calculation? thanks



sql-server-transact-sql
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.


In the solutions with integer numbers that use AVG, I think that you should add a conversion like SELECT AVG(cast(Number as float)) instead of SELECT AVG(Number).


1 Vote 1 ·
MelissaMa-msft avatar image
0 Votes"
MelissaMa-msft answered Viorel-1 edited

Hi @SudipBhatt-9737,

Actually Erland has provided all the details.

Only adding some notes here.

this line not clear WHERE ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 )

If it contains an odd number of values, it will refer 'where ROWASC =ROWDESC'.

48136-2.png

If it contains an even number of values, it will refer 'where ROWASC =ROWDESC+1 or ROWASC =ROWDESC-1 '.

48124-1.png
So this where condition will include both situations.

PERCENTILE_CONT(0.5) why .5 is sending ? why not different value ?

Syntax:
PERCENTILE_CONT ( numeric_literal )
WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )
OVER ( [ <partition_by_clause> ] )

You could send any decimal between 0.0 and 1.0.

In your situation, PERCENTILE_COUNT(0.5) is similar to finding median.

You could use below query to find out the differences.

 CREATE TABLE MEDIAN
   (
       Number int not null
   );
            
   insert into MEDIAN select 2;
   insert into MEDIAN select 4;
   insert into MEDIAN select 9;
   insert into MEDIAN select 15;
   insert into MEDIAN select 22;
   insert into MEDIAN select 26;
   insert into MEDIAN select 37;
   insert into MEDIAN select 49;
   insert into MEDIAN select 66;

 SELECT Number,
 PERCENTILE_CONT(0) WITHIN GROUP (ORDER BY Number) OVER () AS MINCOUNT,
 PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY Number) OVER () AS QUARTTERCOUNT,
 PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Number) OVER () AS MEDIANCONT ,
 PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Number) OVER () AS THREEQUARTTERCOUNT ,
 PERCENTILE_CONT(1) WITHIN GROUP (ORDER BY Number) OVER () AS MAXCONT
 FROM   MEDIAN 

what WITHIN GROUP() does ?

This specifies a list of numeric values to sort and compute the percentile over. Only one order_by_expression is allowed.

why Over() is blank ?

It divides the result set produced by the FROM clause into partitions to which the percentile function is applied.

In your case, there is only one column number, then you could leave it as blank.

If you have another column, for example Groupid, then you need to put 'PARTITION BY Groupid' inside OVER().

Please refer below example:

 DROP TABLE IF EXISTS MEDIAN
    
  CREATE TABLE MEDIAN
   (
       Groupid int not null,
       Number int not null
   );
            
   insert into MEDIAN select 1,2;
   insert into MEDIAN select 1,4;
   insert into MEDIAN select 1,9;
   insert into MEDIAN select 1,15;
   insert into MEDIAN select 2,22;
   insert into MEDIAN select 2,26;
   insert into MEDIAN select 2,37;
   insert into MEDIAN select 2,49;
   insert into MEDIAN select 2,66;
    
 SELECT Number,
 PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Number) OVER (PARTITION BY Groupid) AS MEDIANCONT 
 FROM   MEDIAN 

Best regards
Melissa


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

Hot issues November--What can I do if my transaction log is full?
Hot issues November--How to convert Profiler trace into a SQL Server table


2.png (3.9 KiB)
1.png (3.6 KiB)
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.


In addition, choosing solutions that use ROWASC and ROWDESC, make sure that the sequence is not like this:

 CREATE TABLE MEDIAN
 (
     Number int not null
 );
          
 insert into MEDIAN select 1;
 insert into MEDIAN select 1;
 insert into MEDIAN select 1;
 insert into MEDIAN select 2;
 insert into MEDIAN select 3;

where the method does not seem to work without adjustments.



0 Votes 0 ·
ErlandSommarskog avatar image
1 Vote"
ErlandSommarskog answered SudipBhatt-9737 commented

The median is the middle value of a range. If the number is even, you may want the average of the range. If the values are 1, 2, 7, 8, 1000 the median is 7. If the values are 1, 2, 7, 8, 980, 1000, the median is said to be 7.5.

The solution you post numbers the rows, and then they select the one or two numbers in the middle and take the average of them.

However, since SQL 2012 there is a more direct way of computing the median, but strangely still not as a true aggregate function_

  DECLARE @table AS TABLE
  (
      Number int not null
  );
        
  insert into @table select 2;
  insert into @table select 4;
  insert into @table select 9;
  insert into @table select 15;
  insert into @table select 22;
  insert into @table select 26;
  insert into @table select 37;
  insert into @table select 49;
    
  SELECT DISTINCT 
         percentile_cont(0.5) WITHIN GROUP(ORDER BY Number) OVER(),
         percentile_disc(0.5) WITHIN GROUP(ORDER BY Number) OVER()
  FROM   @table

The two function serve the same purpose, but the result is different when there is a even number of values. _cont (for continuous) gives the average of the two middle values, whereas _disc (for discrete) gives you the lowest of the two values.

  1. says that you want the median, but you can say 0.1 to get deciles etc.

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

These two line not clear

percentile_cont(0.5) WITHIN GROUP(ORDER BY Number) OVER(),
percentile_disc(0.5) WITHIN GROUP(ORDER BY Number) OVER()


what is PERCENTILE_CONT(0.5) does ?
what is percentile_disc(0.5) does ?

what is the usage of WITHIN GROUP ? i saw WITHIN GROUP used with string_aggerate function.

why Over() function is empty ? Over normally used with Row_Number function where we mention order by clause.

Sir please explain my above points. Thanks you.

0 Votes 0 ·
MelissaMa-msft avatar image
1 Vote"
MelissaMa-msft answered SudipBhatt-9737 commented

Hi @SudipBhatt-9737,

In SQL calculation, '/' means divide the left operand by the right operand. For example: 23/10=2 and 9/2=4. '%' means divide the left operand by the right operand and return the remainder. For example: 5%2=1 and 9%2=1.

If the dataset contains an odd number of values, than the middle value of the entire dataset will be considered as a median.
Median (M) = value of ((X + 1)/2) th item. (x is the number of values in the dataset)

For example, if we calculate the median of the dataset {1,2,3,4,6,7,8} by using the above formula, then Median (M)= (7+1)/2=4. So, the fourth value of the specified dataset, which is 4 in this case, is the median.

If the dataset contains an even number of rows, we must determine the two middle values of this dataset, add them together, and divide the result by two.
Median (M) = value of [(x/2) th item + (x/2 + 1) th item]/2 (x is the number of values in the dataset)

For example, if we apply this formula to the dataset {1,2,4,6,8,10}, then the median value is calculated as shown below:
Median (M)= [ 6/2 ] = 3rd value of the dataset + [ 6/2 + 1 ]= 4th value of the dataset.= (4+6)/2 = 5. So, the median value in this case is 5.

1) RowNo = (@Count+1)/2 not clear
2) RowNo = ((@Count+1)%2) * ((@Count+2)/2) not clear

So if there is an odd number of values, (@Count+1)/2 could be the correct value which is actually the middle one.

If there is an even number of values, we need to have the two middle values,(@Count+1)/2 is the smaller one, ((@Count+1)%2) * ((@Count+2)/2) is the other bigger one.

any easy approach exist for median calculation?

We could have another method using Row_number() or rank() as below:

 CREATE TABLE MEDIAN
  (
      Number int not null
  );
        
  insert into MEDIAN select 2;
  insert into MEDIAN select 4;
  insert into MEDIAN select 9;
  insert into MEDIAN select 15;
  insert into MEDIAN select 22;
  insert into MEDIAN select 26;
  insert into MEDIAN select 37;
  insert into MEDIAN select 49;
    
 SELECT AVG(number) AS MEDIAN 
 FROM   (SELECT number, 
                ROW_NUMBER() 
                  OVER ( 
                    ORDER BY number ASC) AS ROWASC, 
                ROW_NUMBER() 
                  OVER ( 
                    ORDER BY number DESC) AS ROWDESC 
         FROM   MEDIAN ) X 
 WHERE  ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 ) 

As mentioned by Erland, you could also use the PERCENTILE_CONT function (SQL Server 2012 and later versions). It is capable of calculating the median within a partitioned set. It calculates the median when we pass 0.5 as an argument and specify the order within that dataset.

 SELECT Number,
        PERCENTILE_CONT(0.5) 
          WITHIN GROUP (ORDER BY Number) OVER () AS MEDIANCONT 
 FROM   MEDIAN 

In addition, you could also create a user-defined Function to calculate the median value of the specified dataset.


Best regards
Melissa


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

Hot issues November--What can I do if my transaction log is full?
Hot issues November--How to convert Profiler trace into a SQL Server table

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thank you Madam but

1) this line not clear WHERE ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 ) what this line is doing?

2) PERCENTILE_CONT(0.5) why .5 is sending ? why not different value ?

3) WITHIN GROUP (ORDER BY Number) OVER ()

what WITHIN GROUP does ?
why Over is blank ?

please help me to understand above points. thanks




0 Votes 0 ·
ErlandSommarskog avatar image
1 Vote"
ErlandSommarskog answered ErlandSommarskog commented

1) this line not clear WHERE ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 ) what this line is doing?

That is the same as

 ROWASC = ROWDESC OR ROWASC = ROWDESC -1 OR ROWASC = ROWDESC + 1

ROWASC are the rows numbered in ascending order, ROWDESC numbers them in descending order. Where they meet, you find the median. If there is an odd number of values, there is one value where ROWDESC = ROWASC, but no rows where the other two conditions are true. The opposite applies when there is an even number of values.

2) PERCENTILE_CONT(0.5) why .5 is sending ? why not different value ?

Because you asked for the median. I tried to briefly explain this in my post, but it appears that there were some formatting incident. You could pass 0.1 for the decile (that is, the 10% percentile) or something else.

3) WITHIN GROUP (ORDER BY Number) OVER ()

what WITHIN GROUP does ?
why Over is blank ?

Over is blank, because there was only one column in your table, so there was nothing to partition on. But say that you have a table with lots of rows per customer, and you want the median of some value per customer. Then you would say OVER(PARTITION BY CustomerID).

WITHIN GROUP then specifies which value you are seeking the percentile of. That is, order the group by this value.

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thank you for this.
Is there a way to round the generated float value by percentile_disc to 4 Dec places?
I tried multiple count, cast, floor functions but I might be getting the syntax wrong.

0 Votes 0 ·

cast(floatval as decimal(19,4))

0 Votes 0 ·