# Aggregation WITH CUBE

In my last post, I wrote about how aggregation WITH ROLLUP works.  In this post, I will discuss how aggregation WITH CUBE works.  Like the WITH ROLLUP clause, the WITH CUBE clause permits us to compute multiple "levels" of aggregation in a single statement.  To understand the difference between these two clauses, let's look at an example.  We'll use the same fictitious sales data from last week's example.

CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)

Consider the following query from last week:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP

It will be easier to see what is happening if we pivot the sales data:

 Yr 2005 2006 2007 ALL EmpId 1 12000.00 18000.00 25000.00 55000.00 2 15000.00 6000.00 21000.00 3 20000.00 24000.00 44000.00 ALL 120000.00

The table clearly shows that the WITH ROLLUP clause computes the total for each employee for all years and the grand total for all employees and all years.  The query does not compute the totals for each year for all employees.  Moreover, the order of the columns in the GROUP BY clause determines in which order the data is totaled.

Now let's repeat the same query but replace the WITH ROLLUP clause with a WITH CUBE clause:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH CUBE

This query computes all possible sub-totals and totals:

 Yr 2005 2006 2007 ALL EmpId 1 12000.00 18000.00 25000.00 55000.00 2 15000.00 6000.00 21000.00 3 20000.00 24000.00 44000.00 ALL 27000.00 44000.00 49000.00 120000.00

Because the WITH CUBE clause causes the query to compute all possible totals, the order of the columns in the GROUP BY clause does not matter.  Of course, by default, SQL Server does not pivot the results of either of the above queries.  Here is the actual output from the WITH CUBE query:

`````` EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00
NULL        NULL        120000.00
NULL        2005        27000.00
NULL        2006        44000.00
NULL        2007        49000.00
``````

Next, let's look at the query plan for the WITH CUBE query:

|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
|--Concatenation
|--Stream Aggregate(GROUP BY:([Sales].[EmpId], [Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
|    |--Sort(ORDER BY:([Sales].[EmpId] ASC, [Sales].[Yr] ASC))
|         |--Table Spool
|              |--Stream Aggregate(GROUP BY:([Sales].[Yr], [Sales].[EmpId]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
|                   |--Sort(ORDER BY:([Sales].[Yr] ASC, [Sales].[EmpId] ASC))
|                        |--Table Scan(OBJECT:([Sales]))
|--Compute Scalar(DEFINE:([Expr1012]=NULL))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
|--Table Spool

This plan consists of two parts.  SQL Server has effectively rewritten our query as follows:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP
UNION ALL
SELECT NULL, Yr, SUM(Sales)
FROM Sales
GROUP BY Yr

The first part of the plan computes the result for the WITH ROLLUP query above.  I described how this query plan works in last week's post.  The second part of this plan computes the missing year sub-totals yielding the entire CUBE result.  Note that this plan employs a common sub-expression spool.  As I discussed in this post, a common sub-expression spool copies its input rows into a worktable and then reads and returns the rows from the worktable multiple times - in this case twice.  The spool is meant to improve performance although, in this example, it has little impact since the server could just as easily have re-read the original Sales table.  However, if the input to the aggregation was more complex and cost more to evaluate, the spool would help.

If we use the WITH CUBE clause when aggregating on more than two columns, SQL Server simply generates increasingly complex plans with additional inputs to the concatentation operator.  As with the simple two column example, the idea is to compute the whole CUBE by computing all of the individual ROLLUPs that compose it.

Finally, we can actually combine WITH CUBE and PIVOT to generate the above table in a single simple statement.  (I actually proposed a variation of this query in an answer to a reader's comment on my post about the PIVOT operator but I like this solution better.)

SELECT EmpId, , , , [ALL]
FROM
(
SELECT
CASE WHEN GROUPING(EmpId) = 0
THEN CAST (EmpId AS CHAR(7))
ELSE 'ALL'
END AS EmpId,
CASE WHEN GROUPING(Yr) = 0
THEN CAST (Yr AS CHAR(7))
ELSE 'ALL'
END AS Yr,
SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH CUBE
) AS s
PIVOT (SUM(Sales) FOR Yr IN (, , , [ALL])) AS p

Here is the output from this query:

`````` EmpId   2005                  2006                  2007                  ALL
------- --------------------- --------------------- --------------------- ---------------------
1       12000.00              18000.00              25000.00              55000.00
2       15000.00              6000.00               NULL                  21000.00
3       NULL                  20000.00              24000.00              44000.00
ALL     27000.00              44000.00              49000.00              120000.00
``````