PowerPivot for the DBA: Part 1

[This article was contributed by the SQL Azure team.]

In this article I am going to tie some simple terminology and methodology in business intelligence and PowerPivot with Transact-SQL – bring it down to earth for the DBA. You don’t need to know Transact-SQL to build awesome reports with PowerPivot, however if you already do these articles will attempt to bridge the learning gap.

History

I have been hearing the term Business Intelligence (BI) thrown around for a while, mostly from executives wanting their companies to start “doing” business intelligence. I have also been avoiding learning about business intelligence assuming it involved complicated mathematics and a whole new set of terminology – putting off what appeared to be a big learning curve until I had more time. I prefer to learn new technology by building on what I already know; I really wanted to relate business intelligence to my other SQL Server skills, including Transact-SQL. I finally took the time to learn the basics of business intelligence and realized that there is a close tie in with my Transact-SQL skills.

PowerPivot

PowerPivot, the Excel 2010 extension, is a great way to get started with business intelligence. It lets you experiment with relationships and report building. With it you can quickly and easily prototype reports and investigate data issues before you commit to an ERD or server set up etc

Let get going a do a simple example, connection to SQL Azure and the Adventure Works database. Then import these tables:

  • Sales.SalesOrderHeader
  • Sales.SalesOrderDetail
  • Production.Product
  • Production.ProductSubcategory
  • Production.ProductCategory

I covered how to connect to SQL Azure and import tables using PowerPivot in this blog post. The next step is to create a PivotTable. To do that go to the PowerPivot ribbon bar in Excel and choose PivotTable.

image

When the Pivot table appears and the docked PowerPivot Field List window add the LineTotal column from the SalesOrderDetail table in Values section of the field list, gives you the uninteresting PivotTable that looks like the one below.

image

This is the same as running this SELECT statement in Transact-SQL:

 SELECT    SUM(LineTotal)
FROM    Sales.SalesOrderDetail

Now that we have gotten the “Hello World” example out of the way, add the ProductCategory.Name column to the Rows Labels in PowerPivot is just like creating a SELECT statement in Transact-SQL with a GROUP BY clause.

image

You would get the same output (without the Grand Total) in the PowerPivot sample above by executing this statement Transact-SQL Statement

 SELECT    ProductCategory.Name, SUM(LineTotal)
FROM    Sales.SalesOrderDetail
    INNER JOIN Production.Product ON 
        Product.ProductID = SalesOrderDetail.ProductID
    INNER JOIN Production.ProductSubcategory ON 
        Product.ProductSubcategoryID = 
       ProductSubcategory.ProductSubcategoryID
    INNER JOIN Production.ProductCategory ON 
        ProductSubcategory.ProductCategoryID =
         ProductCategory.ProductCategoryID
GROUP BY ProductCategory.Name

If you continue to use the Adventure works database and move the ProductCategory.Name column to the Column Labels and add the Sales.SalesOrderHeader.OrderDate to the Row Labels PowerPivot will look like this:

image

Which is the same data as this Transact-SQL Statement containing a GROUP BY statement with both OrderDate and ProductCategory.Name:

 SELECT    ProductCategory.Name, SalesOrderHeader.OrderDate,
   SUM(LineTotal)
FROM    Sales.SalesOrderHeader
    INNER JOIN Sales.SalesOrderDetail ON 
        SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID
    INNER JOIN Production.Product ON 
        Product.ProductID = SalesOrderDetail.ProductID
    INNER JOIN Production.ProductSubcategory ON 
        Product.ProductSubcategoryID =
       ProductSubcategory.ProductSubcategoryID
    INNER JOIN Production.ProductCategory ON 
        ProductSubcategory.ProductCategoryID =
        ProductCategory.ProductCategoryID
GROUP BY ProductCategory.Name, SalesOrderHeader.OrderDate
ORDER BY SalesOrderHeader.OrderDate

However, here is where we see PowerPivot start to shine, with both a row label and a column label, PowerPivot pivots the category names into columns, sorts by date automatically, and provides grand totals for the rows and the columns. The DBA using Transact-SQL can get their tables pivoted also by using the PIVOT command in SQL Server 2008, here is what that Transact-SQL looks like for the sample above:

 SELECT OrderDate, [1] AS Bikes, [2] AS Components, [3] AS Clothing,
    [4] AS Accessories
FROM 
(SELECT    Sales.SalesOrderDetail.LineTotal,
    ProductCategory.ProductCategoryID,
  SalesOrderHeader.OrderDate
    FROM    Sales.SalesOrderHeader
        INNER JOIN Sales.SalesOrderDetail ON 
            SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID
        INNER JOIN Production.Product ON 
            Product.ProductID = SalesOrderDetail.ProductID
        INNER JOIN Production.ProductSubcategory ON 
            Product.ProductSubcategoryID =
         ProductSubcategory.ProductSubcategoryID
        INNER JOIN Production.ProductCategory ON 
            ProductSubcategory.ProductCategoryID = 
      ProductCategory.ProductCategoryID
) p
PIVOT
(
    SUM(LineTotal)
    FOR ProductCategoryID IN ( [1], [2], [3], [4] )
) AS pvt
ORDER BY pvt.OrderDate;

Summary

After reading this blog post you might try to summarize Business Intelligence as GROUP BY for Mangers; which would be a great slogan for a bumper sticker.

image

However, this post only scratching the surface, in my next blog post I will discuss what a measure is in business intelligence and continue writing matching Transact-SQL. Do you have questions, concerns, comments? Post them below and we will try to address them.