Introduction to Data Analysis Expressions (DAX) in Gemini
Gemini and DAX Extend the Power of PivotTables
Excel PivotTables are not new, but Excel PivotTables that are based on multiple tables of data are new with the Gemini add-in for Excel 2010. This new capability in Gemini is a very powerful tool for data analysis when combined with our new DAX expression language.
Looking at the PivotTable above, the values in the data area of this pivot come from a sales table, the customer information comes from a customer table, and the product information comes from a product table. In Excel, without Gemini, this analysis would only be possible by constructing a single flattened table with all of the information. Working with the single, flattened table in Excel introduces a variety of issues around performance, memory consumption, organization of fields in the field list, etc. More importantly, for some scenarios, a single flattened table simply isn’t possible.
What Is DAX?
DAX is an expression language based on Excel formula syntax. Because DAX is designed to work with multiple tables of data, it includes functions that implement relational database concepts. DAX also adds new functionality that allows you to create dynamic aggregations, making DAX formulas smart about calculating values in a PivotTable. Excel users will be happy to learn that DAX includes part of the existing Excel function library, and many functions are the same as Excel functions.
Here are a few simple DAX formulas to show how these look a lot like Excel formulas.
DAX formula |
Comment |
= [First Name] & “ “ & [Last Name] |
concatenation of strings is just like in Excel |
=SUM(Sales[Amount]) |
SUM function takes a column instead of a range of cells |
=RELATED (Product[Cost]) |
new RELATED function follows relationship between tables |
Where Is DAX Used?
When you start with a set of data tables, it is unlikely that the raw data already contains all the values that you need. You may need to perform two distinct types of calculations to generate the desired analysis: Calculated Columns and Measures.
Calculated Columns
Calculated columns in Gemini are just like the calculated columns in any Excel table. This is not a new concept.
When you enter the formula which defines a calculated column, that formula is evaluated for each cell in the column, and you immediately get a calculated column which is fully populated with values. For example, if you have one column [Qty] containing the sales quantity, and another column [PRICE] with the sales price, you might want to calculate an AMOUNT column which is simply the QTY times the PRICE. You could write this just as you would in Excel: = [QTY] * [PRICE]
The values that are created in calculated columns may subsequently be used in a PivotTable in any of several ways. A calculated column may contain:
· Numbers which are to be aggregated in the Values area of a PivotTable.
· Values which are to be placed in row labels or column labels of a PivotTable.
· Values which are to appear in a slicer and which will be used to slice the PivotTable results.
Measures
Measures in Gemini are formulas which are placed into the Values area of a PivotTable. Once they’ve been placed into the PivotTable, they are evaluated for each cell in the Values area. Each cell evaluation will be different based on the unique combination of the row labels, column labels, and filters associated with that cell.
There are two types of measures in Gemini: “implicit” measures and DAX measures
Implicit Measures: When you check the checkbox for a numeric field in the PivotTable field list, Gemini will automatically create a measure for you which is the “Sum of <numeric field >” and this measure is placed in the Values area of the PivotTable. You can change the Field Settings, to change the aggregation from SUM to COUNT, AVERAGE, MIN, or MAX. We call these “implicit” measures because all you need to do is select a field (column) from your table(s) and an aggregation (SUM, COUNT, AVERAGE, MIN, or MAX). The formula associated with those measures is implied by the selection and will always be a simple aggregation of a column of values.
DAX Measures: When you create a new measure using the New Measure dialog, you can give the measure a name and then provide any DAX formula you like. You’re not limited to simple aggregation of a column. By using various functions within the DAX expression language, you can perform some pretty sophisticated analysis. It is important to understand that when you enter the formula which defines a measure, the formula is not immediately evaluated. Gemini simply captures the name of the measure and the formula associated with it. Then, when the measure is placed into the Values area of the PivotTable, the measure will be evaluated for each cell of that PivotTable. If you look at the PivotTable at the top of this blog post, you can see that the measure “Margin” is evaluated for each combination of year and customer and customer type.
Of course, whether you check a checkbox to get an “implicit” measure, or whether you use the “New Measure” dialog to enter a DAX formula, after the measure is created, Gemini will also add that measure to the PivotTable.
Calculated Column Examples Including Some Basic DAX Functions
Here are two simple examples of calculated columns in the Sales table used in the PivotTable at the start of this post:
The [Amount] column is defined by a formula which could have been pasted from Excel: = [Qty] *[Sold Price]
The UnitCost column uses a new DAX function, the RELATED() function. RELATED() follows an existing relationship between two tables and returns a value from the related table. In this case, we are going from the Sales table to the related Product table, and returning the value from the [Cost] column in that table.
We also have a new function named RELATEDTABLE() which will follow an existing relationship and return a table which contains all the related rows from the specified table. Here’s an example of how that could be used to return the sum of all the sales for a given product, by creating a calculated column in the Product table.
You’ll notice that this example uses a new SUMX() function instead of the traditional SUM() function. SUMX() takes two arguments: the first argument is a table over which we will be adding up values, and the second argument is the expression to be evaluated for each row in the table. These are the values which will be added up. In this case, we’re saying that for each product in the product table, we want to get the table containing related sales (the sales for this product) and add up the numbers in the [Amount] column of that related table.
DAX Doesn’t Refer to Data by Cell Location
In DAX, we do not have the Excel notion of addressing a single cell of data. We do not identify values by where they are located in the grid. We can’t refer to cell B23 or range B12:C15. DAX functions always operate on columns of data in a table. Instead of viewing our data set as a large rectangle of cells, we think of the data as a set of tables which contain columns and rows.
DAX Aggregation Functions
DAX includes several aggregation functions from Excel including SUM, AVERAGE, MIN, MAX, COUNT, but instead of taking multiple arguments (a list of ranges,) they take a reference to a column. DAX also adds some new aggregation functions which aggregate any expression over the rows of a table.
· SUMX (Table, Expression)
· AVERAGEX (Table, Expression)
· COUNTAX (Table, Expression)
· MINX (Table, Expression)
· MAXX (Table, Expression)
DAX Table Functions
Excel has no functions which return a table. The formula which defines a calculated column must return a scalar value, so that the result can be placed into the column. But having functions which return tables of data can be very powerful when those tables are used as intermediate results which are passed as arguments to other functions (typically the new DAX aggregation functions). You’ve already seen an example of this above, when I returned a “related” table of sales transactions and then used SUMX to add up those transactions, placing a single sum into the products table.
The notion of functions which return tables may be new to Excel users, but hopefully this is a concept that will be easy to pick up. Here are a few examples of functions which return tables:
· RelatedTable (Table) returns table containing related rows of data
· Filter (Table, Condition) returns table filtered to include rows where condition is true
· Distinct (Column) returns one column table containing the distinct values in a column
DAX Measures deserve their own blog post...
The real power of data analysis using DAX formulas comes not so much from calculated columns, but from using DAX to define measures. But that’s much more complex than can be addressed in a single blog post, so I’ll dive deeper into DAX measures in a separate blog post sometime soon.
I hope you’re working with the CTP build of Gemini and that you’ll send us your feedback on Gemini, including any thoughts you may have about DAX!
Thanks!
Howie