patterns & practices Developer Center

From: Developing big data solutions on Microsoft Azure HDInsight

The growing awareness of the value of decisions based on proven data, combined with advances in data analysis tools and techniques, has resulted in an increased demand for versatile analytical data models that support ad-hoc analysis (the “self-service” approach).

PowerPivot is an Excel-based technology in Office 2013 Professional Plus and Office 365 ProPlus that enables advanced users to create complex data models that include hierarchies for drill-up/drill-down aggregations, custom data access expression (DAX) calculated measures, key performance indicators (KPIs), and other features not available in basic data models. PowerPivot is also available as an add-in for previous releases of Excel. PowerPivot uses xVelocity compression technology to support in-memory data models that enable high-performance analysis, even with extremely large volumes of data.

You can create and edit PowerPivot data models by using the PowerPivot for Excel interface, which is accessed from the PowerPivot tab of the Excel ribbon. Through this interface you can enhance tables that have been added to the workbook data model by other users or processes. You can also import multiple tables from one or more data sources into the data model and define relationships between them. Figure 1 shows a data model in PowerPivot for Excel.

Figure 1 - PowerPivot for Excel

Figure 1 - PowerPivot for Excel

PowerPivot brings many of the capabilities of enterprise BI to Excel, enabling business analysts to create personal data models for sophisticated self-service data analysis. Users can share PowerPivot workbooks through SharePoint Server, where they can be viewed interactively in a browser, enabling teams of analysts to collaborate on data analysis and reporting.

The following table describes specific considerations for using PowerPivot in the HDInsight use cases and models described in this guide.

Use case


Iterative data exploration

PowerPivot is a good choice when HDInsight data processing results can be encapsulated in Hive tables and imported directly into the PowerPivot data model over an ODBC connection, or when output files can be imported into Excel using Power Query. PowerPivot enables business analysts to enhance the data model and share it across teams through SharePoint Server or Power BI for Office 365.

Data warehouse on demand

When HDInsight is used to implement a basic data warehouse, it usually includes a schema of Hive tables that are queried over ODBC connections. PowerPivot makes it easy to import multiple Hive tables into a data model, define relationships between them, and refresh the model with new data if the Hive tables in HDInsight are updated.

ETL automation

Most ETL scenarios are designed to transform big data into a suitable structure and volume for storage in a relational data source for further analysis and querying. In this scenario, PowerPivot may be used to consume data from the relational data store loaded by the ETL process.

BI integration

PowerPivot is a core feature in Microsoft-based BI solutions, and enables self-service analysis that combines enterprise BI data with big data processing results from HDInsight at the report level. In a self-service BI scenario, the ability to publish PowerPivot workbooks in SharePoint Server or Power BI sites makes it easy for business analysts to collaborate and share the insights they discover.

Guidelines for using PowerPivot with HDInsight

When using PowerPivot with HDInsight, consider the following guidelines:

  • When importing data into tables in the PowerPivot data model, minimize the size of the workbook document by using filters to remove rows and columns that are not required.
  • If the PowerPivot workbook includes data from Hive tables, and you plan to share it in SharePoint Server or in a Power BI site, use an explicit ODBC connection string instead of a DSN. This will enable the PowerPivot data model to be refreshed when stored on a server where the DSN is not available.
  • Hide any numeric columns for which PowerPivot automatically generates aggregated measures. This ensures that they do not appear as dimension attributes in the PivotTable Fields and Power View Fields panes.

Next Topic | Previous Topic | Home | Community