Harnessing the Power of both worlds

 

Harness the Power of Both Worlds –SQL Engine and Analysis Services

Introduction

The combination of the SQL Engine and Analysis Services is awesome because it lets you harness the power of both worlds: columnstore Indexes in the SQL engine and real-time or relational OLAP (ROLAP) in Analysis Services.

Recently I have done some testing around a much talked about feature of the SQL engine – columnstore indexes. As you can learn in BOL, in SQL Server columnstore indexes store and manage data by using column-based data storage and column-based query processing in-memory. Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. By using the columnstore index you can achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.

A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore. SQL Server supports both clustered and nonclustered columnstore indexes. Both use the same in-memory columnstore technology, but they do have differences in purpose and in features they support.

Benefits

Columnstore indexes work well for mostly read-only queries that perform analysis on large data sets. Often, these are queries for data warehousing workloads. Columnstore indexes give high performance gains for queries that use full table scans, and are not well-suited for queries that seek into the data, or that search for a particular value.

Columnstore indexes have these benefits:

· Columns often have similar data, which results in high compression rates.

· High compression rates improve query performance by using a smaller in-memory footprint. In turn, query performance can improve because SQL Server can perform more query and data operations in-memory.

· A new query execution mechanism called batch-mode execution has been added to SQL Server that reduces CPU usage by a large amount. Batch-mode execution is closely integrated with, and optimized around, the columnstore storage format. Batch-mode execution is sometimes known as vector-based or vectorized execution.

· Queries often select only a few columns from a table, which reduces total I/O from the physical media.

Solution with ROLAP and Columnstore Index

So how can you leverage this new shining star of SQL Server for business intelligence solutions?

I began by combining clustered columnstore indexes with Analysis Services ROLAP feature, enabling my fact tables for ROLAP, and then tested the performance of aggregations. As expected, I got pretty awesome performance with 500 million records.

So I doubled the number of records to 1 billion! At this scale, basic aggregates such as SUM and COUNT were returned in 2 to 3 seconds. The acid test of course lies in a certain trouble -making measure – you guessed it, Distinct Count.

So I did testing with Distinct Counts on the ROLAP solution and results were pretty positive, with around 1.5 billion records (1555618627), and around 15,638,865 distinct values. On a cold cache I got results within 2 seconds and on a warm cache in 16 milliseconds.

This was pretty awesome so I decided to try the combination of warm-caching with ROLAP in order to meet my SLA for the solution dashboards, which was around 4 seconds.

Recommendations

There are some important settings that you need to consider when implementing a BI solution based on in-memory columnstore indexes and Analysis Services ROLAP:

· EnableRolapDistinctCountOnDataSource – This is a very important property when we are using Analysis Services ROLAP. Set this property to 1 when using ROLAP with distinct count calculations. The property enables calculation of distinct count at the source (the relational database) instead of fetching rows in Analysis Services and calculating distinct count using Analysis Services.

· Statistics – In the absence of SQL Statistics Query Optimizer you will not be able to generate the right execution plan of a given query. Therefore it is very important to do timely updates of the statistics for the columnstore table; otherwise DML operations can degrade performance.

Hardware

For this experiment I used a Virtual Machine with 16 logical processors and 120 GB Ram.

References

Columnstore Indexes: https://msdn.microsoft.com/en-us/library/gg492088.aspx