Customer feedback on Data Compression

Article
08/16/2009

As I described in my previous blog, the data compression feature has been very successful. We do appreciate all the feedback that we have received as this provides us a way to know how customers have been using the feature and the challenges they are facing. Based on the feedback, we plan to improve our data compression offering in future releases. Here is some of the feedack and the respective rationale. Please feel free to contact me for your suggestions on data compression.

(1) Provide Unicode compression: For the uninitiated, the SQL Server stores columns of type NCHAR, VARCHAR, NTEXT and NVARCHAR(MAX) in UCS-2 format which takes 2 bytes per character even when 1 byte will do (e.g. ASCII data), a rather common case for locale used with USA (en-USA) and other major european languages. This leads to a waste of 1 byte per Unicode character.

(2) Compress out of row LOB data: SQL Server compresses in-row LOB data but not if the LOB data is stored out of row. With the result many LOB heavy applications are unable to take full advantage of data compression. However, there are two workaround available. First, LOB data can be compressed at the application tier but this means (a) the application needs to be modified (b) the application cannot take advantage of search and partial update capabilities provided in SQL Engine. Second, use filestream feature to store LOB data on a compressed volume. This is our recommended solution where applicable. Please refer to Books-Online for details on filestream feature

(3) Improve column prefix compression: Currently PAGE compression provides column prefix compression but for some workloads the repeating words occur in the middle or towards the end of the column value which does not qualify for prefix compression. This reduces the data compression savings that can be achieved. It is not to say that these customers have not benefited from data compression but clearly they could have achieved even more compression savings.

(4) Enable estimate data compression savings computations in lower SKUs: Currently, you need to run sp_estimate_compression_savings stored procedure on EE or Dev edition. Many customers who are running on standard SKU would like to know what additional space savings they can get with data compression before they invest into a EE edition. Current work around is to use Dev edition to estimate the space savings which requires customers to install Dev Edition and restore the production database. It would be nice if we could just estimate the compression savings in the lower editions.

On this note, I am very pleased to announce that we have enhanced the data compression feature to include Unicode compression for NCHAR and NVARCHAR datatypes. The Unicode compression will be available as part of ROW compression in SQL Server 2008R2 release. You can down load the CTP2 version https://www.microsoft.com/sqlserver/2008/en/us/r2.aspx to play with it. In my next blogs, I will go into more details with Unicode compression.

Thanks

Sunil Agarwal

Customer feedback on Data Compression

Additional resources