SQL Server Best Practices Article

Article
01/28/2010

Writer: Matt Carroll

Contributors: Robert Zare, Edward Melomed, Akshai Mirchandani, Marius Dumitru, Siva Harinath

Technical Reviewer: Craig Utley, Nicholas Dritsas

Project Editor: Jeannine Nelson-Takaki

Applies To: SQL Server 2005 SP1

Introduction

This paper outlines the recommended best practices for designing OLAP databases in Microsoft SQL Server 2005 Analysis Services to better meet the functional and performance needs of users. Designing a database for Online Analytical Processing (OLAP) databases is very different from designing a database for Online Transaction Processing (OLTP) and requires a different approach for achieving optimal results.

The best practices recommendations are grouped into the following sections:

Data Source Design Best Practices
Dimension Design Best Practices
Cube Design Best Practices
Partition Design Best Practices
Aggregation Design Best Practices

Within each section, these best practices are ordered roughly by priority, where priority is a combination of how important the best practice is and how often you must worry about it. Major topics not covered include best practices for the design of calculations and for server management and operations. These two topics are sufficiently complex to deserve separate and more in-depth treatment.

Data Source Design Best Practices

OLAP databases in Analysis Services 2005 have several special requirements that are important to ensure trouble-free access to source data.

Do use only supported OLEDB providers in a Data Source

Analysis Services was designed and tested to work with a specific set of OLE DB providers. Although other OLE DB providers are available, and the Data Source Wizard lets you choose any compatible provider, the capabilities and behavior of different providers can differ substantially. This is true even for different providers that connect to the same database. Therefore, you should use only the supported providers. For a list of officially supported providers, see https://msdn2.microsoft.com/en-us/library/ms175608.aspx.

Do not use the .Net SqlClient Data Provider to connect to a SQL Server data source

Because the Analysis Services server runs in native code, you can get better performance by using a native provider. Therefore, do not use the .Net SqlClient Data Provider; instead, use the Microsoft OLE DB Provider for SQL Server or the SQL Native Client provider.

Dimension Design Best Practices

Good dimension design is the most important aspect of a well designed Analysis Services OLAP database. Although the wizards in Analysis Services do much of the work to get you started, it is important to review the design that is created by the wizard and ensure that the attributes, relationships, and hierarchies correctly reflect the data and match the needs of your end-users.

Do create attribute relationships wherever they exist in the data

Attribute relationships are an important part of dimension design. They help the server optimize storage of data, define referential integrity rules within the dimension, control the presence of member properties, and determine how MDX restrictions on one hierarchy affect the values in another hierarchy. For these reasons, it is important to spend some time defining attribute relationships that accurately reflect relationships in the data.

Avoid creating attributes that will not be used

Attributes add to the complexity and storage requirements of a dimension, and the number of attributes in a dimension can significantly affect performance. This is especially of attributes which have AttributeHierachyEnabled set to True. Although SQL Server 2005 Analysis Services can support many attributes in a dimension, having more attributes than are actually used decreases performance unnecessarily and can make the end-user experience more difficult.

It is usually not necessary to create an attribute for every column in a table. Even though the wizards do this by default in Analysis Services 2005, a better design approach is to start with the attributes you know you'll need, and later add more attributes. Adding attributes as you discover they are needed is generally better a better practice than adding everything and then removing attributes.

Do not create hierarchies where an attribute of a lower level contains fewer members than an attribute of the level above

A hierarchy such as this is frequently an indication that your levels are in the incorrect order: for example, [City] above [State]. It might also indicate that the key columns of the lower level are missing a column: for example, [Year] above [Quarter Number] instead of [Year] above [Quarter with Year]. Either of these situations will lead to confusion for end-users trying to use and understand the cube.

Do not include more than one non-aggregatable attribute per dimension

Because there is no All member, each non-aggregatable attribute will always have some non-all member selected, even if not specified in a query. Therefore, if you include multiple non-aggregatable attributes in a dimension, the selected attributes will conflict and produce unexpected numbers.

For example, in a time dimension it might not make sense to sum the members of [Calendar Year] or [Fiscal Year], but if both are made non-aggregatable, whenever a user asks for data for a specific [Calendar Year] it will be filtered by the default [Fiscal Year] unless they also specify the [Fiscal Year]. Worse, because [Calendar Year] and [Fiscal Year] do not align but overlap, it is difficult to obtain the full data for either a [Calendar Year] or a [Fiscal Year] because the one is filtered by the other.

Do use key columns that completely and correctly define the uniqueness of the members in an attribute

Usually a single key column is sufficient, but sometimes multiple key columns are necessary to uniquely identify members of an attribute. For example, it is common in time dimensions to have a [Month] attribute include both [Year] and [Month Name] as key columns. This is known as a composite key and identifies January of 1997 as being a different member than January of 1998. When you use [Month] in a time hierarchy that also contains [Year], this distinction between January of 1997 and January of 1998 is important.

It may also make sense to have a separate [Month of Year] attribute that has only [Month Name] as the key. This [Month of Year] attribute contains a single January member that spans all years, which can be useful for comparing seasonal data. However, this attribute should not be used in a hierarchy together with [Year] because there is no relationship between [Month of Year] and [Year].

Similar distinctions between [Quarter] and [Quarter of Year], [Semester] and [Semester of Year], and so on should also be made by setting appropriate key columns.

Do perform Process Index after doing a Process Update if the dimension contains flexible AttributeRelationships or a parent-child hierarchy

An aggregation is considered flexible if any attribute included in the aggregation is related, either directly or indirectly, to the key of its dimension through an AttributeRelationship with RelationshipType set to Flexible. Aggregations that include parent-child hierarchies are also considered flexible.

When a dimension is processed by using the Process Update option, any flexible aggregations that the dimension participates in might be dropped, depending on the contents of the new dimension data. These aggregations are not rebuilt by default, so Process Index must then be explicitly performed to rebuild them.

Do use numeric keys for attributes that contain many members (>1 million)

Using a numeric key column instead of a string key column or a composite key will improve the performance of attributes that contain many members. This best practice is based on the same concept as using surrogate keys in relational tables for more efficient indexing. You can specify the numeric surrogate column as the key column and still use a string column as the name column so that the attribute members appear the same to end-users. As a guideline, if the attribute has more than one million members, you should consider using a numeric key.

Do not create redundant attribute relationships

Do not create attribute relationships that are transitively implied by other attribute relationships. The alternative paths created by these redundant attribute relationships can cause problems for the server and are of no benefit to the dimension. For example, if the relationships A->B, B->C, and A->C have been created, A->C is redundant and should be removed.

Do include the key columns of snowflake tables joined to nullable foreign keys as attributes that have NullProcessing set to UnknownMember

If tables that are used in a dimension are joined on a foreign key column that might contain nulls, it is important that you include in your design an attribute whose key column is the corresponding key in the lookup table. Without such an attribute, the OLAP server would have to issue a query to join the two tables during dimension processing. This makes processing slower; moreover, the default join that is created by the OLAP server would exclude any rows that contain nulls in the foreign key column. It is important to set the NullProcessing option on the key column of this attribute to UnknownMember. The reason is that, by default, nulls are converted to zeros or blanks when the engine processes attributes. This can be dangerous when you are processing a nullable foreign key. Conversion of a null to zero at best produces an error; in the worst case, the zero may be a legitimate value in the lookup table, thereby producing incorrect results.

To handle nullable foreign keys correctly, you must also set UnknownMember to Visible on the dimension. The Cube Wizard and Dimension Wizard currently set this property automatically; however, the Dimension Wizard lets you manually de-select the key attribute of snowflake tables. You must not deselect the key column if the corresponding foreign key is nullable.

If you do not want to browse the attribute that contains the lookup table key column, you can set AttributeHierarchyVisible to False. However, AttributeHierarchyEnabled must be set to True because it is necessary that all other attributes in the lookup table be directly or indirectly related to the lookup key attribute in order to avoid the automatic creation of new joins during dimension processing.

Do set the RelationshipType property appropriately on AttributeRelationships based on whether the relationships between individual members change over time

The relationships between members of some attributes, such as dates in a given month or the gender of a customer, are not expected to change. Other relationships, such as salespeople in a given region or the marital status of a customer, are more prone to change over time. You should set RelationshipType to Flexible for those relationships that are expected to change and set RelationshipType to Rigid for relationships that are not expected to change.

When you set RelationshipType appropriately, the server can optimize the processing of changes and re-building of aggregations.

By default, the user interface always sets RelationshipType to Flexible.

Avoid using ErrorConfigurations with KeyDuplicate set to IgnoreError on dimensions

When KeyDuplicate is set to IgnoreError, it can be difficult to detect problems with incorrect key columns, incorrectly defined AttributeRelationships, and data consistency issues. Instead of using the IgnoreError option, in most cases it is better to correct your design and clean the data. The IgnoreError option may be useful in prototypes where correctness is less of a concern. Be aware that the default value for KeyDuplicate is IgnoreError. Therefore, it is important to change this value after prototyping is complete to ensure data consistency.

Do define explicit default members for non-aggregatable attributes

By default, the All member is used as the default member for aggregatable attributes. This default works very well for aggregatable attributes, but non-aggregatable attributes have no obvious choice for the server to use as a default member, therefore a member will be selected arbitrarily. This arbitrarily selected member is then selected whenever the attribute is not explicitly included in an MDX query. To avoid this, it is important to explicitly set a default value for each non-aggregatable attribute.

Default members can be explicitly set either on the DimensionAttribute or in the cube script.

Avoid creating user-defined hierarchies that do not have attribute relationships relating each level to the level above

Having attribute relationships between every level in a hierarchy makes the hierarchy strong and enables significant server optimizations.

Avoid creating diamond-shaped attribute relationships

A Diamond-shaped relationship refers to a chain of attribute relationships that splits and rejoins but contains no redundant relationships. For example, Day->Month->Year and Day->Quarter->Year have the same start and end points, but do not have any common relationships. The presence of multiple paths can create some ambiguity on the server. If preserving the multiple paths is important, it is strongly recommended that you resolve the ambiguity by creating user hierarchies that contain all the paths.

Consider setting AttributeHierarchyEnabled to False on attributes that have cardinality that closely matches the key attribute

When an attribute contains roughly one value for each distinct value of the key attribute, it usually means that the attribute contains only alternative identification information or secondary details. Such attributes are usually not interesting to pivot or group by. For example, the Social Security number or telephone number may be interesting properties to view, but there is very little value in being able to pivot and group based on SSN or telephone. Setting AttributeHierarchyEnabled to False on such attributes will reduce the complexity of the dimension for end-users and improve its performance.

If you want to be able to browse such attributes, you can set AttributeHierarchyEnabled to True; however, you should consider setting AttributeHierarchyOptimized to NotOptimized and setting GroupingBehavior to DiscourageGrouping. By setting these properties, you can improve performance and indicate to the users that the attribute is not very useful for grouping.

Consider setting AttributeHierarchyVisible to False on the key attribute of parent-child dimensions

Because the members of the key attribute are also contained in the parent-child hierarchy in a more organized manner, it is usually unnecessary and confusing to the end-user to expose the flat list of members contained in the key attribute.

Avoid setting UnknownMember=Hidden

When you suppress unknown members, the effect is to hide relational integrity issues; moreover, because hidden members might contain data, results might appear not to add up. Therefore, we recommend that you avoid use of this setting except in prototype applications.

Do use MOLAP storage mode for dimensions with outline calculations (custom rollups, semi-additive measures, and unary operators)

Dimensions that contain custom rollups or unary operators will perform significantly better using MOLAP storage. The following dimension types will also benefit from using MOLAP storage: an Account dimension in a measure group that contains measures aggregated using ByAccount; the first time dimension in a measure group that contains other semi-additive measures.

Do use a 64 bit server if you have dimensions with more than 10 million members

If a dimension contains more than 10 million members, using an x64 or an IA-64-based server is recommended for better performance.

Do set the OrderBy property for time attributes and other attributes whose natural ordering is not alphabetical

By default, the server orders attribute members alphabetically, by name. This ordering is especially undesirable for time attributes. To obtain the desired ordering, use the OrderBy and OrderByAttributes properties and explicitly specify how you want the members ordered. For time-based attributes, there is frequently a date or numeric key column that can be used to obtain the correct chronological ordering.

Do expose a DateTime MemberValue for date attributes

Some clients, such as Excel, will take advantage of the MemberValue property of date members and use the DateTime value that is exposed. When Excel recognizes the value as DateTime, Excel can treat the value as a date type and apply date functions to the value, as well as provide better formatting and filtering. If the key column is a single DateTime column and the name column has not been set, this MemberValue is automatically derived from the key column and no action is necessary. However, in other cases, you can ensure that the MemberValue is DateTime by explicitly specifying the ValueColumn property of the attribute.

Do set AttributeHierarchyEnabled to False, specify a ValueColumn and specify the MimeType of the ValueColumn on attributes that contain images

Because there is no value in browsing the member names of an attribute that contains an image, you should disable browsing by setting AttributeHierarchyEnabled to False. To help clients recognize and display the member property of the attribute as an image, specify the ValueColumn property of the attribute and then set MimeType to an appropriate image type.

Avoid setting IsAggregatable to False on any attribute other than the parent attribute in a parent-child dimension

Non-aggregatable attributes have non-all default members. These default members affect the result of queries whenever the attributes are not explicitly included. Because parent-child hierarchies generally represent the most interesting exploration path in dimensions that contain them, it is best to avoid having non-aggregatable attributes other than the parent attribute.

Do set dimension and attribute Type properties correctly for Time, Account, and Geography dimensions

For time dimensions, it is important to set the dimension and attribute types correctly so that time-related MDX functions and the time intelligence of the Business Intelligence Wizard can work correctly. For Account dimensions, it is similarly important to set appropriate account types when using measures with the aggregate function ByAccount. Geography types are not used by the server, but provide information for client applications.

A common mistake is to set the Type property on a dimension but not on an attribute, or vice-versa. Another common mistake when configuring time dimensions is to confuse the different time attribute types, such as [Month] and [Month of Year].

Chains of related attributes usually represent an interesting navigation path for end-users, and defining hierarchies for these will also provide performance benefits.

Do include all desired attributes of a logical business entity in a single dimension instead of splitting them up over several dimensions

In Analysis Services 2000, each hierarchy was in reality a separate dimension and attributes such as gender and age would also be separate dimensions. In Analysis Services 2005, a dimension can and should contain the complete information about a logical business entity, including multiple hierarchies and many attributes. This does not mean that every piece of information available must be included in the dimension, but rather that any desired information should be included in one dimension instead of split over many dimensions.

There are two exceptions to this guideline:

A dimension can only contain one parent-child hierarchy.
To model multiple joins to a lookup table within a dimension's schema, you must create a separate dimension based on the lookup table and then use this as a referenced dimension.

Do not combine unrelated business entities into a single dimension

Combining attributes of independent business entities, such as customer and product or warehouse and time, into a single dimension will not only create a confusing model, but also reduce query performance because auto-exist will be applied across attributes within the dimension.

Another way to state this rule is that the values of the key attribute of a dimension should uniquely identify a single business entity and not a combination of entities. Generally this means having a single column key for the key attribute.

Do set NullProcessing to UnknownMember on each attribute that has nulls and is used to join to a referenced dimension

By default, nulls are converted to zeros or blanks when the engine processes attributes. This can be dangerous when processing a nullable foreign key, because if a null is converted to zero when zero is a legitimate value in the reference dimension, the join on the values can produce incorrect results. At best, conversion to zero will produce an error.

To prevent these errors, you must also set UnknownMember to Visible on the referenced dimension.

The Cube Wizard in SQL Server 2005 Analysis Services handles both settings automatically, except when dealing with existing dimensions where UnknownMember is not set to Visible.

Do set NullKeyConvertToUnknown to IgnoreError on the ErrorConfiguration on any measure groups that contain a dimension referenced through a nullable column

By default, nulls are converted to zeros or blanks when the engine processes granularity attributes. This can be dangerous when you are processing a nullable foreign key, because if a null value is converted to zero and zero is a legitimate value in the dimension, the join can produce incorrect results. At best, the conversion will produce errors.

To prevent conversion of nulls, you must also set UnknownMember to Visible on the dimension.

The Cube Wizard in SQL Server 2005 Analysis Services handles these settings automatically, except when dealing with existing dimensions where UnknownMember is not set to Visible.

Consider setting AttributeHierarchyVisible to False for attributes included in user-defined hierarchies

It is usually not necessary to expose an attribute in its own single level hierarchy when that attribute is included in a user-defined hierarchy. This duplication only complicates the end-user experience without providing additional value.

One common case in which it is appropriate to present two views of an attribute is in time dimensions. The ability to browse by [Month] and the ability to browse by [Month-Quarter-Year] are both very valuable. However, these two month attributes are actually separate attributes. The first contains only the month value such as “January” while the second contains the month and the year such as “January 1998”.

Do not use proactive caching settings that put dimensions into ROLAP mode

For performance reasons, we strongly discourage the use of dimension proactive caching settings that may put the dimension in ROLAP mode. To ensure that a dimension with proactive caching enabled will never enter ROLAP mode, you should set the OnlineMode property to OnCacheComplete. You can also prevent use of ROLAP mode by deselecting the Bring online immediately check box in the Storage Options dialog box.

Avoid making an attribute non-aggregatable unless it is at the end of the longest chain of attribute relationships in the dimension

Non-aggregatable attributes have non-all default members that affect the result of queries in which values for those attributes are not explicitly specified. Therefore, you should avoid making an attribute non-aggregatable unless that attribute is regularly used. Because the longest chain of attributes generally represents the most interesting exploration path for users, it is best to avoid having non-aggregatable attributes in other, less interesting chains.

Consider creating at least one user-defined hierarchy in each dimension that does not contain a parent-child hierarchy

Most (but not all) dimensions contain some hierarchical structure to the data which is worth exposing in the cube. Frequently the Cube Wizard or Dimension Wizard will not detect this hierarchy. In these cases, you should define a hierarchy manually.

Do set the InstanceSelection property on attributes to help clients determine the best way to display attributes for member selection

If there are too many members to display in a single list, the client user interface can use other methods, such as filtered lists, to display the members. By setting the InstanceSelection property, you provide a hint to client applications to suggest how a list of items should be displayed, based on the expected number of items in the list.

Recommended guidelines for setting this property are as follows:

Property Value	Description	Guidelines
None	Do not display a selection list. Enable users to enter values directly.
DropDown	The number of items is small enough to display in a drop-down list.	<100 members
List	The number of items is too large for a drop-down list but does not require filtering.	< 500 members
FilteredList	The number of items is large enough to require users to filter the items to be displayed.	< 5000 members
MandatoryFilter	The number of items is so large that the display must always be filtered.	> 5000 members

The following C# code sample shows how to set the property dynamically:

if ( estimatedCount < 100 )
    attribute.InstanceSelection = InstanceSelection.DropDown;
else if ( estimatedCount< 500 )
    attribute.InstanceSelection = InstanceSelection.List;
else if ( estimatedCount< 5000 )
    attribute.InstanceSelection = InstanceSelection.FilteredList;
else
    attribute.InstanceSelection = InstanceSelection.MandatoryFilter;

Cube Design Best Practices

Well-designed cubes will better match the needs of your end-users. The following best practices can improve usability and also performance of the cube.

Avoid including unrelated measure groups in the same cube

Having many measure groups in a cube can adversely affect the query performance of the cube, even for queries that only use a single measure group. If your DSV contains many potential fact tables and they will not usually be used together, consider creating multiple smaller cubes containing topic areas that are used together. Then, you can enable occasional cross-topic querying by creating linked measure groups that join together all measure groups in a special cross-topic cube.

Avoid having too many parent-child dimensions in a cube, especially when the dimension contains custom rollups or unary operators

Parent-child dimensions, especially those that contain custom rollups and unary operators, are very powerful and flexible. However, in order to prevent poor query performance, such parent-child dimensions should only be used as necessary.

Do set AttributeHierarchyEnabled to False on any cube attributes that are below the level of granularity of all measure groups in the cube

Attributes are considered below the level of granularity for a measure group when no chain of attribute relationships exists between the granularity attribute and the attribute in question. Such attributes will never have any data associated with them in the cube and will only confuse end-users. For example, if a time dimension contains day, week, month and year attributes, whereas an inventory cube contains only data updated on a weekly basis, it is best to disable the day attribute for this cube by setting AttributeHierarchyEnabled to False for the day attribute only.

Avoid having very big intermediate measure groups or dimensions of many-to-many dimensions

Having very big intermediate measure groups could lead to potential performance issues in many-many dimensions. “Very big” means larger than the measure group or more than one million rows. You can also experience performance issues when using very big dimensions in many-many dimension relationships.

Consider using a processing query when using the scheduled polling option of proactive caching

A polling query is typically a singleton query that returns a value Analysis Services can use to determine whether changes have been made to a table or other relational object. The polling query tells the proactive caching mechanism when it must read in new data.

In contrast, a processing query is a query that returns a rowset that contains the changes made to a table associated with an Analysis Services object since the last time the table was polled, in order to incrementally update the multidimensional OLAP (MOLAP) cache for the object. Therefore, the processing query tells the proactive caching mechanism what data is new. By setting the processing query, you can configure the system to read and process only the new data, instead of re-reading and re-processing the whole table.

Avoid creating multiple measure groups that have the same dimensionality and granularity

Having too many measure groups can adversely affect performance. Moreover, the presence of many similar measure groups presents a more complex model to the end-user by dividing data on a common subject. Instead, you should consider creating a single measure group that contains multiple partitions. The common exception to this guideline is distinct count measures, which for performance reasons should be in their own measure groups.

Do put each distinct count measure into a separate measure group

Put each distinct count measure into its own measure group. This will improve performance.

Do set any explicit default members of role-playing dimensions directly on the cube dimensions

A dimension that has been given a different name in a cube than it has outside the cube is said to be role-playing. The name that is used in the cube is known as the role the dimension plays within that cube. Role-playing is used most often when a dimension plays more than one role in a cube. For example, the [Order Date] and [Ship Date] cube dimensions are both based on the [Date] dimension, but are assigned different names, or roles.

Because such a dimension has a different name in the cube, whereas the default member expression is evaluated in the context of the cube, any explicit reference to the original dimension name in the default member will be invalid. To specify the default member on the cube dimension, you can use the following syntax in the cube’s calculation script:

ALTER CUBE CurrentCube UPDATE DIMENSION <dimension name>, DEFAULT_MEMBER='<default member>'

This has the added benefit of enabling different default members for each of the roles a dimension plays in a cube. Such role specific default members frequently make sense from the business perspective.

Do reuse dimensions multiple times in a cube instead of creating duplicates of a dimension

In Analysis Services 2000, it was necessary to create new copies of a dimension for role-playing. In other words, to use two identical date dimensions, [Ship Date] and [Order Date], you needed to have two separate dimensions. However, in Analysis Services 2005, this is not necessary and should be avoided as it will introduce unnecessary storage and maintenance costs.

Avoid having cubes with a single dimension

If a cube has only one dimension, you should consider splitting this single dimension into multiple dimensions based on logical business entities. Typically, a cube contains only a single dimension because it was based on a single de-normalized table, which the Cube Wizard was unable to break into multiple dimensions.

This single dimension almost always contains multiple logical business entities. From the perspective of the user, the different business entities would be more cleanly modeled and navigated as separate dimensions.

Do use the smallest numeric data type possible for measures

The data type of measures must be large enough to hold the largest aggregated value (the "all" value) but should be no larger than necessary to reduce storage costs.

Do ensure that the Collation property of OLAP objects is consistent with the collation of the relational source data when dealing with multilingual data

Collation controls both the sorting and equivalence of strings; therefore it is important to have the Analysis Services server use the appropriate collation for the data. If you use the wrong collation, you might see strange results, especially if the column is used for a distinct count measure.

Collations can be set at the level of the server, database, cube, and dimension and also on individual column bindings. The important point is to have the collation of your OLAP objects be consistent with the collation of the relational data on which they are built. For example, the caption columns of translations usually contain text in another language and are therefore likely to have a different collation.

Do materialize referenced dimensions

To improve performance, reference dimensions should be materialized. The only exception to this is the rare case when you are creating a reference dimension relationship between a local dimension and a linked measure group. When a reference dimension is created by using the user interface for SQL Server 2005 Analysis Services, it will be materialized by default.

Avoid using linked dimensions, especially if your cube has outline calculations (custom rollups, semi-additive measures, or unary operators) or scripts

Although it is possible to use dimensions linked from another database (or server) with a local measure group, this will adversely affect performance and should be avoided when possible. Instead, consider creating a local copy of the dimension. Even on the same server, you should avoid linked dimensions.

Partition Design Best Practices

Following some simple best practices can help ensure an effective data storage strategy that will enable the OLAP server to scale efficiently to your data volumes.

Avoid having partitions with more than 20 million rows

Large partitions should generally be broken up into multiple smaller partitions for better performance. As a rule, no partition should have more than 20 million rows.

Avoid having many small partitions in a measure group

Small partitions should generally be combined into fewer, larger partitions for better performance. As a guideline, consider combining partitions when a measure group has more than five partitions with less than two million rows each.

Do set the Slice property on partitions that are ROLAP or partitions that use proactive caching

Use of the Slice property lets the server know which partitions it must look in for data to answer queries. For example, if a partition contains a slice that states it contains only data for 2002, when a query requests sales in 2003, the server will know it does not have to look in that partition in order to answer that query.

For MOLAP partitions, you do not have to specify the slice because the server will be able to figure out the relevant partitions after the data has been processed. However, when you use proactive caching, it’s a good idea to specify the slice manually since the partition may be treated as ROLAP during the construction of the MOLAP cache. (If a slice is manually specified on a MOLAP partition and that slice is not consistent with the data that is contained in the partition, the server will raise an error during processing of the data.)

For ROLAP partitions, a slice should be specified or the partition will always have to be queried.

To set the Slice property in Business Intelligence Development Studio, select the partition in the Partitions tab of the Cube Editor. In the Properties window, click the text box for the Slice property. Click the ellipsis (...) button to open the Partition Slice dialog box. You can type an MDX expression to specify the member on which you want to slice. To slice on multiple hierarchies, enter a tuple containing the members on which you want to slice.

Consider partitioning a distinct count measure group along the dimension used most often to query the distinct count measure

When partitioning a measure group that includes a distinct count measure, consider partitioning the measure group along the dimension that is most often used to query distinct count measure. This will provide improved query performance by reducing the frequency that all partitions must be accessed.

Aggregation Design Best Practices

Good aggregation design is important for achieving good performance when dealing with large amounts of data. Although aggregation design is a complex topic, much of the complexity can be handled automatically by the Storage Design Wizard in SQL Server 2005 Analysis Services. If you decide to manually design aggregations using some other tool, you’ll need to plan your design much more carefully. The following best practices contain guidance to help succeed in both scenarios.

Consider having aggregations for each partition of significant size

Aggregations will generally improve the performance for partitions of non-trivial size. We recommend using aggregations when partition size exceeds 500,000 rows.

Do include the granularity attribute of the time dimension in aggregations for measure groups with semi-additive measures

Semi-additive measures are calculated at run time by using the granularity attribute of the time dimension. Therefore, only aggregations that contain this granularity attribute will help performance when calculating the value of semi-additive measures.

If you set the AggregationUsage property to Full, the Aggregation Design Wizard and the Usage-Based Optimization Wizard will include this attribute in every aggregation. If you manually design your aggregations, make sure to include such attributes in every aggregation.

Do not build too many aggregations

Do not build excessive aggregations. This can adversely affect performance by of increasing search space during queries and increasing the work necessary during processing. As a rule of thumb, do not build more than 500 aggregations per partition.

It is usually not necessary or desirable from a management perspective to create separate aggregation designs for every partition. You generally only need different aggregation designs for partitions that have different sizes or usage patterns. Therefore, similar aggregation designs are not necessary and will add complexity to the database.

Partitions that have similar size and usage should share the same aggregation designs. As a rule, three aggregation designs per measure group are generally enough.

Consider creating separate aggregation designs for partitions with significantly different size or usage

When partitions have significantly different size or usage patterns, it is generally better to create separate aggregation designs and customize the design appropriately. For example, it makes sense to create separate partitions for each month of the current year and use the same aggregation design for these partitions. However, you will probably want to create a separate aggregation design for the partition that contains all the data for a past year. This is better because archived data usually is combined into larger partitions and is accessed less frequently.

For example, the storage settings for partitions can be managed by using the following guidelines:

Frequently queried partitions -> Use MOLAP with lots of aggregations

Periodically queried partitions -> Use MOLAP with less or no aggregations

Rarely queried partitions -> Use ROLAP with no aggregations

Do use a lower performance gain with regular aggregation design and a higher performance gain with usage-based optimization

When you design generic aggregations, you should target a lower performance gain (20-30%). When you design aggregations using the Usage-Based Optimization Wizard, you should target a higher performance gain (70-80%), because these aggregations are more specific to actual use.

Consider setting AggregationUsage to Unrestricted on high-use attributes that are not the key of a dimension and not in a hierarchy

By default, the aggregation design algorithm considers only key attributes and attributes in natural hierarchies for inclusion in aggregations. Although this is usually the best strategy, if you have an attribute that is neither the key attribute nor contained in a hierarchy, but you expect people to group by the attribute or filter on it, you should consider setting AggregationUsage to Unrestricted. When you do this, the aggregation design algorithm will consider the attribute when it designs aggregations.

Avoid Setting AggregationUsage to Full on Large Attributes

Avoid setting the AggregationUsage property to Full on attributes that have many members. If you use the Aggregation Wizard, this setting may prevent any aggregations from being created, because this setting prevents the wizard from creating aggregations that do not include this attribute.

Do set member and row counts accurately for the partition when designing aggregations

Although member and row counts do not have to be exact, they should be reasonably accurate because they will be used to evaluate the cost, both in disk size and read time, of aggregations. Moreover, these counts will significantly affect the aggregations chosen by the aggregation design algorithm.

You may find that not all members of a dimension exist in a partition, especially when the partition policy is based on the dimension: for example, in time-based partitioning. When this is true, it is important to provide an accurate approximation of the actual count of members found in the partition.

Consider including the granularity attribute of the intermediate dimensions of many-to-many dimensions in aggregations

When accessing data using a many-to-many dimension, the server will always use the granularity attribute of the dimension that serves as an intermediate dimension for that many-to-many dimension. Therefore, only aggregations that contain this granularity attribute will help performance when using many-to-many dimensions. Aggregations on other attributes of the intermediate dimension can still be useful when the intermediate dimension is used directly, and when the many-to-many dimension is not used in a query.

If you have multiple dimensions shared between the measure groups in a many-to-many relationship, it is undefined which dimension the server will use as the intermediate dimension. Therefore, you should consider including the granularity attributes of each such potential intermediate dimension in aggregations.

Do include in aggregations the granularity attribute of dimensions that contain unary operators or custom rollups

Unary operators and custom rollups are calculated at run time from the granularity attribute of the dimension that contains them. Therefore, only aggregations that contain this granularity attribute will help performance when using dimensions with unary operators or custom rollups.

The Aggregation Design Wizard and the Usage-Based Optimization Wizard will automatically include such dimension’s granularity attributes in aggregations. However, if you manually design aggregations, be sure to include such granularity attributes in every aggregation.

Avoid creating aggregations that are larger than one-third the size of the fact data

Aggregations this large generally do not help performance. The Aggregation Design Wizard and Usage-Based Optimization Wizard will apply this rule automatically, but you should monitor the size of aggregations and the size of the fact data when you manually design aggregations.

Do remove aggregation designs that are not applied to any partitions

Remove any aggregation designs that are not used by any partitions. These can slow metadata operations and offer no benefit. Removal of unused aggregations is performed automatically when you use the Copy Aggregation Design dialog box in SQL Server Management Studio. Removal of unused aggregations is also automatic in the Aggregation Design Wizard and the Usage-Based Optimization Wizard. However, if you manually create aggregation designs you should review the design when you have finished and remove unnecessary aggregations.

Do not create aggregations that contain multiple attributes from the same attribute relationship chain

It is redundant to include an attribute in an aggregation if the value of the attribute is implied by an attribute already in the aggregation, because the implied attribute’s value can be calculated from the first attribute. Also, for a given query, only one of these attributes is of interest in the aggregation. The best practice is to remove the implied attribute from the aggregation and, optionally, create a new aggregation that contains the implied attribute but not the implying attribute. The Aggregation Design Wizard and Usage-Based Optimization Wizard will apply this rule automatically, but if you manually design aggregations, you may have to modify your design to remove redundant attributes.

Consider including in aggregations only those attributes with rigid relationships to their dimension keys

Aggregations are considered flexible when any attribute included in the aggregation is related, either directly or indirectly, to the key of its dimension through any attribute relationship that has RelationshipType set to Flexible. Aggregations are also considered flexible when they include a parent-child hierarchy. Because a flexible aggregation might be dropped during a Process Update of a dimension and will only be recreated when Process Index is performed, a better practice is to create rigid aggregations whenever possible, instead of flexible aggregations. Sometimes you can achieve this by selecting an attribute closer to the key of the dimension.

For example, assume the following chain of relationships, where the notation ==> means a rigid relationship and --> means a flexible relationship:

Customer ==> City --> Sales Region ==> Country

Given these relationships, any aggregations that use Sales Region or Country will be flexible, whereas aggregations that use Customer or City may be rigid. To avoid creating a flexible relationship, you might include City in the aggregation instead of Sales Region.

Conclusion

Following the best practices described in this paper will help you avoid many common pitfalls and can help you significantly improve the quality of your OLAP database designs. Further information about working with OLAP in SQL Server Analysis Services 2005 can be found at https://msdn2.microsoft.com/en-us/library/ms175367.aspx.