Divisional portal environment lab study (SharePoint Server 2010)

 

Applies to: SharePoint Server 2010

Topic last modified: 2016-11-30

This document provides guidance on performance and capacity planning for a divisional portal based on Microsoft SharePoint Server 2010. It includes the following:

  • Test environment specifications, such as hardware, farm topology and configuration

  • Test farm dataset

  • Test data and recommendations for how to determine the hardware, topology, and configuration that you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics

In this article:

  • Introduction to this environment

  • Glossary

  • Overview

  • Specifications

  • Results and analysis

Introduction to this environment

This document outlines the test methodology and results to provide guidance for capacity planning of a typical divisional portal. A divisional portal is a SharePoint Server 2010 deployment where teams mainly do collaborative activities and some content publishing. This document assumes a "division" to be an organization inside an enterprise with 1,000 to 10,000 employees.

Different scenarios have different requirements. Therefore, it is important to supplement this guidance with additional testing on your own hardware and in your own environment. If your planned design and workload resemble the environment described in this document, you can use this document to draw conclusions about scaling your environment up and out.

When you read this document, you will understand how to do the following:

  • Estimate the hardware required to support the scale you need: the number of users, the load, and the features enabled.

  • Design your physical and logical topology for optimal reliability and efficiency. High Availability/Disaster Recovery are not covered in this document.

  • Understand the effect of ongoing search crawls on RPS for a divisional portal deployment.

The SharePoint Server 2010 environment described in this document is a lab environment that mimics a production environment at a large company. For details about the production environment, see Departmental collaboration environment technical case study (SharePoint Server 2010).

Before reading this document, make sure that you understand the key concepts behind capacity management in SharePoint Server 2010. The capacity management documentation describes the recommended approach to capacity management, provides context that will help you make effective use of the information in this document, and defines the terms that are used throughout it.

Glossary

There are some specialized terms that you will encounter in this document. Here are some key terms and their definitions.

  • RPS: Requests per second. The number of requests received by a farm or server in one second. This is a common measurement of server and farm load.

    Note that requests differ from page loads; each page contains several components, each of which creates one or more requests when the page is loaded. Therefore, one page load creates several requests. Typically, authentication checks and events using insignificant resources are not counted in RPS measurements.

  • Green Zone: This is the state at which the server can maintain the following set of criteria:

    • The server-side latency for at least 75% of the requests is less than 0.5 seconds.

    • All servers maintain CPU utilization of less than 50%.

    Note

    Because this lab environment did not have an active search crawl running, the database server was kept at 40% CPU utilization or lower, to reserve 10% for the search crawl load. This assumes that SQL Server Resource Governor is used in production to limit the search crawl load to 10% CPU.

    • The failure rate is less than 0.01%.
  • Red Zone (Max): This is the state at which the server can maintain the following set of criteria:

    • The HTTP request throttling feature is enabled, but no 503 (Server Busy) errors are returned.

    • The failure rate is less than 0.1%.

    • The server-side latency is less than 1 second for at least 75% of the requests.

    • Database server CPU utilization is less than or equal to 75%, which allows 10% to be reserved for the Search crawl load, limited by using SQL Server Resource Governor.

    • All Web servers maintain CPU utilization of less than or equal to 75%.

    (The sketch that follows this glossary shows how the Green Zone and Red Zone criteria can be encoded as simple checks.)

  • AxBxC (Graph notation): This is the number of Web servers, application servers, and database servers respectively in a farm. For example, 2x1x1 means that this environment has 2 Web servers, 1 application server, and 1 database server.

  • MDF and LDF: SQL Server physical files. For more information, see Files and Filegroups Architecture.
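To make the zone definitions concrete, the following is a minimal sketch, in Python, that encodes the Green Zone and Red Zone criteria as simple checks. The function names, parameter names, and sample values are illustrative assumptions of ours, not part of any SharePoint API; the counter values would come from your own monitoring.

    # Minimal sketch: encode the Green Zone / Red Zone (Max) criteria above
    # as predicates over performance-counter samples. The thresholds restate
    # the glossary definitions; everything else here is illustrative.

    def in_green_zone(latency_75th_sec, cpu_by_server, failure_rate,
                      sql_cpu_limit=0.50):
        """Green Zone: 75th-percentile server-side latency < 0.5 sec, all
        servers < 50% CPU, failure rate < 0.01%. This lab used
        sql_cpu_limit=0.40 to reserve 10% CPU for the search crawl."""
        return (latency_75th_sec < 0.5
                and all((cpu < sql_cpu_limit) if role == "sql" else (cpu < 0.50)
                        for role, cpu in cpu_by_server.items())
                and failure_rate < 0.0001)

    def in_red_zone(latency_75th_sec, cpu_by_server, failure_rate):
        """Red Zone (Max): 75th-percentile latency < 1 sec, all servers
        <= 75% CPU, failure rate < 0.1% (with request throttling enabled
        but no 503 Server Busy errors returned)."""
        return (latency_75th_sec < 1.0
                and all(cpu <= 0.75 for cpu in cpu_by_server.values())
                and failure_rate < 0.001)

    # Example: counters from the 2x1x1 run at 80 users (see the results
    # later in this article); the failure rate here is a hypothetical value.
    counters = {"web": 0.473, "app": 0.297, "sql": 0.361}
    print(in_green_zone(0.23, counters, failure_rate=0.00005,
                        sql_cpu_limit=0.40))   # True -> healthy farm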

Overview

This section provides an overview of our assumptions and our test methodology.

Assumptions

For our testing, we made the following assumptions:

  • In the scope of this testing, we did not consider disk I/O as a limiting factor. It is assumed that an infinite number of spindles are available.

  • The tests model only peak-time usage on a typical divisional portal. We did not consider the cyclical changes in traffic seen with day-night cycles. That also means that timer jobs, which generally require scheduled nightly runs, are not included in the mix.

  • There is no custom code running on the divisional portal deployment in this case. We cannot guarantee the behavior of custom code or third-party solutions installed and running in your divisional portal.

  • For the purpose of these tests, all of the services databases and the content databases were put on the same instance of Microsoft SQL Server. The usage database was maintained on a separate instance of SQL Server.

  • For the purpose of these tests, BLOB cache is enabled.

  • Search crawl traffic is not considered in these tests. However, to factor in the effect of an ongoing search crawl, we modified the definitions of a healthy farm: the Green Zone criterion for the database server was set to 40 percent CPU to allow for a 10 percent tax from search crawls, and similarly we used 80 percent SQL Server CPU as the criterion for max RPS.

Test methodology

We used Visual Studio Team System for Test 2008 SP2 to perform the performance testing. The goal of the testing was to find the performance characteristics of the green zone, the max zone, and various system states in between for each topology. Detailed definitions of "max zone" and "green zone," as measured by specific values for performance counters, are given in the Glossary; in general, a farm configuration operating around its "max zone" breakpoint can be considered under stress, whereas a farm configuration operating at its "green zone" breakpoint can be considered healthy.

The test approach was to start with the most basic farm configuration and run a set of tests. The first test gradually increases the load on the system while monitoring its performance characteristics. From this test we derived the throughput and latency at various user loads and also identified the system bottleneck. After we had this data, we identified at what user loads the farm exhibited green zone and max zone characteristics. We then ran separate, longer tests at those pre-identified constant user loads. These tests ensured that the farm configuration could sustain green zone and max zone performance at the respective user loads over a longer period of time.

Later, while doing the tests for the next configuration, we scaled out the system to eliminate the bottleneck identified in the previous run. We kept iterating in this manner until we hit the SQL Server CPU bottleneck.

We started with a minimal farm configuration of 1 combined Web server/application server and 1 database server. Through multiple iterations, we finally ended at a farm configuration of 3 Web servers, 1 application server, and 1 database server, where the database server CPU was maxed out. Below you will find a quick summary and charts of the tests performed on each iteration to establish the green zone and max zone for that configuration, followed by a comparison of the green zones and max zones across iterations, from which we derive our recommendations.
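The following is a minimal sketch, in Python, of this step-load approach: ramp the number of simulated users in steps, hold each step for a fixed duration, and record RPS and latency percentiles at each step. The real tests used Visual Studio Team System and the transactional mix described later in this article; send_request here is a stand-in placeholder for issuing one HTTP request to the farm.

    # Sketch of a step-load test: increase concurrent users step by step and
    # measure throughput (RPS) and the 75th-percentile latency at each step.

    import statistics
    import threading
    import time

    def send_request():
        time.sleep(0.02)  # placeholder for a real HTTP round trip to the farm

    def run_step(users, duration_sec):
        latencies = []
        lock = threading.Lock()
        stop = time.monotonic() + duration_sec

        def worker():
            while time.monotonic() < stop:
                t0 = time.monotonic()
                send_request()
                with lock:
                    latencies.append(time.monotonic() - t0)

        threads = [threading.Thread(target=worker) for _ in range(users)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        rps = len(latencies) / duration_sec
        p75 = statistics.quantiles(latencies, n=100)[74]  # 75th percentile
        return rps, p75

    # Step up the user load, as in the tests described above.
    for users in (25, 50, 75, 100, 125, 150):
        rps, p75 = run_step(users, duration_sec=5)
        print(f"{users:>4} users: {rps:7.1f} RPS, 75th percentile {p75:.3f} sec")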

The SharePoint Admin Toolkit team has built a tool named the Load Test Toolkit (LTK), which is publicly available for customers to download and use.

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the lab environment.

Hardware

The table that follows presents the hardware specifications for the computers that were used in this testing. Every Web server that was added to the server farm during the multiple iterations of the test complied with the same specifications.

                             Web server                 Application server         Database server

Processors                   2px4c@2.33GHz              2px4c@2.33GHz              4px4c@3.19GHz
RAM                          8 GB                       8 GB                       32 GB
Number of network adapters   2                          2                          1
Network adapter speed        1 gigabit                  1 gigabit                  1 gigabit
Load balancer type           F5 hardware load balancer  Not applicable             Not applicable
ULS logging level            Medium                     Medium                     Not applicable

Software

The table that follows lists the software installed and running on the servers that were used in this testing effort.

                      Web server                    Application server            Database server

Operating system      Windows Server 2008 R2 x64    Windows Server 2008 R2 x64    Windows Server 2008 x64
Software version      SharePoint Server 2010 and    SharePoint Server 2010 and    SQL Server 2008 R2 CTP3
                      Office Web Applications,      Office Web Applications,
                      pre-release versions          pre-release versions
Authentication        Windows NTLM                  Windows NTLM                  Windows NTLM
Load balancer type    F5 hardware load balancer     Not applicable                Not applicable
ULS logging level     Medium                        Medium                        Not applicable
Antivirus settings    Disabled                      Disabled                      Disabled

Services running locally on the Web server: Microsoft SharePoint Foundation Incoming E-Mail, Microsoft SharePoint Foundation Web Application, Microsoft SharePoint Foundation Workflow Timer Service, Search Query and Site Settings Service, SharePoint Server Search.

Services running locally on the application server: Central Administration, Excel Services, Managed Metadata Web Service, Microsoft SharePoint Foundation Incoming E-Mail, Microsoft SharePoint Foundation Web Application, Microsoft SharePoint Foundation Workflow Timer Service, PowerPoint Services, Search Query and Site Settings Service, SharePoint Server Search, Visio Graphics Services, Word Viewing Service.

No services run locally on the database server.

The table indicates which services are provisioned in the test environment. Other services, such as the User Profile service and Web Analytics, are not provisioned.

Topology and configuration

The following diagram shows the topology used for the tests. We changed the number of Web servers from 1 to 2 to 3, as we moved between iterations, but otherwise the topology remained the same.

Farm topology diagram for this environment

Dataset and disk geometry

The test farm was populated with about 1.62 terabytes of content, distributed across five content databases of different sizes. The following table describes this distribution:

Content database   Size            Number of sites   Number of webs   RAID configuration   Spindles for MDF   Spindles for LDF

1                  36 GB           44                1544             0                    1                  1
2                  135 GB          74                2308             0                    1                  1
3                  175 GB          9                 2242             0                    5                  1
4                  1.2 terabytes   9                 2041             0                    3                  1
5                  75 GB           222               1178             0                    1                  1

Transactional mix

The following are important notes about the transactional mix:

  • There are no My Sites provisioned on the divisional portal. Also, the User Profile service, which supports My Sites, is not running on the farm. The transactional mix does not include any My Site page/web service hits or traffic related to Outlook Social Connector.

  • The test mix does not include any traffic generated by co-authoring on documents.

  • The test mix does not include traffic from a search crawl. However, this was factored into our tests by modifying the Green Zone definition to 40 percent SQL Server CPU usage instead of the standard 50 percent, to allow 10 percent for the search crawl. Similarly, we used 80 percent SQL Server CPU as the criterion for max RPS.

The following table describes the overall transaction mix; the percentages total 100. (A sketch showing how such a weighted mix can drive a load generator follows the table.)

Feature or service                     Operation                                                 Read/write   Percentage of mix

ECM                                    Get static files                                          r            8.93%
                                       View home page                                            r            1.52%
Microsoft InfoPath                     Display/Edit upsize list item and new forms               r            0.32%
                                       Download file by using "Save as"                          r            1.39%
Microsoft OneNote 2010                 Open Microsoft Office OneNote 2007 file                   r            13.04%
Search                                 Search through OSSSearch.aspx or SearchCenter             r            4.11%
Workflow                               Start autostart workflow                                  w            0.35%
Microsoft Visio                        Render Visio file in PNG/XAML                             r            0.90%
Office Web Applications - PowerPoint   Render Microsoft PowerPoint, scroll to 6 slides           r            0.05%
Office Web Applications - Word         Render and scroll Microsoft Word doc in PNG/Silverlight   r            0.24%
Microsoft SharePoint Foundation        List - Check out and then check in an item                w            0.83%
                                       List - Get list                                           r            0.83%
                                       List - Outlook sync                                       r            1.66%
                                       List - Get list item changes                              r            2.49%
                                       List - Update list items and adding new items             w            4.34%
                                       Get view and view collection                              r            0.22%
                                       Get webs                                                  r            1.21%
                                       Browse to Access denied page                              r            0.07%
                                       View Browse to list feeds                                 r            0.62%
                                       Browse to viewlists                                       r            0.03%
                                       Browse to default.aspx (home page)                        r            1.70%
                                       Browse to Upload doc to doc lib                           w            0.05%
                                       Browse to List/Library's default view                     r            7.16%
                                       Delete doc in doclib using DAV                            w            0.83%
                                       Get doc from doclib using DAV                             r            6.44%
                                       Lock and Unlock a doc in doclib using DAV                 w            3.32%
                                       Propfind list by using DAV                                r            4.16%
                                       Propfind site by using DAV                                r            4.16%
                                       List document by using FPSE                               r            0.91%
                                       Upload doc by using FPSE                                  w            0.91%
                                       Browse to all site content page                           r            0.03%
                                       View RSS feeds of lists or wikis                          r            2.03%
Excel Services                         Render small/large Excel files                            r            1.56%
Workspaces                             WXP - Cobalt internal protocol                            r            23.00%
                                       Full file upload using WXP                                w            0.57%
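As referenced above, the following is a minimal sketch, in Python, of how a weighted mix like this can drive a load generator: each simulated request picks an operation with probability proportional to its percentage of the mix. Only a few of the operations are listed; the names and weights come straight from the table, and because random.choices accepts relative weights, the truncated list need not sum to 100.

    # Sketch: sample operations according to the transactional-mix weights.
    import random

    MIX = [
        ("WXP - Cobalt internal protocol",                23.00),
        ("Open Microsoft Office OneNote 2007 file",       13.04),
        ("Get static files",                               8.93),
        ("Browse to List/Library's default view",          7.16),
        ("Get doc from doclib using DAV",                  6.44),
        ("List - Update list items and adding new items",  4.34),
        # ... the remaining operations from the table would be listed here
    ]

    ops = [name for name, _ in MIX]
    weights = [pct for _, pct in MIX]

    def next_operation():
        """Pick the next operation in proportion to its share of the mix."""
        return random.choices(ops, weights=weights, k=1)[0]

    sample = [next_operation() for _ in range(10_000)]
    # For this truncated list, Cobalt traffic is ~23.00/62.91 = 0.37 of requests.
    print(sample.count("WXP - Cobalt internal protocol") / len(sample))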

Results and analysis

This section presents the test results and our analysis of them, to provide guidance for capacity planning of a typical divisional portal.

Results from 1x1 farm configuration

Summary of results

  • On a farm with 1 Web server and 1 database server, the same computer, in addition to its Web server duties, also acted as the application server. Clearly this computer (still called the Web server) was the bottleneck. As presented in the data here, the Web server CPU reached around 86% utilization when the farm was subjected to a user load of 125 users by using the transactional mix described earlier in this document. At that point, the farm exhibited a max RPS of 101.37.

  • Even at a small user load, Web server utilization was always too high to consider this farm healthy. For the workload and dataset that we used for the test, we do not recommend this configuration as a real deployment.

  • By the definition of "green zone," there really is no green zone for this farm; it is under stress even at a small load. As for the "max zone," at the smallest load at which the farm was in the max zone, the RPS was 75.

  • Because the Web server was the bottleneck due to its dual role as an application server, for the next iteration we separated the application server role onto its own computer.

Performance counters and graphs

The following table presents various performance counters captured during testing a 1x1 farm at different steps in user load.

User load                    50       75       100      125

RPS                          74.958   89.001   95.79    101.37
Latency (sec)                0.42     0.66     0.81     0.81
Web server CPU (%)           79.6     80.1     89.9     86
Application server CPU (%)   N/A      N/A      N/A      N/A
Database server CPU (%)      15.1     18.2     18.6     18.1
75th percentile (sec)        0.3      0.35     0.55     0.59
95th percentile (sec)        0.71     0.77     1.03     1

The following chart shows the RPS and latency results for a 1x1 configuration.

Chart of requests per second and latency for the 1x1 configuration

The following chart shows performance counter data in a 1x1 configuration.

Chart of performance counters for the 1x1 configuration

Results from 1x1x1 farm configuration

Summary of results

  • On a farm with 1 Web server, 1 application server, and 1 database server, the Web server was the bottleneck. As presented in the data in this section, the Web server CPU reached around 85% utilization when the farm was subjected to a user load of 150 users by using the transactional mix described earlier in this document. At that point, the farm exhibited a max RPS of 124.1.

  • This configuration delivered a "green zone" RPS of 99, with a 75th percentile latency of 0.23 sec and the Web server CPU hovering around 56% utilization. This indicates that this farm can healthily deliver an RPS of around 99. The "max zone" RPS delivered by this farm was 123, with latencies of 0.25 sec and the Web server CPU hovering around 85%.

  • Because the Web server CPU was the bottleneck in this iteration, we relieved the bottleneck by adding another Web server for the next iteration.

Performance counters and graphs

The following table presents various performance counters captured during testing a 1x1x1 farm, at different steps in user load.

User load                    25      50      75      100     125     150

RPS                          53.38   91.8    112.2   123.25  123.25  124.1
Latency (sec)                0.22    0.23    0.27    0.32    0.35    0.42
Web server CPU (%)           34.2    56      71.7    81.5    84.5    84.9
Application server CPU (%)   23.2    33.8    34.4    32      30.9    35.8
Database server CPU (%)      12.9    19.7    24.1    25.2    23.8    40.9
75th percentile (sec)        0.54    0.52    0.68    0.71    0.74    0.88

The following chart shows RPS and latency results for a 1x1x1 configuration.

Chart of requests per second and latency for the 1x1x1 configuration

The following chart shows performance counter data in a 1x1x1 configuration.

Chart of performance counters for the 1x1x1 configuration

Results from 2x1x1 farm configuration

Summary of results

  • On a farm with 2 Web servers, 1 application server, and 1 database server, the Web server was the bottleneck. As presented in the data in this section, the Web server CPU reached around 76% utilization when the farm was subjected to a user load of 200 users by using the transactional mix described earlier in this document. At that point, the farm exhibited a max RPS of 318.

  • This configuration delivered a "green zone" RPS of 191, with a 75th percentile latency of 0.37 sec and the Web server CPU hovering around 47% utilization. This indicates that this farm can healthily deliver an RPS of around 191. The "max zone" RPS delivered by this farm was 291, with latencies of 0.5 sec and the Web server CPU hovering around 75%.

  • Because the Web server CPU was the bottleneck in this iteration, we relieved the bottleneck by adding another Web server for the next iteration.

Performance counters and graphs

The following table presents various performance counters captured during testing a 2x1x1 farm, at different steps in user load.

User load                    40      80      115     150     175     200

RPS                          109     190     251     287     304     318
Latency (sec)                0.32    0.37    0.42    0.49    0.54    0.59
Web server CPU (%)           27.5    47.3    61.5    66.9    73.8    76.2
Application server CPU (%)   17.6    29.7    34.7    38      45      45.9
Database server CPU (%)      21.2    36.1    43.7    48.5    52.8    56.2
75th percentile (sec)        0.205   0.23    0.27    0.3     0.305   0.305
95th percentile (sec)        0.535   0.57    0.625   0.745   0.645   0.57

The following chart shows RPS and latency results for a 2x1x1 configuration.

Chart of requests per second and latency for the 2x1x1 configuration

The following chart shows performance counter data in a 2x1x1 configuration.

Chart of performance counters for the 2x1x1 configuration

Results from 3x1x1 farm configuration

Summary of results

  • On a farm with 3 Web servers, 1 application server, and 1 database server, the database server CPU finally became the bottleneck. As presented in the data in this section, the database server CPU reached around 76% utilization when the farm was subjected to a user load of 226 users by using the transactional mix described earlier in this document. At that point, the farm exhibited a max RPS of 310.

  • This configuration delivered a "green zone" RPS of 242, with a 75th percentile latency of 0.41 sec and the database server CPU hovering around 44% utilization. This indicates that this farm can healthily deliver an RPS of around 242. The "max zone" RPS delivered by this farm was 318, with latencies of 0.5 sec and the database server CPU hovering around 75%.

  • This was the last configuration in the series.

Performance counters and graphs

The following table presents various performance counters captured during testing a 3x1x1 farm, at different steps in user load.

User load                    66      103     141     17      202      226

RPS                          193.8   218.5   269.8   275.5   318.25   310
Latency (sec)                0.3     0.41    0.47    0.58    0.54     0.78
Web server CPU (%)           33      38.3    45.8    43.3    51       42.5
Application server CPU (%)   28      32.6    46.5    40      45.1     43.7
Database server CPU (%)      41.6    44.2    52.6    48      61.8     75
75th percentile (sec)        0.22    0.24    0.30    0.65    0.78     0.87
95th percentile (sec)        0.49    0.57    0.72    1.49    0.51     1.43

The following chart shows RPS and latency results in a 3x1x1 configuration.

Chart of requests per second and latency for the 3x1x1 configuration

The following chart shows performance counter data for a 3x1x1 configuration.

Chart of performance counters for the 3x1x1 configuration

Comparison

From the iterative tests that we performed, we determined the points at which each configuration enters the max zone or green zone. The table and charts in this section summarize all of the results that were presented earlier in this article.

Topology                   1x1              1x1x1   2x1x1   3x1x1

Max RPS                    75               123     291     318
Green Zone RPS             Not applicable   99      191     242
Max latency (sec)          0.29             0.25    0.5     0.5
Green Zone latency (sec)   0.23             0.23    0.37    0.41

The following chart shows a summary of RPS at different configurations.

Chart comparing requests per second across configurations

The following chart shows a summary of latency at different configurations.

Chart comparing latency across all configurations

A note on disk I/O

Disk I/O bottlenecks were not considered when making the recommendations in this document. However, the trend is still interesting to observe. Here are the numbers:

Configuration   1x1   1x1x1   2x1x1   3x1x1

Max RPS         75    154     291     318
Reads/sec       38    34      54      58
Writes/sec      135   115     230     270

Because we ran the tests in durations of 1 hour and the tests use only a fixed set of sites, webs, document libraries, and so on, SQL Server could cache all the data. Thus, our testing caused very little read I/O, and we see more write I/O operations than read operations. It is important to be aware that this is an artifact of the test methodology and not a good representation of real deployments. A typical divisional portal would see 3 to 4 times more read operations than write operations.

The following chart shows I/Ops at different RPS.

Chart of I/O operations per second across all configurations

Tests with Search incremental crawl

As mentioned earlier, all the tests up to this point were run without search crawl traffic. To understand how an ongoing search crawl can affect the performance of a farm, we set out to find the max RPS and the corresponding latencies with search crawl traffic in the mix. We added a separate Web server to the 3x1x1 farm, designated as a crawl target, and saw a 17% drop in RPS compared with the original RPS exhibited by the 3x1x1 farm.

In a separate test on the same farm, we used Resource Governor to limit the resources available to the search crawl to 10%. With Search using fewer resources, the max RPS of the farm climbed by 6%.

                                       Baseline 3x1x1   Incremental crawl only   No Resource Governor   10% Resource Governor

RPS                                    318              N/A                      276                    294.5
Percent RPS difference from baseline   0%               N/A                      83%                    88%
Database server CPU (%)                83.40            8.00                     86.60                  87.3
SA database server CPU (%)             3.16             2.13                     3.88                   4.2
Web server CPU (%)                     53.40            0.30                     47.00                  46.5
Application server CPU (%)             22.10            28.60                    48.00                  41.3
Crawl Web server CPU (%)               0.50             16.50                    15.00                  12.1

The following chart shows results from tests with incremental Search crawl turned on.

Chart of requests per second with the search crawl running

Important

Here we are talking only about an incremental crawl, on a farm where there are not very many changes to the content. Be aware that a 10% resource utilization limit will be insufficient for a full search crawl, and it may also prove insufficient for an incremental crawl if there are many content changes between crawls. We do not advise limiting resource utilization to 10% if you are running a full search crawl or if your farm generally sees a high volume of content changes between crawls.

Summary of results and recommendations

To summarize the results from all configurations that we tested:

  • With the configuration, dataset, and test workload that we selected for the tests, we could scale out to a maximum of 3 Web servers before SQL Server became bottlenecked on CPU. The absolute max RPS that we could reach at that point was around 318.

  • With each additional Web server, the increase in RPS was almost linear. We can extrapolate that, as long as SQL Server is not bottlenecked, you can add more Web servers and achieve further increases in RPS (see the extrapolation sketch after this list).

  • Latencies were not much affected as we approached the bottleneck on SQL Server.

  • An incremental search crawl directly affects the RPS offered by a configuration. The effect can be minimized by using Resource Governor.
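The following is a rough extrapolation sketch, in Python, of the near-linear scaling noted in the second bullet. It fits a line to the measured max RPS for 1 and 2 Web servers and projects it forward; treating the growth as linear is our simplifying assumption, and the 3x1x1 measurement shows where the SQL Server bottleneck breaks it.

    # Rough sketch: extrapolate max RPS per added Web server, assuming SQL
    # Server is not the bottleneck. Inputs are the lab measurements above.

    measured = {1: 123, 2: 291, 3: 318}   # Web servers -> measured max RPS

    # Fit the linear region only (1 -> 2 Web servers); the 3x1x1 farm is
    # already SQL-bound, so it is excluded from the fit.
    slope = measured[2] - measured[1]     # ~168 RPS per added Web server
    intercept = measured[1] - slope

    def projected_max_rps(web_servers):
        """Projected max RPS if SQL Server were not the bottleneck."""
        return intercept + slope * web_servers

    # 459 projected vs. 318 measured for 3 Web servers: the gap is the
    # SQL Server CPU bottleneck.
    print(projected_max_rps(3))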

Using these results, here are a few recommendations on how to achieve even larger scale if you must have more RPS from your divisional portal:

  • A 1x1 farm can deliver up to 75 RPS, but it is stressed even at small loads. It is not a recommended configuration for a divisional portal in production.

  • Separate the content databases and the services databases onto separate instances of SQL Server. In the test workload, when SQL Server was bottlenecked on CPU, only 3% of the traffic went to the services databases, so this step would have achieved only slightly better scale-out than what we saw. In general, the scale-out gained by separating content databases from services databases is directly proportional to the share of traffic that goes to the services databases in your farm.

  • Separate individual content databases onto separate instances of SQL Server. In the dataset used for testing, we had 5 content databases, all located on the same instance of SQL Server. By separating them onto different computers, you spread CPU utilization across multiple computers and will therefore see much larger RPS numbers.

  • Finally, when SQL Server is bottlenecked on CPU, adding more CPU to the computer running SQL Server can increase the RPS potential of the farm almost linearly.

How to translate these results into your deployment

In this article, we discussed results as measured by RPS and latency, but how do you apply these numbers in the real world? Here is some math based on our experience with a divisional portal internal to Microsoft.

A divisional portal at Microsoft that supports around 8,000 employees collaborating heavily experiences an average RPS of 110. That gives a users-to-RPS ratio of ~72 (that is, 8000/110). Using this ratio and the results discussed earlier in this article, we can estimate how many users a particular farm configuration can support healthily:

Farm configuration   "Green Zone" RPS   Approximate number of users it can support

1x1x1                99                 7128
2x1x1                191                13752
3x1x1                242                17424

Of course, this is directly applicable only if your transactional mix and hardware are exactly the same as those used for these tests. Your divisional portal may have a different usage pattern, so the ratio may not apply directly. However, we expect it to be approximately applicable.
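The following is a small worked version of this math, in Python, under the assumptions just stated: the users-to-RPS ratio is derived from the Microsoft-internal portal figures cited above and then applied to the "Green Zone" RPS of each configuration.

    # Worked example: estimate supportable users from "Green Zone" RPS by
    # using the observed users-to-RPS ratio of a real divisional portal.

    users_observed, avg_rps_observed = 8000, 110
    ratio = users_observed / avg_rps_observed   # ~72.7 users per unit of RPS

    green_zone_rps = {"1x1x1": 99, "2x1x1": 191, "3x1x1": 242}

    for farm, rps in green_zone_rps.items():
        print(f"{farm}: ~{round(rps * ratio):,} users supported healthily")
    # Output: ~7,200 / ~13,891 / ~17,600. The table above rounds the ratio
    # down to 72, which gives 7,128 / 13,752 / 17,424.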

About the authors

Gaurav Doshi is a Program Manager for SharePoint Server at Microsoft.

Raj Dhrolia is a Software Test Engineer for SharePoint Server at Microsoft.

Wayne Roseberry is a Principal Test Lead for SharePoint Server at Microsoft.

See Also

Other Resources

Resource Center: Capacity Management for SharePoint Server 2010