MOSS Performance Counters

The following are some performance counters that can come in handy when doing performance/load testing on your SharePoint 2007 (MOSS) environment.

CodePlex also hosts a nice set of tools/scripts for performance testing that you might want to leverage: http://www.codeplex.com/sptdatapop

 

Recommended Performance Counters

Each counter below is listed with the server roles it applies to (Web Front End (WFE), Query, Indexer, Database), followed by a guideline and an explanation.

Memory\Available Megabytes
Applies to: WFE, Query, Indexer, Database
Guideline: Values < 10% of total physical memory, even for short periods, indicate a need for additional RAM.
Explanation: Available MBytes is the amount of physical memory, in megabytes, immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free, and zero page lists. For a full explanation of the memory manager, refer to MSDN and/or the System Performance and Troubleshooting Guide chapter in the Windows Server 2003 Resource Kit.

Memory\% Committed Bytes in Use
Applies to: WFE, Query, Indexer, Database
Guideline: Determine trend over time by baselining.
Explanation: % Committed Bytes in Use is the ratio of Memory\Committed Bytes to the Memory\Commit Limit. Committed memory is the physical memory in use for which space has been reserved in the paging file should it need to be written to disk. The commit limit is determined by the size of the paging file; if the paging file is enlarged, the commit limit increases and the ratio is reduced. This counter displays the current percentage value only; it is not an average.

Memory\Pages/sec
Applies to: WFE, Query, Indexer, Database
Guideline: Consistently > 200 indicates a need for additional physical RAM.
Explanation: Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.

Memory\Page Faults/sec
Applies to: WFE, Query, Indexer, Database
Guideline: Monitor long-term trends in conjunction with Memory\Pages/sec.
Explanation: Page Faults/sec is the average number of pages faulted per second. It is measured in number of pages faulted per second because only one page is faulted in each fault operation; hence this is also equal to the number of page fault operations. This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without significant consequence. However, hard faults, which require disk access, can cause significant delays.

Memory\Pool Nonpaged Bytes
Applies to: WFE, Query, Indexer, Database
Guideline: Monitor in combination with Available Bytes. Large values can cause IIS to stop responding.
Explanation: Pool Nonpaged Bytes is the size, in bytes, of the nonpaged pool, an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk but must remain in physical memory as long as they are allocated. Memory\Pool Nonpaged Bytes is calculated differently than Process\Pool Nonpaged Bytes, so it might not equal Process\Pool Nonpaged Bytes\_Total. This counter displays the last observed value only; it is not an average.

System\Processor Queue Length
Applies to: WFE, Query, Indexer, Database
Guideline: Sustained values > 2 x the number of CPUs indicate a need for upgraded CPUs, additional L2 cache, additional processors, and/or scaling out.
Explanation: Processor Queue Length is the number of threads in the processor queue. Unlike the disk counters, this counter shows ready threads only, not threads that are running. There is a single queue for processor time even on computers with multiple processors; therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of less than 10 threads per processor is normally acceptable, depending on the workload.

Network Interface\Bytes Total/sec
Applies to: WFE, Query, Indexer, Database
Guideline: Should not exceed network card speed x 2 (for duplex) x 75%. Monitor the trend over time.
Explanation: Bytes Total/sec is the rate at which bytes are sent and received over each network adapter, including framing characters.

Network Interface\Packets/sec
Applies to: WFE, Query, Indexer, Database
Guideline: Monitor over time. High values may indicate a processor bottleneck in handling packets.
Explanation: Packets/sec is the rate at which packets are sent and received on the network interface.

PhysicalDisk\% Disk Time\Drive Letter
Applies to: WFE, Query, Indexer, Database
Guideline: A value > 80% may indicate a lack of RAM or a disk controller issue.
Explanation: % Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.

PhysicalDisk\Avg. Disk sec/Transfer\Drive Letter
Applies to: WFE, Query, Indexer, Database
Guideline: Values > 0.3 may indicate disk controller or drive problems.
Explanation: Avg. Disk sec/Transfer is the time, in seconds, of the average disk transfer.

PhysicalDisk\Current Disk Queue Length\Drive Letter
Applies to: WFE, Query, Indexer, Database
Guideline: Sustained values > 2 x the number of spindles may indicate a need for a disk upgrade.
Explanation: Current Disk Queue Length is the number of requests outstanding on the disk at the time the performance data is collected. It also includes requests in service at the time of the collection. This is an instantaneous snapshot, not an average over the time interval. Multi-spindle disk devices can have multiple requests that are active at one time, but other concurrent requests are awaiting service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely that this will be consistently high. Requests experience delays proportional to the length of this queue minus the number of spindles on the disks. For good performance, this difference should average less than two.

Processor\% Processor Time\_Total
Applies to: WFE, Query, Indexer, Database
Guideline: Consistently > 75% indicates a need for additional processing power.
Explanation: % Processor Time is the percentage of elapsed time that the processor spends executing a non-idle thread. It is calculated by measuring the time that the idle thread is active in the sample interval and subtracting that time from the interval duration. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity and displays the average percentage of busy time observed during the sample interval.

Processor\Interrupts/sec
Applies to: WFE, Query, Indexer, Database
Guideline: Values < 1000 are considered good. Monitor for trends over time; increases may indicate failing hardware.
Explanation: Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts. It does not include deferred procedure calls (DPCs), which are counted separately. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards, and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention; normal thread execution is suspended. The system clock typically interrupts the processor every 10 milliseconds, creating a background of interrupt activity. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.

Redirector\Server Sessions Hung
Applies to: WFE, Query, Indexer, Database
Guideline: A value > 1 indicates the remote server is too busy. Compare to memory and processor counters on the remote server to determine the root cause.
Explanation: Server Sessions Hung counts the number of active sessions that are timed out and unable to proceed due to a lack of response from the remote server.

SAN Monitoring
Applies to: N/A
Guideline: Follow the vendor's guidelines.
Explanation: Use vendor tools if available; otherwise, follow the storage team's recommendation.

Server\Work Item Shortages
Applies to: WFE, Query, Indexer, Database
Guideline: A value > 3 may indicate a need to increase the value of InitWorkItems or MaxWorkItems in the registry.
Explanation: The number of times STATUS_DATA_NOT_ACCEPTED was returned at receive indication time. This occurs when no work item is available or can be allocated to service the incoming request. Indicates whether the InitWorkItems or MaxWorkItems parameters might need to be adjusted.

Web Service\Connection Attempts/sec\_Total
Applies to: WFE, Query, Indexer
Guideline: No absolute number. Determine trend over time by baselining.
Explanation: The rate at which connections to the Web service are being attempted.

Process (w3wp)\% Processor Time
Applies to: WFE
Guideline: Determine trend over time by baselining.
Explanation: % Processor Time is the percentage of elapsed time that all of the process's threads used the processor to execute instructions. An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count.

Process (w3wp)\Private Bytes
Applies to: WFE
Guideline: No absolute number. Determine trend over time by baselining. Sustained growth may indicate a memory leak.
Explanation: Private Bytes is the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes.

ASP.NET Applications\Requests/sec\_Total
Applies to: WFE
Guideline: No absolute number. Determine trend over time by baselining.
Explanation: The number of requests executed per second.

ASP.NET\Worker Process Restarts
Applies to: WFE
Guideline: Values > 0 may indicate problems.
Explanation: The number of times a worker process has restarted on the machine.

.NET CLR Memory\% Time in GC
Applies to: WFE
Guideline: Values > 25% may indicate poorly written code.
Explanation: % Time in GC is the percentage of elapsed time spent performing a garbage collection (GC) since the last GC cycle. This counter is usually an indicator of the work done by the garbage collector on behalf of the application to collect and compact memory. It is updated only at the end of every GC, and the counter value reflects the last observed value; it is not an average.

Office Server Search Indexer Catalogs\Queries
Applies to: Query
Guideline: Monitor the trend. Compare to CPU and memory to determine growth capacity.
Explanation: The number of queries.

Office Server Search Gatherer Projects(2~Portal_Content)\Crawls in progress
Applies to: Indexer
Guideline: Determine trend over time by baselining. Compare to crawl schedules.
Explanation: The number of crawls in progress.

Office Server Search Gatherer Projects(2~Portal_Content)\Document Add Rate
Applies to: Indexer
Guideline: Determine trend over time by baselining against similar content sources.
Explanation: The number of document additions per second.

Office Server Search Gatherer Projects(2~Portal_Content)\Error Rate
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The number of file protocol errors received while getting documents.

Office Server Search Gatherer Projects(2~Portal_Content)\Incremental Crawls
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The number of incremental crawls in progress.

Office Server Search Gatherer Projects(2~Portal_Content)\Processed Documents Rate
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The number of documents processed since the history was reset.

Office Server Search Gatherer Projects(2~Portal_Content)\Retries
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The total number of times a document access has been retried. A high number may indicate a problem with accessing the data.

Office Server Search Gatherer Projects(2~Portal_Content)\Waiting Documents
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The number of documents waiting to be processed. When this number goes to zero, the catalog is idle. This number indicates the total queue size of unprocessed documents in the gatherer.

Office Server Search Gatherer\Documents Filtered Rate
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The number of documents filtered per second.

Office Server Search Gatherer\Filtering Threads
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The total number of filtering threads in the system. This number is calculated based on your system resources.

Office Server Search Gatherer\Threads Accessing Network
Applies to: Indexer
Guideline: Determine trend over time by baselining.
Explanation: The number of threads waiting for a response from the filter process. If no activity is going on and this number equals the number of filtering threads, it may indicate a network problem or unavailability of the server being crawled.

SharePoint Search Gatherer\Document Entries
Applies to: Indexer
Guideline: Determine trend over time by baselining. Compare to crawl schedules.
Explanation: The number of document entries currently in memory. Zero means no indexing activity is going on.

Process (sqlservr)\% Processor Time
Applies to: Database
Guideline: A value of 80 to 90 percent may indicate the need to upgrade the CPUs or add more processors.
Explanation: % Processor Time is the percentage of elapsed time that all of the process's threads used the processor to execute instructions. An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count.

Process (sqlservr)\Private Bytes
Applies to: Database
Guideline: No absolute number. Determine trend over time by baselining. Sustained growth may indicate a memory leak.
Explanation: Private Bytes is the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes.

Process (sqlservr)\Working Set
Applies to: Database
Guideline: If this number is consistently below the amount of memory that is set by the min server memory and max server memory options, SQL Server is configured to use too much memory.
Explanation: Shows the maximum size, in bytes, of the working set of this process at any point in time. The working set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a certain threshold, pages are left in the working set of a process even if they are not in use. When free memory falls below a certain threshold, pages are trimmed from working sets. If they are needed, they are then soft-faulted back into the working set before they leave main memory.

SQL Server: Buffer Manager\Buffer Cache Hit Ratio
Applies to: Database
Guideline: The ideal value is application-specific; however, a rate of 90 percent or higher is desirable. Add more memory until the value is consistently greater than 90 percent.
Explanation: The percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups since an instance of SQL Server was started. After a long period of time, the ratio moves very little. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server.

SQL Server: Databases\Data File(s) Size (KB)\SSP_Search_database
Applies to: Database
Guideline: Monitor to ensure adequate disk space for the search database.
Explanation: The cumulative size (in kilobytes) of all the data files in the database, including any automatic growth.

SQL Server: Databases\Data File(s) Size (KB)\tempdb
Applies to: Database
Guideline: Monitor to ensure adequate disk space for tempdb.
Explanation: The cumulative size (in kilobytes) of all the data files in the database, including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of tempdb.

SQL Server: Databases\Log File(s) Size (KB)\SSP_Search_database
Applies to: Database
Guideline: Monitor to ensure adequate disk space for the search database.
Explanation: The cumulative size (in kilobytes) of all the transaction log files in the database.

SQL Server: Databases\Log File(s) Size (KB)\tempdb
Applies to: Database
Guideline: Monitor to ensure adequate disk space for tempdb.
Explanation: The cumulative size (in kilobytes) of all the transaction log files in the database.

SQL Server: Transactions\Free Space in tempdb (KB)
Applies to: Database
Guideline: Monitor to ensure adequate disk space for tempdb.
Explanation: The amount of space (in kilobytes) available in tempdb. There must be enough free space to hold both the snapshot isolation level version store and all new temporary objects created in this instance of the Database Engine.
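Guidelines like the ones above can be turned into automated pass/fail checks during a load test. Below is a minimal Python sketch of the idea; the counter names come from the list above, but the sampled values, the machine specs (TOTAL_RAM_MB, NUM_CPUS), and the evaluate() helper are illustrative. In practice you would feed in samples exported from a Performance Monitor log.

```python
# Illustrative threshold checks for a few of the counters above.
# Sampled values are hypothetical; in practice, export them from a
# Performance Monitor (perfmon) log.

TOTAL_RAM_MB = 8192   # assumed physical memory on the server
NUM_CPUS = 4          # assumed processor count

# counter name -> (sampled value, predicate that is True when the
# guideline threshold is violated)
checks = {
    r"Memory\Available Megabytes":
        (512, lambda v: v < 0.10 * TOTAL_RAM_MB),   # < 10% of RAM
    r"Memory\Pages/sec":
        (250, lambda v: v > 200),                   # consistently > 200
    r"System\Processor Queue Length":
        (6, lambda v: v > 2 * NUM_CPUS),            # > 2 x CPUs
    r"Processor\% Processor Time\_Total":
        (68, lambda v: v > 75),                     # consistently > 75%
}

def evaluate(checks):
    """Return the counters whose guideline threshold is violated."""
    return [name for name, (value, violated) in checks.items()
            if violated(value)]

for name in evaluate(checks):
    print("threshold exceeded:", name)
```

With the hypothetical samples above, only Available Megabytes (512 MB is below 10% of 8 GB) and Pages/sec (250 > 200) are flagged.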

 

Also, don't forget to read Steve Sheppard's blog; he has an excellent post on Object Cache monitoring:

http://blogs.msdn.com/steveshe/archive/2009/03/12/how-do-i-tune-the-moss-object-cache-for-performance-and-economy.aspx

 

The following is taken from Steve Sheppard's blog; make sure to visit it:

How do I tune the MOSS Object Cache for performance and economy?

Tuning the size of the MOSS Object Cache is done via the Max Cache Size setting on the Site Settings > Object Cache Settings page. It is important to recognize that this maximum cache size setting is a limit, not a static value. For example, just because the maximum cache size setting for the Object Cache is configured to 100MB does not mean that it will consume 100MB of memory at startup. It simply means that if the cache were to exceed 100MB, it will be compacted to reduce its memory consumption to a level below that maximum value.

For purposes of this discussion we will categorize cache compaction rates into three categories: unacceptable, acceptable, and optimal. A cache compaction rate of more than 6 per hour should be considered unacceptable. A rate of between 2 and 6 compactions per hour is acceptable, and between 0 and 1 cache compactions per hour is optimal. You should only target the optimal level of cache compactions if you have a sufficient amount of physical memory installed on your servers to achieve this goal and still have sufficient free physical memory remaining on the server to support other critical system operations.
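These rate categories are straightforward to encode. The following Python sketch simply restates the thresholds above; the function name and its buckets are mine:

```python
def classify_compaction_rate(compactions_per_hour):
    """Bucket an Object Cache compaction rate per the thresholds above:
    0-1 per hour is optimal, 2-6 is acceptable, more than 6 is
    unacceptable."""
    if compactions_per_hour > 6:
        return "unacceptable"
    if compactions_per_hour >= 2:
        return "acceptable"
    return "optimal"

print(classify_compaction_rate(1))   # optimal
print(classify_compaction_rate(4))   # acceptable
print(classify_compaction_rate(9))   # unacceptable
```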

You should regularly monitor for cache flushes because they are extremely expensive in terms of cache performance. A cache flush results in the ejection of all cache contents. Cache flushes are triggered by the creation, deletion, or moving of a web. They can be monitored via the "SharePoint Publishing Cache/Total number of cache flushes" counter in Performance Monitor. If you are seeing cache flushes throughout the production day, you should reconsider how you manage webs during those hours.

Based on these considerations we have developed the following guidance for how to tune the Object Cache size for acceptable performance using the "SharePoint Publishing Cache / Total number of cache compactions" performance counter.

We feel this level of cache performance will meet an economical customer's performance needs without unduly sacrificing their limited memory resources. The steps to achieve this are fairly straightforward and must be applied in an iterative fashion until the desired level of performance is achieved. The recommended steps are:

1. Start with the default cache settings of 100MB on the site collection.

2. Capture at least 8 hours' worth of performance data from the WFEs while the system is under a typical load, using the "SharePoint Publishing Cache/Total number of cache compactions" counter. Since we are interested in tracking compactions per hour, it is acceptable to capture this data at 1-minute intervals.

3. After analyzing the data, if we are exceeding the threshold for acceptable cache compactions, add an additional 50MB to the Maximum Cache Size value and run the test again.

4. Continue this process until an acceptable cache compaction rate is achieved.
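The iterative procedure above can be sketched in code. Note that the compaction counter is a running total, so the hourly rate over a capture window is the difference between the last and first samples divided by the window length in hours. Everything else here (the measure_rate_fn callback, the simulated measurements, the 500 MB safety cap) is a hypothetical stand-in for a real 8-hour perfmon capture under typical load:

```python
# Sketch of the iterative Object Cache tuning loop described above.

ACCEPTABLE_MAX_PER_HOUR = 6  # more than 6 compactions/hour is unacceptable

def compactions_per_hour(first_sample, last_sample, hours):
    """Hourly compaction rate from two samples of the cumulative
    "Total number of cache compactions" counter."""
    return (last_sample - first_sample) / hours

def tune_cache_size(measure_rate_fn, start_mb=100, step_mb=50, max_mb=500):
    """Grow Max Cache Size in 50 MB steps until the measured compaction
    rate is acceptable, or a safety cap on cache size is reached."""
    size_mb = start_mb
    while True:
        rate = measure_rate_fn(size_mb)  # one full capture per iteration
        if rate <= ACCEPTABLE_MAX_PER_HOUR or size_mb + step_mb > max_mb:
            return size_mb, rate
        size_mb += step_mb

# Hypothetical 8-hour capture: the cumulative counter went from 64 to 96,
# i.e. 4 compactions/hour.
print(compactions_per_hour(64, 96, 8))  # 4.0

# Hypothetical measurements: larger caches compact less often.
simulated = {100: 12.0, 150: 8.5, 200: 4.0}
print(tune_cache_size(lambda mb: simulated[mb]))  # (200, 4.0)
```

The loop stops at 200 MB in this simulation because that is the first size whose measured rate (4 per hour) falls inside the acceptable band.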