Monitoring SharePoint Portal Server 2001

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

This is a reprint of Chapter 12 from The Administrator's Guide to SharePoint Portal Server 2001, published by Addison-Wesley.

Like architecture, monitoring is viewed by many people as both boring and optional. Although I can see the first assessment, I can't agree with the second.

Monitoring your servers should not be optional. Instead, it should be one of the main focuses of your administrative function matrix. Monitoring is an important part of delivering a high-quality knowledge management solution to your organization and for achieving the commitments of Service Level Agreements.

Why monitor SPS? The answer is simple: to avoid service outages by detecting problems before they become critical. Monitoring can help you quickly make adjustments when user demand significantly changes or your server resources become overstressed. Monitoring your SPS server can yield information in the following areas:

  • Overseeing overall system health

  • Detecting and predicting trends

  • Detecting errors

  • Ensuring good backups of your server's data

This chapter provides information on how to monitor SPS on a Windows 2000 server.

On This Page

Developing Monitoring Policies
Microsoft Operations Manager
Overseeing Overall Server Health
Monitoring the Major Processes of SharePoint Portal Server
Monitoring the Web Storage System
Getting All Stressed Out with SharePoint Portal Server
Summary

Developing Monitoring Policies

You've probably heard variations of the worn-out phrase used by financial planners in the last few decades: "If you fail to plan, you're planning to fail." As cheesy as this phrase might be, there is a truckload of truth in it, especially when it comes to monitoring. No matter which tools you use to monitor SPS— Windows 2000, NetIQ, Microsoft Operations Manager (MOM)—if you don't use them properly or regularly, they become meaningless. What is required, when it comes to monitoring, is discipline. And you can establish such discipline by creating monitoring policies that are a result of discussions between you, your server team, and the managers in your company.

Monitoring policies should define the following:

  • Objects to be monitored

  • Servers to be monitored

  • Polling frequency

  • Actions to be taken given certain events

  • A 24/7 plan for notifications and resolutions

Problems will occur with SharePoint Portal Server. This doesn't mean that it's a shoddy product; problems occur with all software packages. How quickly these problems are solved depends largely on how early they are detected and diagnosed. For example, if you don't monitor the available disk space on the physical disks that hold your Web Storage System databases, you could be surprised one day to learn that you've run out of disk space. On the other hand, monitoring could tell you when you have 100MB of disk space left and give you time to move the databases to a larger disk or install a larger disk and move them in a planned, thoughtful manner. Reacting to problems after they occur is called putting out a fire. Preventing a problem from occurring is called good administration. Monitoring will help you with good administration.
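The disk-space scenario above is easy to automate. Here is a minimal sketch of such a check; the 100MB threshold comes from the example in the text, and everything else (the function name, the alerting policy) is an assumption you would adapt to your own monitoring policies.

```python
import shutil

# Threshold from the example in the text: start planning when less than
# 100 MB remains on the disk holding the Web Storage System databases.
LOW_SPACE_BYTES = 100 * 1024 * 1024

def disk_space_alert(free_bytes, threshold=LOW_SPACE_BYTES):
    """Return True when free space has dropped below the threshold,
    signaling it is time to move the databases in a planned manner
    rather than react to an outage."""
    return free_bytes < threshold

# Check the disk that holds the current directory.
free = shutil.disk_usage(".").free
print(disk_space_alert(free))
```

Run on a schedule (and wired to whatever notification path your 24/7 plan defines), a check like this turns a fire into a planned disk move.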

Be sure that you spend time with your manager and other team members mapping out monitoring policies. Then implement them after they have been approved. Often, such policies will result in better management of your system and will help you avoid having to put out fires.

Microsoft Operations Manager

New to the suite of server products from Microsoft is Microsoft Operations Manager.

MOM offers capabilities not previously found in the base operating systems:

  • Support for monitoring your Windows 2000 systems enterprise-wide from a single console

  • Centralized collection of event data

  • Event-driven alerts

  • Modeling and trends development based on past experience

  • Generation of management reports based on a criteria mix you select

MOM supports both Windows NT 4.0 and Windows 2000 servers.

Microsoft is committed to ensuring that as new server products ship, there will be a corresponding management pack or set of counters that will integrate the MOM capabilities with each new product. The ability to monitor and manage an enterprise-wide set of servers from a single console is one of the key selling features of this new product.

MOM also contains built-in timescales that help you measure performance against any Service Level Agreements (SLAs) you might have with the departments you support.

When problems are detected and reported, MOM contains numerous links to Knowledge Base articles on Microsoft's Web site to help you troubleshoot and solve the problem.

MOM can be installed on more than one server to provide fault tolerance and redundancy in the event one MOM server becomes disabled. When you're tracking events and trends, this is a business imperative.

There is no management pack for SPS at the time of this writing. However, I want to mention MOM so that you'll know to keep an eye out for the SPS management pack and use it to help monitor all the SharePoint servers in your organization.

Overseeing Overall Server Health

This section doesn't go over how the System Monitor tool works. (If you need help with this, please read the relevant sections in the Windows 2000 Resource Kit.) Instead, this section outlines various mixes of counters that will give you a starting point in getting to know your server's health. A good understanding of SharePoint's architecture will aid in your understanding of how to monitor this system. If you haven't read Chapter 2, it's a good idea to go back and read it first.

It seems to me that one big reason most administrators don't do much monitoring is that they don't understand what each counter means and how it relates to the larger operating system. This betrays a lack of understanding of how the pieces of the OS architecture work together to produce the functionality of a program.

If you can understand a program's architecture, you're more likely to use monitoring tools to learn about the current functioning of each part of the program.

Hence, this chapter is structured around the SPS architecture described in Chapter 2. Figure 12–1 gives you another look at the SPS architecture.

Figure 12–1 SharePoint Portal Server architecture

Before we discuss monitoring each part of SPS, let me make some general comments about overall monitoring. Which counters can give you an overall sense about how SPS is functioning? To get a "big picture" view of your server's overall health, consider these counters to be an excellent method of gaining baseline data:

  • Processor—% Processor Time

  • Memory—% Committed Bytes in Use

  • Active Server Pages—ASP Request Execution Time

  • Active Server Pages—ASP Request Wait Time

  • Active Server Pages—ASP Requests Queued

  • Microsoft Gatherer—Documents Filtered

  • Microsoft Gatherer—Documents Successfully Filtered (if there is a large difference between this counter and the Documents Filtered counter, use the gatherer logs to discern why so many documents are attempting to be filtered but are failing)

  • Microsoft Gatherer—Documents Delayed Retry

  • Microsoft Gatherer—Reasons to Back Off

  • Microsoft Gatherer—Server Objects

  • Microsoft Gatherer—Time-outs

  • Microsoft Gatherer—Adaptive Crawl Accepts

  • Microsoft Gatherer—Adaptive Crawl Errors

  • Microsoft Gatherer—Adaptive Crawl Error Samples

  • Microsoft Gatherer—Adaptive Crawl Excludes

  • Microsoft Gatherer—Adaptive Crawl False Positives

  • Microsoft Gatherer—Adaptive Crawl Total

  • Microsoft Gatherer Projects—Crawls in Progress

  • Microsoft Gatherer Projects—Status Success

  • Microsoft Gatherer Projects—Status Error

  • Microsoft Gatherer Projects—URLs in History (remember that every URL that is crawled is recorded in the gatherer log and is referenced again when the next crawl begins)

  • Microsoft Gatherer Projects—Waiting Documents

  • Microsoft Search—Failed Queries

  • Microsoft Search—Successful Queries

  • Microsoft Search Indexer Catalogs—Merge Process 0–100%

  • Microsoft Search Indexer Catalogs—Number of Documents

  • Microsoft Search Indexer Catalogs—Index Size

The Processor—% Processor Time counter measures how much time, expressed as a percentage, your processor is busy executing non-idle threads.

When not busy, the system gives your processor an idle thread to loop through. When a thread of code needs to be executed, this idle thread is replaced with an active (non-idle) thread, which is then executed.
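Because busy time is just elapsed time minus idle-thread time, the counter reduces to simple arithmetic. A sketch, using hypothetical sample values chosen to match the 14.8% average reported below:

```python
def processor_time_pct(idle_ms, elapsed_ms):
    """% Processor Time: the share of elapsed time spent executing
    non-idle threads."""
    return 100.0 * (elapsed_ms - idle_ms) / elapsed_ms

# Hypothetical interval of 1,000 ms in which the idle thread ran for
# 852 ms: the processor was busy 14.8% of the time.
print(round(processor_time_pct(idle_ms=852, elapsed_ms=1000), 1))  # 14.8
```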

Figure 12–2 shows how busy one processor was on a server that was accepting documents into a workspace. Over a period of 40 hours in which text files were being continually added to the workspace (less than half the total time it took), the average use of the processor was only 14.8%. Incidentally, the test machine has a Pentium III processor, 733MHz, with 512MB of RAM.

Figure 12–2 Processor activity while copying files to a workspace

The Memory—% Committed Bytes in Use counter is a ratio of two other counters: Memory—Committed Bytes and Memory—Commit Limit. Committed memory is the physical memory in use for which space has been reserved in the paging file should that memory ever need to be written to disk. The commit limit is determined by the size of the paging file and can be increased only by increasing the size of the paging file. Hence, a value of 50 for this counter means that committed memory has reached 50% of the commit limit. You can see in Figure 12–3 that while files were being copied to the test server, an average of 41.1% of the commit limit was in use.
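The ratio itself is straightforward; here is the calculation with hypothetical numbers (512MB committed against a 1,248MB commit limit), which lands near the 41% average seen in Figure 12–3:

```python
def pct_committed_bytes_in_use(committed_mb, commit_limit_mb):
    """Memory--% Committed Bytes in Use = Committed Bytes / Commit Limit,
    expressed as a percentage."""
    return 100.0 * committed_mb / commit_limit_mb

# Hypothetical figures: 512 MB committed, 1,248 MB commit limit.
print(round(pct_committed_bytes_in_use(512, 1248), 1))  # 41.0
```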

The Active Server Pages—ASP Request Execution Time counter measures, in milliseconds, how long it took to service the most recent request. This is not an average, but only a measure of the execution time of the last request.

Similarly, the Request Wait Time and the Requests Queued counters measure only the most recent transaction, not an overall average. The Request Wait Time counter measures, in milliseconds, how long the most recent request had to sit in the queue before being processed. The Requests Queued counter reports how many requests at any given time are sitting in the queue, waiting to be processed.

When monitored in real time, these three counters can be used in the Alert mode to help you understand when your processor is beginning to be overloaded. If these counters become high (when measured against your baseline), you should see whether a faster or second processor is in order.
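In Alert mode you set those thresholds by hand; a monitoring script can apply the same logic. This sketch assumes hypothetical baseline values captured on your own server, and a simple "alert at twice baseline" policy:

```python
# Hypothetical baseline values for the three ASP counters, captured
# while the server was known to be healthy.
BASELINE = {
    "ASP Request Execution Time": 120,  # milliseconds
    "ASP Request Wait Time": 30,        # milliseconds
    "ASP Requests Queued": 5,           # requests
}

def overloaded(samples, baseline=BASELINE, factor=2.0):
    """Return the counters whose current sample exceeds the baseline
    by `factor`, a sign the processor may be getting overloaded."""
    return [name for name, value in samples.items()
            if value > baseline.get(name, float("inf")) * factor]

print(overloaded({"ASP Request Execution Time": 400,
                  "ASP Request Wait Time": 25,
                  "ASP Requests Queued": 4}))
# ['ASP Request Execution Time']
```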

Figure 12–3 Committed bytes in use in the pagefile.sys file

Monitoring the Major Processes of SharePoint Portal Server

This section uses the architecture of SPS to discuss the various objects and counters. I've divided this section into two parts: SPS server services and Internet Information Services (IIS). Each part outlines the counters that are available for monitoring. Let's start by looking at the SPS server services and then the IIS services.

From an overall perspective, here are the objects and the parts of the architecture that they monitor:

  • SharePoint Portal Server Document Management Server (msdmserv.exe): provides counters for all document management functions, such as check-in, check-out, publishing, and approving

  • SharePoint Portal Server Subscriptions: works with mssearch.exe to provide counters on all subscription activities through the persistent query service (PQS) plug-in

  • Microsoft Gatherer: provides counters that monitor the activities of the gatherer process when crawling local documents and can report IFilter and protocol handler errors

  • Microsoft Gatherer Projects: provides counters that monitor the activities of the gatherer process when crawling external documents and can report IFilter and protocol handler errors

  • Microsoft Search: provides counters for search activities

  • Microsoft Search Catalogs: provides counters for the search catalogs

  • Microsoft Search Indexer Catalogs: provides counters for the indexer catalogs

  • MSExchange OLEDB Resource: provides counters for measuring interactions of other processes with the Web Storage System; does not provide an exhaustive range of counters that tell you how the Web store is functioning

  • MSExchange OLEDB Events: provides counters of OLEDB-specific events

  • MSExchange Web Mail: installs with the WSS but is not relevant to SPS

SharePoint Portal Server Services

This section describes three main services that you might want to monitor: the document management (DM) services, the Web Storage System (WSS), and the Search service (MSSearch).

Monitoring Document Management Functions

When you monitor DM functions, there are a few counters that you may want to pay attention to. For every counter, there is a corresponding latency counter that measures the time required to perform the operation. Hence, you should be sure to monitor the latency counters: Successful Checkins Latency, Successful Checkouts Latency, and so forth. These counters tell you, in milliseconds, how long it is taking to perform each function.

For instance, in my system I checked in 1,200 documents and marked them to be published at the same time. I found that it took, on average, only 85 milliseconds to check in and another 126 milliseconds to publish each document.

These numbers came from the latency counters. Another latency counter told me I'd been waiting a bit longer to enumerate the document folder hierarchy using the Web folders client. As you can see in Figure 12–4, during this little test, it took, on average, more than 64,000 milliseconds to bring up the document folder list. That list has more than 5,000 folders, and it really did take longer than usual to bring up the folder list.

Figure 12–4 Document management counters in Report view

When you compare these numbers against a baseline, they can be very helpful in determining whether your SPS server can handle an increase in the number of users (and stress). Building a baseline of numbers when your server is functioning well gives you a standard against which to track trends and make predictions.
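A baseline comparison like the one described above can be scripted. In this sketch, the baseline figures echo the 85ms check-in and 126ms publish averages quoted earlier; the observed values and the 1.5x tolerance are hypothetical:

```python
# Baseline latencies (ms) captured while the server was healthy;
# check-in and publish figures match the averages quoted in the text.
baseline = {"Successful Checkins Latency": 85,
            "Successful Publishes Latency": 126}

# Hypothetical latencies observed after adding users to the workspace.
observed = {"Successful Checkins Latency": 240,
            "Successful Publishes Latency": 130}

def degraded(observed, baseline, tolerance=1.5):
    """Return the counters whose latency has grown past the tolerance,
    a hint that the server may not handle the increased load."""
    return sorted(c for c in observed
                  if observed[c] > baseline[c] * tolerance)

print(degraded(observed, baseline))  # ['Successful Checkins Latency']
```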

Also pay attention to the number of failed DM actions. These numbers are reported on an average basis. A marked increase in numbers after you add new users to a workspace might indicate the need for additional training for those who use the DM functions. A steadily increasing set of numbers without a corresponding increase in the number of users accessing the server might indicate that other processes are becoming bottlenecks and are not allowing DM functions to finish. Be sure to monitor the processor, disk, memory, and network subsystems to ensure that your server can handle the load being placed on it by users.

Table 12–1 shows all the DM counters, as taken from System Monitor.

Table 12–1 Document management counters

Counter

Description

Failed Approves

Total number of failed approve requests

Failed Approves Latency

Average latency at which failed approve requests are processed

Failed Checkins

Total number of failed check-in requests

Failed Checkins Latency

Average latency at which failed check-in requests are processed

Failed Checkouts

Total number of failed check-out requests

Failed Checkouts Latency

Average latency at which failed check-out requests are processed

Failed Copies

Total number of failed copy requests

Failed Copies Latency

Average latency at which failed copy requests are processed

Failed Deletes

Total number of failed delete requests

Failed Deletes Latency

Average latency at which failed delete requests are processed

Failed Enum Folders

Total number of failed enumerate folders requests

Failed Enum Folders Latency

Average latency at which failed enumerate folders requests are processed

Failed Moves

Total number of failed move requests

Failed Moves Latency

Average latency at which failed move requests are processed

Failed Publishes

Total number of failed publish requests

Failed Publishes Latency

Average latency at which failed publish requests are processed

Failed Rejects

Total number of failed reject requests

Failed Rejects Latency

Average latency at which failed reject requests are processed

Failed Undo Checkouts

Total number of failed undo check-out requests

Failed Undo Checkouts Latency

Average latency at which failed undo check-out requests are processed

Failed Version Histories

Total number of failed version history requests

Failed Version Histories Latency

Average latency at which failed version history requests are processed

Successful Approves

Total number of successful approve requests

Successful Approves Latency

Average latency at which successful approve requests are processed

Successful Checkins

Total number of successful check-in requests

Successful Checkins Latency

Average latency at which successful check-in requests are processed

Successful Checkouts

Total number of successful check-out requests

Successful Checkouts Latency

Average latency at which successful check-out requests are processed

Successful Copies

Total number of successful copy requests

Successful Copies Latency

Average latency at which successful copy requests are processed

Successful Deletes

Total number of successful delete requests

Successful Deletes Latency

Average latency at which successful delete requests are processed

Successful Enum Folders

Total number of successful enumerate folders requests

Successful Enum Folders Latency

Average latency at which successful enumerate folders requests are processed

Successful Moves

Total number of successful move requests

Successful Moves Latency

Average latency at which successful move requests are processed

Successful Publishes

Total number of successful publish requests

Successful Publishes Latency

Average latency at which successful publish requests are processed

Successful Rejects

Total number of successful reject requests

Successful Rejects Latency

Average latency at which successful reject requests are processed

Successful Undo Checkouts

Total number of successful undo check-out requests

Successful Undo Checkouts Latency

Average latency at which successful undo check-out requests are processed

Successful Version Histories

Total number of successful version history requests

Successful Version Histories Latency

Average latency at which successful version history requests are processed

Monitoring Search: The Gatherer and Indexing Processes

Several objects relate to the indexing and search processes: SharePoint Portal Server Subscriptions, Microsoft Search, Microsoft Search Catalogs, Microsoft Search Indexer Catalogs, Microsoft Gatherer, and Microsoft Gatherer Projects. The Microsoft Search Indexer Catalogs object can be measured on each workspace that has been created on the SharePoint server, or it can be measured on all the workspaces as a unit. If you're interested in learning about the indexer catalogs for a specific workspace, be sure to measure only that workspace by selecting the Select Instances from List radio button and highlighting the desired workspace.

When monitoring the gatherer and indexing processes, you may want to know about specific items, such as the following:

  • If you need to know how many documents the gatherer crawled but did not include in the index, use the Microsoft Gatherer Projects—Status Error counter.

  • To see how fast notifications are being generated, use the Microsoft Gatherer—Notifications Rate counter.

  • To see how many crawls are in progress simultaneously, use the Microsoft Gatherer Projects—Crawls in Progress counter.

  • To find out how many documents are in the index, use the Microsoft Search Indexer Catalogs—Number of Documents counter.

A couple of counters deserve a quick mention here. You might want to place these counters in an Alert view in case you find yourself needing to know immediately when a crawl process is not working.

Robots.txt

Whenever the gatherer attempts to crawl a Web site, it first looks for a file on the site called robots.txt. An example is found at www.archive.org, a Web site dedicated to archiving the Internet. In some searches on that site, you'll find references to further crawling being blocked by the site's robots.txt file. Other examples are the major search engines, such as www.altavista.com, www.yahoo.com, and www.google.com. You can monitor how many robots.txt files have been accessed by the gatherer process by monitoring the Microsoft Gatherer—Access robots.txt File counter.

The robots.txt file lists the portions of the Web site that are restricted and specifies which crawlers, if any, are restricted. The robots.txt file should be found in the directory that is considered the home directory of the site. By default, Internet Information Services does not install this file, nor does SPS. However, you can create your own file if needed by using Notepad. After the file is created, SPS will refer to it once each 24-hour period, based on the last time the Search service was started. If you make changes to the file and want those changes effective immediately, you must restart the Search service.

If you like, you can have a robots.txt file generated for you automatically at various Web sites, such as http://www.rietta.com/.

Sample text entered into a robots.txt file might look like this (the paths are illustrative):

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

If you like, you can use INDEX/NOINDEX and FOLLOW/NOFOLLOW in the HTML meta tags of an HTML document. For example, you can mark a document with the following:

<META name="robots" content="NOINDEX, NOFOLLOW">

This type of meta tag instructs the robot not to index the document and not to follow any of the links on the page.
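A crawler reads this directive by scanning the page's meta tags. A minimal sketch with Python's standard-library HTML parser; the page markup is hypothetical:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives += [d.strip().upper()
                                for d in attrs.get("content", "").split(",")]

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
p = RobotsMetaParser()
p.feed(page)
print(p.directives)  # ['NOINDEX', 'NOFOLLOW']
```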

For more information, see article 217103 in the Microsoft Knowledge Base.

First, the Delayed Documents counter indicates the number of documents that are waiting to be crawled based on the site hit frequency rules. If you have a plethora of rules and this number is steadily increasing over time, consider relaxing or simplifying your site hit frequency rules. A very high number may indicate a conflict in the rules that the gatherer cannot resolve or follow with efficiency.

Second, the Documents Filtered Rate counter indicates the number of documents that were filtered on a per-second basis and is expressed as an average (Figure 12–5). If this rate is decreasing over time, you should perform some troubleshooting to find out why your server is not filtering documents as quickly as it did in the past. Look for memory issues, processor issues, network issues, or site hit frequency rules that slow the gatherer process.

The Documents Delayed Retry counter indicates whether the target server's Web Storage System is shut down. If the value for this counter is greater than 0, you can assume that the WSS is shut down and that the crawl should be performed later.

The Filter Process Created counter indicates the number of times a filter process was either created or started. A very high number can indicate a problem in crawling one or more content sources. If this number is high, troubleshoot your IFilters and perhaps your protocol handlers.

Figure 12–5 Gatherer counters displayed in the Report view

Sometimes, the gatherer process will back off, and the reasons for this action are expressed in a numeric value. The values correspond to the following meanings:

  • 0: The gatherer service is up and running.

  • 1: There is high system I/O traffic on the target server.

  • 2: There is a high notifications rate.

  • 3: Delayed recovery is in progress.

  • 4: Back-off is due to a user-initiated command.

  • 5: There exists a low-battery situation.

  • 6: Memory is low.

  • 99: Undefined reason generated by the search process.

During a back-off period, indexing is suspended. To manually back off the gatherer service, pause the search service. If the search service itself generates the back-off, an event will be recorded and the search service will be paused automatically. There is no automatic restart, so you must manually start the search service in order to end a back-off state. Note that there is little reason to start the search service until you've solved the problem that caused the back-off in the first place.
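Since the Reasons to Back Off counter reports a bare number, a monitoring script will usually translate it before alerting. The lookup table below simply restates the code list above; the function name is an assumption:

```python
# Reasons to Back Off values, as listed above.
BACK_OFF_REASONS = {
    0: "Gatherer service is up and running",
    1: "High system I/O traffic on the target server",
    2: "High notifications rate",
    3: "Delayed recovery in progress",
    4: "User-initiated back-off",
    5: "Low-battery situation",
    6: "Low memory",
    99: "Undefined reason generated by the search process",
}

def describe_back_off(code):
    """Translate a Reasons to Back Off counter value for an alert."""
    return BACK_OFF_REASONS.get(code, "Unknown reason code")

print(describe_back_off(6))   # Low memory
print(describe_back_off(42))  # Unknown reason code
```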

The counters beneath the Microsoft Gatherer Projects object focus on crawling documents that exist in a file system or Web site. Be sure to pay close attention to one or more of these counters. I discuss some of the more important ones here.

The Adaptive Crawl False Positives counter indicates the number of times the adaptive update has predicted that a document has changed when it has not.

The Retries counter indicates the number of times that access to a document has been retried. A high number means that the gatherer is attempting to access a document numerous times, without success. You should check the gatherer logs and identify the problem document. Then ensure that it has the correct extension and that you have the correct IFilter for it.

The Threads Accessing Network counter measures the number of threads that are waiting for a response from the filter process. If this counter equals the number of filtering threads and you see no other activity, it may indicate a network problem or the unavailability of the server being crawled.

These are only some of the counters associated with the search and indexing functions. I focus on the gatherer counters because I believe that if you experience a problem indexing a document, most likely it will fail during the gatherer phase. Best practice, when monitoring your server, is to monitor all the counters for a given object so that you can perform a complete analysis of the server after the data is gathered. Remember that with Performance Monitor, you don't need to store the log file locally; it can be created and built on a remote server. Please consult the Windows 2000 Resource Kit for more information about this.

When you're working with the SharePoint Portal Server subscriptions counters, there are two that you might want to monitor if users are complaining that they are not being notified in a timely manner of document changes.

The Total Hits Received and Total Notifications Sent counters will help you to understand how many notifications are being generated and how many should have been generated. For instance, if you have a high number of hits but a low number of notifications sent, it means that you have a problem with one or more of your plug-ins. However, if the number of hits and the number of notifications sent are close to the same, you can be assured that your SharePoint server is sending notifications for the hits that it has received.

Another possibility is that your Exchange 2000 Server (I'm making an assumption here) cannot handle all the notification e-mails. In other words, SPS is generating the notifications, but there is a bottleneck in your mail delivery system that is either stopping or hindering e-mails to users. Be sure to look at this part of the e-mail delivery system, too.
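The comparison described above amounts to a ratio check. A sketch, where the 0.9 tolerance is an illustrative threshold rather than anything from the product:

```python
def notification_health(hits_received, notifications_sent, tolerance=0.9):
    """Compare Total Hits Received against Total Notifications Sent.
    A sent/received ratio well below 1.0 suggests a plug-in or mail
    delivery problem; the tolerance is illustrative only."""
    if hits_received == 0:
        return "ok"
    ratio = notifications_sent / hits_received
    return "ok" if ratio >= tolerance else "investigate plug-ins or mail delivery"

print(notification_health(1000, 980))  # ok
print(notification_health(1000, 400))  # investigate plug-ins or mail delivery
```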

Table 12–2 shows all the subscriptions counters.

Table 12–2 SPS subscriptions counters

Counter

Description

Errors Access Denied

Total number of access-denied errors received during access check

Total Access Checks

Total number of access checks that the subscriptions engine does

Total Discarded Hits

Total number of hits that the subscriptions engine discards

Total Documents Processed

Total number of documents that the subscriptions engine processes

Total Documents Processed/sec.

Rate at which the subscriptions engine processes documents

Total Duplicate Hits

Total number of duplicate hits that the subscriptions engine processes

Total Full Access Checks

Total number of access checks done by contacting the domain controller

Total Hits Received

Total number of hits that the subscriptions engine processes

Total Hits Received/sec.

Rate at which the subscriptions engine receives hits

Total Notifications Sent

Total number of e-mail subscription notifications that the system sends

Total Notifications Sent/sec.

Rate at which the subscriptions engine sends e-mail notifications

Total Subscriptions

Total number of subscriptions defined in the system

Now let's look at the search counters. You may want to monitor three counters in tandem: Current Connections, Failed Query Rate, and Succeeded Query Rate. Why? One reason is to see whether your user training has done any good. For instance, you can monitor these counters before training commences and after it has terminated. In theory, you should see a rise in the Current Connections counter because after training, people will be more likely to use this service. In addition, you should see a drop in the Failed Query Rate because training should teach them how to use the Search Web part more effectively. And finally, you should see a rise in the Succeeded Query Rate for the same reason: After training, people should understand how to use the Web part and should be more successful at finding documents.

If your training was effective, the Failed Query Rate should decline even though the Current Connections rate is on the rise.
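That before-and-after comparison is easiest to judge as a failure percentage rather than raw counts, since total query volume should rise after training. The sample figures below are hypothetical:

```python
def failed_query_pct(failed, total):
    """Failed queries as a percentage of all queries posted."""
    return 100.0 * failed / total if total else 0.0

# Hypothetical samples taken before and after user training.
before = failed_query_pct(failed=120, total=400)  # 30.0%
after = failed_query_pct(failed=60, total=600)    # 10.0%

# Total queries rose (training increased use of the service), yet the
# failure percentage fell -- the sign that training was effective.
print(before > after)  # True
```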

A second way to use these counters is to monitor the Query Rate counter.

If this counter is increasing over time, you may need to set a benchmark that you and your manager agree on that will indicate when it's time to dedicate a server to search queries. Also take into account other measurements—on the disk, processor, and memory subsystems—but be sure to include this rate when considering this question.

Table 12–3 shows the search counters.

Table 12–3 SPS search counters

Counter

Description

Active Threads

Total number of threads currently servicing queries

Current Connections

Number of currently established connections between MSSearch and all clients

Failed Queries

Number of queries that fail

Failed Query Rate

Number of failed queries per second

Queries

Cumulative number of queries posted to the server

Query Rate

Number of queries posted to the server per second

Result Rate

Number of results returned to the client per second

Results

Cumulative number of results returned to clients

Succeeded Queries

Number of queries that produce successful searches

Succeeded Query Rate

Number of queries per second that produce successful searches

Threads

Total number of threads available for servicing queries

Table 12–4 shows the search catalogs counters. One counter to watch here is the Catalog Size counter. Be sure you've placed your catalogs on a disk that can hold them going forward. Also, if you monitor this counter over time, you can get a sense of how long you have before the disk will run out of free disk space.

Table 12–4 SPS search catalogs counters

Counter

Description

Catalog Size (MB)

Size of catalog data in megabytes

Failed Queries

Number of queries that fail

Failed Queries Rate

Number of failed queries per second

Number Of Documents

Total number of documents in the catalog

Persistent Indexes

Number of persistent indexes

Queries

Cumulative number of queries posted to the catalog

Queries Rate

Number of queries posted to the catalog per second

Results

Cumulative number of results returned to clients

Results Rate

Number of results returned to the client per second

Successful Queries

Number of queries that produce successful searches

Successful Queries Rate

Number of queries per second that produce successful searches

Unique Keys

Number of unique words and properties in the catalog

Use the search indexer catalogs counters shown in Table 12–5 to help you set benchmarks as to when you should initiate a full indexing procedure to clear out shadow indexes and any newly created word lists. You might also use the Files To Be Filtered counter to track trends over time to see whether your content sources are growing to the point where a dedicated indexing server is necessary. Most likely, this would be indicated by the indexing process taking more and more time. If this is the case, you should see a commensurate rise in the Active Documents, Index Size, Number Of Documents, and Unique Keys counters. If all these numbers are steadily increasing, you may reach the point where a dedicated indexing server would be a good idea. Discuss this with your manager, and set benchmarks where appropriate.
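One way to formalize that benchmark discussion is a small trend check over periodic samples of the four counters named above. A hypothetical sketch (the sample values are invented for illustration):

```python
def steadily_increasing(series):
    """True if no sample ever drops and the last sample exceeds the first."""
    return all(b >= a for a, b in zip(series, series[1:])) and series[-1] > series[0]

# Hypothetical weekly samples of the four indexing counters
trends = {
    "Active Documents":    [5000, 5400, 5900, 6500],
    "Index Size (MB)":     [800, 860, 930, 1010],
    "Number Of Documents": [5000, 5400, 5900, 6500],
    "Unique Keys":         [120000, 126000, 133000, 141000],
}

if all(steadily_increasing(v) for v in trends.values()):
    print("All indexing counters are rising; discuss a dedicated indexing server")
```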

Table 12–5 SPS search indexer catalogs counters

Counter

Description

Active Documents

Number of documents currently active in content index

Build In Progress

Indicator that an index build is in progress

Documents Filtered

Number of documents filtered since the catalog was mounted

Documents In Progress

Number of documents for which data is being added

Files To Be Filtered

Number of files waiting to be filtered and added to the catalog

Index Size (MB)

Current size of index data in megabytes

Merge Progress

Percentage of merge complete for the current merge

Number Of Documents

Number of documents in the catalog

Number Of Propagations

Number of propagations in progress

Persistent Indexes

Number of persistent indexes

Unique Keys

Number of unique words and properties in the catalog

Wordlists

Total number of word lists

In the gatherer counters (Table 12–6), pay attention to the Documents Delayed Retry counter. If this counter is substantially higher than your baseline and you've just added a content source, check the gatherer logs for that content source to identify the errors. When every content source is configured correctly, this number should not be rising.
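A baseline comparison of this kind can be scripted against your sampled counter values. The sketch below assumes an arbitrary 2x threshold (my invention, not an SPS recommendation); agree on a real threshold with your team:

```python
def exceeds_baseline(current, baseline, factor=2.0):
    """Flag a counter value that is substantially above its baseline.
    The default 2x factor is an arbitrary starting point -- tune it."""
    return current > baseline * factor

# Hypothetical values: baseline measured before adding the content source
assert exceeds_baseline(current=150, baseline=40)       # investigate the logs
assert not exceeds_baseline(current=45, baseline=40)    # normal variation
```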

Table 12–6 SPS gatherer counters

Counter

Description

Accessing Robots.txt File

Number of current requests for robots.txt, which is requested by the system implicitly, for every host, through HTTP

Active Queue Length

Number of documents waiting for robot threads; if not 0, all threads should be filtering

Admin Clients

Number of currently connected administrative clients

All Notifications Received

Total number of notifications received from all notification sources, including file system

Delayed Documents

Number of documents delayed due to site hit frequency rules

Document Entries

Number of document entries currently in memory

Documents Delayed Retry

Number of documents that are retried after time-out

Documents Filtered

Number of times a filter object was created; corresponds to the total number of documents filtered in the system since startup

Documents Filtered Rate

Number of documents filtered per second

Documents Successfully Filtered

Number of documents successfully filtered

Documents Successfully Filtered Rate

Number of documents successfully filtered per second

Ext. Notifications Rate

External notifications received per second

Ext. Notifications Received

Total number of notifications received from all notification sources, excluding file system

Filter Objects

Number of filter objects (each corresponding to a URL) currently being filtered in the system

Filter Process Created

Total number of times a filter process was created or restarted

Filter Processes

Number of filtering processes in the system

Filter Processes Max

Maximum number of filtering processes that have existed in the system since startup

Filtering Threads

Total number of filtering threads in the system

Heartbeats

Total number of heartbeats counted since startup; a heartbeat occurs once every 10 seconds while the service is running; if the service is not running, there is no heartbeat and the number of ticks is not incremented

Heartbeats Rate

One heartbeat displayed every 10 seconds

Idle Threads

Number of threads waiting for documents

Notification Sources

Currently connected external notification sources

Notifications Rate

Notifications received per second

Performance Level

The level of system resources that the gatherer service is allowed to use

Reason to back off

Code describing why the gatherer service went into back-off state

Robots.Txt Requests

Total number of requests for

robots.txt

Server Objects

Number of servers that the system recently accessed

Server Objects Created

Number of times a new server object needed to be created

Servers Currently Unavailable

Number of servers currently unavailable because requests to them have timed out

Servers Unavailable

Number of servers that have been marked unavailable because requests to them timed out

Stemmers Cached

Number of available cached stemmer instances

System I/O Traffic Rate

System I/O (disk) traffic rate in kilobytes per second (KBps) detected by back-off logic

Threads Accessing Network

Number of threads waiting for a response from the filter process

Threads blocked due to back off

Number of threads blocked due to back-off event

Threads In Plug-ins

Number of threads waiting for plug-ins to complete an operation

Time-Outs

Total number of time-outs that the system has detected since startup

Wordbreakers Cached

Number of available cached instances of wordbreakers

One way to know when the crawling has stopped is to use the "in progress" counters outlined in Table 12–7. When their numbers return to 0, you can be assured that the crawling process has ended. If you need to know when the crawling process has ended in real time, consider using the alert monitoring method to generate a notification about this. To learn how to create and use alerts in System Monitor, please consult the Windows 2000 Server Resource Kit.
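The "return to 0" check is easy to express in code. A minimal sketch over a hypothetical snapshot of the in-progress counters:

```python
def crawl_finished(samples):
    """True when every 'in progress' counter has returned to 0.
    `samples` maps a counter name to its latest sampled value."""
    return all(value == 0 for value in samples.values())

# Hypothetical snapshot of the Table 12-7 "in progress" counters
snapshot = {
    "Crawls In Progress": 0,
    "Incremental Crawls": 0,
    "Documents In Progress": 0,
    "Waiting Documents": 0,
}
print(crawl_finished(snapshot))  # True
```

For real-time notification, the equivalent approach in System Monitor is an alert on each counter with a threshold of 0, as noted above.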

Table 12–7 SPS gatherer projects counters

Counter

Description

Accessed File Rate

Number of documents accessed through the file system per second

Accessed Files

Number of documents accessed through the file system

Accessed HTTP

Number of documents accessed through HTTP

Accessed HTTP Rate

Number of documents accessed through HTTP per second

Adaptive Crawl Accepts

Documents accepted by adaptive crawl

Adaptive Crawl Error Samples

Documents accessed for error sampling

Adaptive Crawl Errors

Documents incorrectly rejected by adaptive crawl

Adaptive Crawl Excludes

Documents excluded by adaptive crawl

Adaptive Crawl False Positives

Documents incorrectly accepted by adaptive crawl

Adaptive Crawl Total

Documents to which adaptive update logic was applied

Changed Documents

Documents that have changed since the last crawl

Crawls In Progress

Number of crawls in progress

Delayed Documents

Number of documents delayed due to site hit frequency rules

Document Add Rate

Number of document additions per second

Document Additions

Number of add notifications

Document Delete Rate

Number of document deletions per second

Document Deletes

Number of delete notifications

Document Modifies

Number of modify notifications

Document Modifies Rate

Number of modify notifications per second

Document Move and Rename Rate

Number of document moves and renames per second

Document Moves/Renames

Number of notifications of document moves and renames

Documents In Progress

Number of documents in progress

Documents On Hold

Number of documents on hold because a document with the same URL is currently being processed

Error Rate

Number of filtered documents that returned an error per second

File Errors

Number of file protocol errors received while getting documents

File Errors Rate

Number of file protocol errors received per second

Filtered HTML

Number of HTML documents filtered

Filtered HTML Rate

Number of HTML documents filtered per second

Filtered Office

Number of Office documents filtered

Filtered Office Rate

Number of Office documents filtered per second

Filtered Text

Number of text documents filtered

Filtered Text Rate

Number of text documents filtered per second

Filtering Documents

Number of documents currently being filtered

Gatherer Paused Flag

Indicator that the gatherer has been paused

History Recovery Progress

Percentage of the history recovery completed

HTTP Errors

Number of HTTP errors received

HTTP Errors Rate

Number of HTTP errors received per second

Incremental Crawls

Number of incremental crawls in progress

Iterating History In Progress Flag

Indicator of whether the gatherer is currently iterating over the URL history

Not Modified

Number of documents that were not filtered because no modification was detected since the last crawl

Processed Documents

Number of documents processed since the history was reset

Processed Documents Rate

Number of documents processed per second

Recovery In Progress Flag

Indicator that recovery is currently in progress; indexing is not resumed until this flag is off

Retries

Total number of times access to a document has been retried; high number may indicate a problem with accessing the data

Retries Rate

Number of retries per second

Started Documents

Number of documents initiated into the gatherer service, including the number of documents on hold, in the active queue, and currently filtered; when this number goes to 0 during a crawl, the crawl will be completed soon

Status Error

Number of filtered documents that returned an error

Status Success

Number of successfully filtered documents

Success Rate

Number of successfully filtered documents per second

Unique Documents

Number of unique documents in the system; documents are considered not unique if their content is the same

URLs in History

Number of files (URLs) in the history list, indicating the total number of URLs covered by the crawl, either successfully indexed or failed

Waiting Documents

Number of documents waiting to be processed; when this number goes to 0, the catalog is idle; indicates the total queue size of unprocessed documents in the gatherer

Monitoring the Web Storage System

Because the Web Storage System (WSS) is the foundation database for SPS, it stands to reason that you'll want to pay attention to this database. The object that you'll want to monitor, the MSExchange OLEDB resource object, contains counters for monitoring the number and rate of transactions that are committed to the WSS.

Because control is returned to the user only when a transaction has been committed, you should pay attention to two counters: Transactions Started Rate and Transactions Committed Rate. Both measurements occur once per second.

Hence, if 40 transactions start their commits during a given second, most if not all should be committed within the next few seconds.

Make sure that the started rate and the committed rate are roughly equal. If the transactions started rate greatly exceeds the transactions committed rate, it means that you are having problems with the creation of your transaction logs or the commitment of data to the logs after they are written. Any of the following causes, or a combination, may be to blame:

  • Slow hard drive

  • Not enough memory

  • System can't create log files fast enough

  • Slow processor

  • Poorly configured Web server (see Chapter 4 on optimizing IIS services)
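The started-versus-committed comparison reduces to a simple ratio over per-second samples. A hypothetical sketch (the 1.5 alert threshold is an invented starting point, not an SPS recommendation):

```python
def commit_backlog_ratio(started_rate, committed_rate):
    """Ratio of transactions started to transactions committed per
    second; values well above 1.0 suggest the disk, memory, processor,
    or Web server configuration cannot keep up."""
    return started_rate / committed_rate if committed_rate else float("inf")

# Hypothetical per-second samples from the two counters
ratio = commit_backlog_ratio(started_rate=40.0, committed_rate=38.0)
if ratio > 1.5:  # arbitrary alert threshold -- agree on one with your team
    print("Investigate the causes listed above")
```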

Table 12–8 shows the MSExchange OLEDB resource counters, and Table 12–9 shows the MSExchange OLEDB events counters.

Table 12–8 MSExchange OLEDB resource counters

Counter

Description

Active Commands

Number of Command objects that are currently active

Active DataSources

Number of DataSource objects that are currently active

Active Rows

Number of Row objects that are currently active

Active Rowsets

Number of Rowset objects that are currently active

Active Sessions

Number of Session objects that are currently active

Active Streams

Number of Stream objects that are currently active

Resource Bindings Rate

Number of successful resource bindings per second

Resource Bindings Total

Total number of successful resource bindings

Rowsets Opened Rate

Number of times that rowsets are opened per second

Rowsets Opened Total

Total number of times that rowsets have been opened

Transactions Aborted Rate

Number of transactions aborted successfully per second

Transactions Aborted Total

Total number of transactions that have been successfully aborted

Transactions Committed Rate

Number of transactions committed successfully per second

Transactions Committed Total

Total number of transactions that have been successfully committed

Transactions Started Rate

Number of transactions started per second

Transactions Started Total

Total number of transactions that have been started

Table 12–9 MSExchange OLEDB events counters

Counter

Description

Events Completion Rate

Number of events completed per second

Events Completion Total

Total number of events that have been completed

Events Submission Rate

Number of events submitted per second

Events Submission Total

Total number of events that have been submitted

Internet Information Services

There are three main objects to use when monitoring IIS. The first is the Internet Information Services Global object. This object contains the counters that report on bandwidth throttling and the object cache, a cache in memory shared by the IIS services. Bandwidth throttling is a technique used to keep IIS from using more bandwidth than is specified by the IIS administrator. If the bandwidth used by the IIS services approaches or exceeds this limit, bandwidth throttling delays or rejects IIS service requests until more bandwidth becomes available.

Note: The object cache retains frequently used objects in memory. Repeated retrieval of the same objects could slow IIS considerably, so these objects are cached after they are retrieved for the first time. The object cache counters provide insight into the size and content of the IIS object cache as well as its effectiveness, such as cache hits and misses.
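Cache effectiveness comes down to the hit percentage that the "Hits %" counters report. The arithmetic, shown here with hypothetical BLOB cache samples:

```python
def cache_hit_percent(hits, misses):
    """Hit percentage, as reported by counters such as BLOB Cache Hits %."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

# Hypothetical BLOB Cache Hits / BLOB Cache Misses samples
print(round(cache_hit_percent(hits=900, misses=100), 1))  # 90.0
```

A persistently low hit percentage suggests the cache is being flushed too often or is too small for the working set.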

The Web service object counters show data about the anonymous and authenticated connections to IIS. This object focuses on the HTTP protocol and also monitors calls to Common Gateway Interface (CGI) applications and Internet Server Application Programming Interface (ISAPI) extensions. The Active Server Pages object provides counters for monitoring applications running on your Web server that use Active Server Pages.

Tables 12–10, 12–11, and 12–12 show the counters and their explanations for these three objects.

Table 12–10 Active Server Pages counters

Counter

Description

Debugging Requests

Number of debugging document requests

Errors During Script Runtime

Number of requests failed due to runtime errors

Errors from ASP Preprocessor

Number of requests failed due to preprocessor errors

Errors from Script Compilers

Number of requests failed due to script compilation errors

Errors/sec

The number of errors per second

Request Bytes In Total

The total size, in bytes, of all requests

Request Bytes Out Total

The total size, in bytes, of responses sent to clients, not including standard HTTP response headers

Request Execution Time

The number of milliseconds that it took to execute the most recent request

Request Wait Time

The number of milliseconds the most recent request was waiting in the queue

Requests Disconnected

The number of requests that were disconnected due to communication failure

Requests Executing

The number of requests currently executing

Requests Failed Total

The total number of requests failed due to errors, authorization failure, and rejections

Requests Not Authorized

Number of requests failed due to insufficient access rights

Requests Not Found

The number of requests for files that were not found

Requests Queued

The number of requests waiting for service from the queue

Requests Rejected

The total number of requests not executed because there were insufficient resources to process them

Requests Succeeded

The number of requests that executed successfully

Requests Timed Out

The number of requests that timed out

Requests Total

The total number of requests since the service was started

Requests/sec

The number of requests executed per second

Script Engines Cached

The number of script engines in cache

Session Duration

The number of milliseconds that the most recent sessions persisted

Sessions Current

The current number of sessions being serviced

Sessions Timed Out

The number of sessions timed out

Sessions Total

The total number of sessions since the service was started

Template Cache Hit Rate

Percent of requests found in template cache

Template Notifications

The number of templates invalidated in the cache due to change notification

Templates Cached

The number of templates currently cached

Transactions Aborted

The number of transactions aborted

Transactions Committed

The number of transactions committed

Transactions Pending

The number of transactions in progress

Transactions Total

The number of transactions since the service was started

Transactions/sec

Transactions started per second

Table 12–11 Internet Information Services global object

Counter

Description

Active Flushed Entries

Cached file handles that will be closed when all current transfers complete

BLOB Cache Flushes

Binary large object (BLOB) cache flushes since server startup

BLOB Cache Hits

Total number of successful lookups in the BLOB cache

BLOB Cache Hits %

The ratio of BLOB cache hits to total cache requests

BLOB Cache Misses

Total number of unsuccessful lookups in the BLOB cache

Current BLOBs Cached

BLOB information blocks currently in the cache for WWW and FTP services

Current Blocked Async I/O Requests

Current requests temporarily blocked due to bandwidth throttling settings

Current File Cache Memory Usage

Current number of bytes used for file cache

Current Files Cached

Current number of files whose content is in the cache for WWW and FTP services

Current URLs Cached

URL information blocks currently in the cache for WWW and FTP services

File Cache Flushes

File cache flushes since server startup

File Cache Hits

Total number of successful lookups in the file cache

File Cache Hits %

The ratio of file cache hits to total cache requests

File Cache Misses

Total number of unsuccessful lookups in the file cache

Maximum File Cache Memory Usage

Maximum number of bytes used for file cache

Measured Async I/O Bandwidth Usage

Measured bandwidth of asynchronous I/O averaged over one minute

Total Allowed Async I/O Requests

Total requests allowed by bandwidth throttling settings (counted since service startup)

Total BLOBs Cached

Total number of BLOB information blocks ever added to the cache for WWW and FTP services

Total Blocked Async I/O Requests

Total requests temporarily blocked due to bandwidth throttling settings (counted since server startup)

Total Files Cached

Total number of files whose content was ever added to the cache for WWW and FTP services

Total Flushed BLOBs

The number of BLOB information blocks that have been removed from the cache since service startup

Total Flushed Files

The number of file handles that have been removed from the cache since service startup

Total Flushed URLs

The number of URL information blocks that have been removed from the cache since service startup

Total Rejected Async I/O Requests

Total requests rejected due to bandwidth throttling settings (counted since service startup)

Total URLs Cached

Total number of URL information blocks ever added to the cache for WWW and FTP services

URL Cache Flushes

URL cache flushes since service startup

URL Cache Hits

Total number of successful lookups in the URL cache

URL Cache Hits %

The ratio of URL cache hits to total cache requests

URL Cache Misses

Total number of unsuccessful lookups in the URL cache

Table 12–12 Web service counters

Counter

Description

Anonymous Users/sec

The rate at which users are making anonymous connections using the Web service

Bytes Received/sec

The rate that data bytes are received by the Web service

Bytes Sent/sec

The rate that data bytes are sent by the Web service

Bytes Total/sec

The sum of Bytes Sent/Sec and Bytes Received/Sec; the total rate of bytes transferred by the Web service

CGI Requests/sec

The rate at which CGI requests are being processed by the Web service

Connection Attempts/sec

The rate at which connections using the Web service are being attempted

Copy Requests/sec

The rate at which HTTP requests using the COPY method (used for copying files and directories) are being made

Current Anonymous Users

The number of users who currently have an anonymous connection to the Web service

Current Blocked Async I/O Requests

Current requests temporarily blocked due to bandwidth throttling settings

Current CAL Count for Authentication

The current count of licenses used simultaneously by the Web service for authenticated users

Current CAL Count of SSL Connection

The current count of licenses used simultaneously by the Web service for SSL connections

Current CGI Requests

The current number of CGI requests that are simultaneously being processed by the Web service

Current Connections

The current number of connections established with the Web service

Current ISAPI Extension Requests

The current number of extension requests that are simultaneously being processed by the Web service

Current NonAnonymous Users

The number of users who currently have a non-anonymous connection using the Web service

Delete Requests/sec

The rate at which HTTP requests using the DELETE method (generally used for file removals) are made

Files Received/sec

Rate at which files are received by the Web service

Files Sent/sec

Rate at which files are sent by the Web service

Files/sec

Rate at which files are transferred, both sending and receiving, by the Web service

Get Requests/sec

The rate at which HTTP requests using the GET method (generally used for basic file retrievals or image maps, although it can be used with forms) are made

Head Requests/sec

The rate at which HTTP requests using the HEAD method (generally indicates that a client is querying the state of an in-use document to see whether it needs to be refreshed) are made

ISAPI Extension Requests/sec

Rate at which ISAPI extension requests are being processed by the Web service

Lock Requests/sec

The rate at which HTTP requests using the LOCK method are made

Locked Errors/sec

The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked; generally reported as an HTTP 423 error code to the client

Logon Attempts/sec

The rate at which logons using the Web service are being attempted

Maximum Anonymous Users

The maximum number of users who simultaneously established anonymous connections using the Web service since service startup

Maximum CAL Count for Authenticated Users

The maximum count of licenses used simultaneously by the Web service for authenticated connections

Maximum CAL Count for SSL Connections

The maximum count of licenses used simultaneously by the Web service for SSL connections

Maximum CGI Requests

The maximum number of CGI requests simultaneously processed by the Web service

Maximum Connections

The maximum number of simultaneous connections established with the Web service

Maximum ISAPI Extension Requests

Maximum number of extension requests simultaneously processed by the Web service

Maximum NonAnonymous Users

Maximum number of users who established concurrent non-anonymous connections using the Web service since service startup

Measured Async I/O Bandwidth Usage

Measured bandwidth of asynchronous I/O averaged over one minute

Mkcol Requests/sec

The rate at which HTTP requests using the MKCOL method (used to create directories on the server) are made

Move Requests/sec

The rate at which HTTP requests using the MOVE method (used for moving files and directories) are made

NonAnonymous Users/sec

The rate at which users are making non-anonymous connections using the Web service

Not Found Errors/sec

The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found; generally reported as an HTTP 404 error code to the client

Options Requests/sec

The rate at which HTTP requests using the OPTIONS method are made

Other Requests Methods/sec

The rate at which HTTP requests are made that do not use the OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MKCOL, COPY, MOVE, PROPFIND, PROPPATCH, SEARCH, LOCK, or UNLOCK methods

Post Requests/sec

The rate at which HTTP requests using the POST method (used for forms and gateway requests) are made

Propfind Requests/sec

The rate at which HTTP requests using the PROPFIND method (used to retrieve property values on files and directories) are made

Proppatch Requests/sec

The rate at which HTTP requests using the PROPPATCH method (used to set property values on files and directories) are made

Put Requests/sec

The rate at which HTTP requests using the PUT method are made

Search Requests/sec

The rate at which HTTP requests using the MS-SEARCH method (used to query the server to find resources that match a set of conditions provided by the client) are made

Service Uptime

Uptime for the W3SVC service or individual Web sites

Total Allowed Async I/O Requests

Total requests allowed by bandwidth throttling settings since service startup

Total Anonymous Users

Number of users who established an anonymous connection with the Web service since startup

Total Blocked Async I/O Requests

Total requests temporarily blocked due to bandwidth throttling settings since service startup

Total CGI Requests

Requests (since service startup) for custom gateway executables (.exe) that the administrator can install to add forms processing or other dynamic data sources; CGI requests spawn a process on the server, which can be a large drain on server resources

Total Connection Attempts (All Instances)

Number of connections that have been attempted using the Web service since startup; this counter is for all instances

Total Copy Requests

Number of HTTP requests using the COPY method (used for copying files and directories) since service startup

Total Count of Failed CAL Requests

Number of HTTP requests (total since startup) that failed due to a license being unavailable for an authenticated user

Total Count of Failed CAL Requests SSL

Number of HTTP requests (total since startup) that failed due to a license being unavailable for an authenticated user over SSL

Total Delete Requests

Number of HTTP requests using the DELETE method (generally used for file removals) since service startup

Total Files Received

Total number of files received by the Web service since service startup

Total Files Sent

Total number of files sent by the Web service since startup

Total Files Transferred

Total number of files sent and received by the Web service since service startup

Total Get Requests

Total number of HTTP requests using the GET method (generally used for basic file retrievals or image maps, although they can be used with forms) since service startup

Total Head Requests

Total number of HTTP requests using the HEAD method since service startup

Total ISAPI Extension Requests

Total number of ISAPI extension requests since service startup

Total Lock Requests

Total number of LOCK requests since service startup

Total Locked Errors

Total number of requests for resources that were locked at the time of the request since service startup

Total Logon Attempts

Total number of logon attempts using the Web service since startup

Total Method Requests

Total number of all HTTP requests since service startup

Total Method Requests/sec

The rate at which all HTTP requests are made

Total Mkcol Requests

Total number of all MKCOL requests since service startup

Total Move Requests

Total number of all MOVE requests since service startup

Total NonAnonymous Users

Total number of all users who established a non-anonymous connection with the Web service since service startup

Total Not Found Errors

Total number of all requests (since service startup) that couldn't be satisfied because the server couldn't find the requested document; generally results in an HTTP 404 error code to the client

Total Options Requested

Total number of HTTP requests using the OPTIONS method since startup

Total Other Request Methods

Total number of HTTP requests that do not use the OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MKCOL, COPY, MOVE, PROPFIND, PROPPATCH, SEARCH, LOCK, or UNLOCK methods since startup

Total Post Requests

Total number of HTTP requests using the POST method since startup

Total Propfind Requests

Total number of HTTP requests using the PROPFIND method since startup

Total Proppatch Requests

Total number of HTTP requests using the PROPPATCH method since startup

Total Put Requests

Total number of HTTP requests using the PUT method since startup

Total Rejected Async I/O Requests

Total requests rejected due to bandwidth throttling settings since service startup

Total Search Requests

Total number of HTTP requests using the

SEARCH 
method since startup

Total Trace Requests

Total number of HTTP requests using the

TRACE 
method since startup

Total Unlock Requests

Total number of HTTP requests using the

UNLOCK 
method since startup

Trace Requests/sec

Rate of HTTP requests using the

TRACE 
method (allows the client to see what is being received at the end of the request chain and use the information for diagnostic purposes)

Unlock Requests/sec

Rate of HTTP requests using the

UNLOCK 
method (used to remove locks from files)

In Table 12–12, use the Current CAL Count for Authentication counter to ensure that you have purchased sufficient licenses for your SPS server. Because this counter measures current, simultaneous connections (and not the total number of connections since service startup), you need to purchase only as many client CALs as the highest value this counter reaches. Notice also that there is no counter measuring all the connections since startup that require a CAL, because the same user might connect many times and drive that number artificially high. I highly recommend that you monitor this counter regularly to ensure that you have purchased enough licenses to remain in compliance. Compare this number with the Current NonAnonymous Users counter, which measures the total number of non-anonymous users simultaneously using the Web service, a figure that includes both WAN and LAN authenticated users. These two counters should be close.
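The sizing logic above reduces to taking the peak of periodic counter samples. A minimal sketch in Python, using hypothetical sample values (in practice, Performance Monitor logs of the Current CAL Count for Authentication counter would supply the real readings):

```python
# Hypothetical samples of the "Current CAL Count for Authentication"
# counter, taken at regular intervals (e.g., every 15 minutes).
# Real values would come from Performance Monitor logs.
cal_samples = [112, 134, 97, 156, 148, 121]

# Because the counter measures *simultaneous* connections, the number
# of CALs you need is driven by the observed peak, not the total.
cals_needed = max(cal_samples)

print(cals_needed)  # peak simultaneous authenticated connections observed
```

In a real environment you would sample over a representative period (including peak business hours) before trusting the maximum.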

Getting All Stressed Out with SharePoint Portal Server

The Microsoft Web Application Stress (WAS) tool, a free application available at http://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx, lets you create or record a script that can be used to put stress on a SharePoint Portal Server by reproducing multiple Web requests to a single Web server. By realistically simulating multiple browser connections, you can gather performance and stability information about your SharePoint server. This information is invaluable in planning.

For More Info To learn more about capacity planning issues for SharePoint Portal Server, please see Chapter 4.

When you download the WAS tool from Microsoft's Web site, it is a single file named setup.exe. Copy this file to each client machine from which you want to run the test, and then double-click it to start the installation. Accept the defaults in the setup wizard. Because you are stress-testing your SPS server, do not install WAS on your SharePoint server. Running this application on your SharePoint server against its own Web pages would skew the results, because the server would be busy executing the application as well as answering calls from the other client machines.

A sample script, creatively named sample script, is installed with WAS. You can use this script to acquaint yourself with the features of WAS.

To start WAS, click its entry under the Programs menu. When the program starts, the left-hand pane displays all the scripts stored in the current installation of WAS. Out of the gate, you'll find the sample script and nothing else.

If you highlight the sample script, in the right pane you'll see a place to type the server's name, a description, and then the commands to be run along with the exact pages against which to run them. The WAS utility has a set of sample pages that you can copy to your default root location on the Web server. As a start, you can run this utility against the sample pages to get a sense of how it runs.

Each script item is built from an HTTP or HTTP DAV command. If you double-click a particular command, you'll be able to edit the query string name-value pairs, change POST data, modify the header, and enable Secure Sockets Layer.

This script also allows you to set the desired number of users and the number of concurrent threads your workstation will use to execute the script. You also select the Performance Monitor counters to monitor. These can be monitored from any client running WAS, but you should monitor from only one, because monitoring from multiple clients places unnecessary stress on your SharePoint server. If you are going to run this script from one workstation, you must understand that the total number of users being simulated equals the number of threads multiplied by the stress multiplier. So if you specify 75 threads and use a multiplier of 3, you're simulating 225 concurrent connections to your Web server.
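The arithmetic above can be checked directly. A trivial Python sketch, using the same numbers as the text:

```python
def simulated_connections(threads: int, stress_multiplier: int) -> int:
    """Total users simulated by WAS: threads times the stress multiplier."""
    return threads * stress_multiplier

# The example from the text: 75 threads with a stress multiplier of 3.
print(simulated_connections(75, 3))  # 225 concurrent connections
```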

In the report, you'll see the Time To First Byte (TTFB), which measures the time, in milliseconds, from the request for the page until WAS receives the first byte of data. The Time To Last Byte (TTLB) measures the total time, in milliseconds, from the request until the last byte of data has been received on the client. The results are then divided into percentiles for further evaluation; the average typically falls near the 50th percentile.
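To make the percentile breakdown concrete, here is a minimal sketch in Python that splits a set of TTLB measurements into the quartiles WAS reports. The timing values are hypothetical; a real run of WAS would supply its own data:

```python
import math

# Hypothetical TTLB measurements in milliseconds; a real WAS run
# would collect these from the stress test itself.
ttlb_ms = sorted([120, 180, 200, 240, 260, 310, 350, 420, 600, 900])

def percentile(data, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(data)))
    return data[rank - 1]

for pct in (25, 50, 75):
    print(pct, percentile(ttlb_ms, pct))
print("max", ttlb_ms[-1])
```

Note that WAS itself may use a slightly different percentile interpolation; the nearest-rank method shown here is just one common convention.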

When using WAS, remember the following:

  • Try to keep your thread usage between 10 and 100 threads. Also, be sure to monitor the processor utilization on the clients. Ensure that it is sustained at less than 80 percent; otherwise, the test will be invalid.

  • Use only one socket (stress multiplier) unless you are performing a special type of test. If you want to learn more, see the online Help topic "Stress level vs. stress multiplier."

  • The greater the number of users in the test, the more time it will take to initialize the test. Keeping this number less than 1,000 will help the test run faster. This number, however, is limited only by the RAM in the workstation.

  • Keep the number of script items to less than 1,000. RAM is an issue here, too.

Used correctly, WAS is a very cool tool. When you first run it, try to apply stress to the Web site and measure the maximum number of requests per second that the Web server can handle. Then increase the stress and begin to determine which resource prevents the Web server from handling more requests. Once you've figured out where this breaking point is on your server, you will have an idea as to the server's real capacity.

In some situations, you'll find that the processor is the bottleneck. To verify this, watch three counters: System—% Total Processor Time, Web Service—Connection Attempts/sec, and Active Server Pages—Requests Queued. If the processor is running at greater than 80 percent sustained, it is likely the bottleneck.

If the Requests Queued increases after the processor hits a certain percentage, and memory is relatively low, this is the point at which the processor is entering into a bottleneck state and represents the best functioning of the server given its stress. If the Requests Queued counter fluctuates considerably during the test and if the processor utilization remains relatively low, it indicates that the script is calling a server COM component that is receiving more calls than it can handle. In this case, the server COM component is the bottleneck.
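The diagnostic reasoning above can be summarized in a few lines of code. This Python sketch uses the thresholds from the text; the boolean inputs stand in for observations you would make in Performance Monitor, and the classification is a rough heuristic, not a definitive rule:

```python
def likely_bottleneck(cpu_pct_sustained: float,
                      queue_fluctuates: bool) -> str:
    """Rough bottleneck classification following the text:
    - sustained CPU above 80%        -> the processor is the bottleneck
    - queue fluctuates while CPU low -> a server COM component is saturated
    """
    if cpu_pct_sustained > 80:
        return "processor"
    if queue_fluctuates:
        return "COM component"
    return "no clear bottleneck"

print(likely_bottleneck(92, False))  # processor-bound server
print(likely_bottleneck(35, True))   # COM component receiving too many calls
```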

When you conduct this test and use bandwidth throttling, be sure to look at the Internet Information Services Global Object—Current Blocked Async I/O Requests counter. This counter indicates the number of current requests that are blocked due to bandwidth throttling. In a production environment, it would be a very good idea to set an alert on this counter so that you can know when your SPS server is being overworked for a sustained period of time.
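As a sketch of the alerting idea (in practice you would configure a Performance Monitor alert rather than write code), the rule "alert when the server is overworked for a sustained period" amounts to checking for several consecutive samples above a threshold. The sample values and thresholds below are hypothetical:

```python
def sustained_alert(samples, threshold, min_consecutive):
    """True if the counter exceeds threshold for min_consecutive samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical readings of Current Blocked Async I/O Requests.
blocked_io = [0, 2, 7, 9, 11, 12, 3]
print(sustained_alert(blocked_io, 5, 3))  # three consecutive samples above 5
```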

WAS and SPS

So far, I've explained how to use this tool to test your Web site. But how do you test the unique functions of SPS in the site? You could hire a developer to write all the paths for you, or you could simply record a session, something that is much easier. When you record a session, you actually create a script that you can then run against the SPS virtual directory in IIS. The first step is to clean out the browser cache on your workstation. You must be using Internet Explorer 3.0 or later to make the recording work properly.

The second step is to set your browser's proxy settings to localhost with a port number of 8000. There's nothing magic about port 8000; it's just easy to add two zeros to the port number 80 that already exists in your browser. If you don't use a proxy server, you can skip this step.

Third, in WAS, click the Scripts menu, point to Record, and then choose Create. This will invoke a two-screen wizard, which will ask you what you want to record. Make your selections, and then click Next and then Finish. When you click Finish, the browser will start, and you can type the SPS URL. Then perform the actions you wish to stress.

Note: Because the SPS Web sites are automatically secured using Windows Integrated Authentication (WIA), you must change the directory security settings to accept Anonymous access only. WAS will not work with the WIA methods when recording or running a script. Therefore, if you don't make this change, you'll be presented with an HTTP 401.2 "You are not authorized to view this page" error message when recording the script. Changing the directory security settings on the SPS virtual directory and workspaces will allow you access to the Web pages during both the recording and the stress phases.

Figure 12–6 shows the output of a sample script I created using WAS. The important thing to remember when you use this tool is that if you want to stress a certain function in SPS, you must actually perform that function during the recording. For instance, if you want to stress the check-in of a document, you must actually check in a document during the recording phase. If you merely open a document in the browser, the counters in the Document Management object won't do you any good because you haven't actually performed a document check-in. Hence, before recording a script, be sure to understand exactly which actions you wish to stress on your SPS server.

There's no sense in stressing a server if you don't use the data to make decisions. As a test, I decided to stress the document check-out, check-in, and publishing features of my SPS server. After performing these tasks to create my script, I then used the counters shown in Figure 12–7 to measure performance on my SPS server. As you can see, I used a combination of Active Server Pages counters, a memory and processor counter, and the appropriate counters from the SharePoint Portal Server document management object.

When I ran my script, these counters measured how well the DM features were performing on my SharePoint server. I won't show you all my numbers because this test server is only a P/233 with 512MB of RAM. It's a fine test server for writing books, but it's underpowered for any production environment, and hence it's not useful to present these skewed numbers.

After a test is run, the performance results are given in the Perf Counters section using the Report view. Figure 12–8 shows these numbers. Again, remember that these numbers are for illustration only.


Figure 12–6 Sample script generated by WAS

If you run your own test and look at your results, you'll notice that there are 25th, 50th, and 75th percentile ratings as well as a Max number (Figure 12–8). Microsoft uses percentiles as a way of summarizing the data in usable chunks. A percentile number is the value of the data relative to the other data. For example, assume that WAS has 100,000 response time measurements. The minimum measurement is the 0th percentile, and the maximum measurement is the 100th percentile (or Max). Now let's assume that the minimum number is 100 milliseconds and the maximum number is 40,000 milliseconds. Suppose you're looking at one particular measurement of 2,000 milliseconds. What you need to know is how this number relates to the other measurements in the test. Were most measurements near this 2,000 mark, or were most of them higher or lower? The percentiles give you a way to evaluate the hard numbers.
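The question asked at the end of the paragraph, where a single 2,000 millisecond measurement sits relative to the rest, is exactly a percentile-rank calculation. A minimal Python sketch, with a handful of made-up measurements standing in for the 100,000 WAS samples:

```python
def percentile_rank(measurements, value):
    """Percentage of measurements at or below the given value."""
    at_or_below = sum(1 for m in measurements if m <= value)
    return 100 * at_or_below / len(measurements)

# Hypothetical response times (ms); real data would come from WAS.
samples = [100, 800, 1500, 2000, 5000, 12000, 18000, 25000, 33000, 40000]
print(percentile_rank(samples, 2000))  # 40.0: faster than most samples here
```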


Figure 12–7 Performance Monitor counters in WAS

The 50th percentile gives you the midway point and helps you identify the middle of these numbers. If 18,000 is the 50th percentile in my running example, then 2,000 would be a very fast response. However, if 1,500 milliseconds were the 50th percentile, then 2,000 milliseconds would be considered a bit slow.

For a typical test, the 25th percentile represents the mark at which 25 percent of the measurements were less than the number indicated and 75 percent of the measurements were more. In Figure 12–8, for the Active Server Pages—Request Execution Time counter, the 25th percentile is 511.53 milliseconds, which means that of all the measurements taken, 511.53 is a fairly fast response. Also, the 50th percentile is 640.15 milliseconds, and the average is 680.33 milliseconds. Comparing these two numbers, we can see that the midpoint (the median, in statistics-speak) is 640.15 but the average is 680.33. The average is skewed slightly toward the high end, meaning that individual, large measurements pulled the average higher. In other words, at points the server was much slower than usual, as shown by the Max number of 3,455 milliseconds.
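The mean-versus-median reasoning above is easy to demonstrate. In this Python sketch, a single large outlier pulls the mean above the median, just as the Max value did in the report (the execution times are hypothetical):

```python
import statistics

# Mostly similar request times with one slow outlier (hypothetical data).
exec_times_ms = [600, 610, 620, 640, 650, 660, 700, 3455]

mean = statistics.mean(exec_times_ms)
median = statistics.median(exec_times_ms)

# A mean above the median indicates the distribution is skewed toward
# the high end: a handful of slow requests dragged the average up.
print(median, mean, mean > median)
```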

With these numbers in my example, we could predict that 50 percent of users will see response times between 411 and 640.15 milliseconds, and another 25 percent of users will see response times as slow as 729.89–3,455 milliseconds.

I encourage you to use the Web Application Stress tool as part of an ongoing method of monitoring your SPS server. When management informs you that another 150 users will be added to your SharePoint server, you now have the tools and knowledge to run tests on the server, stress it, and then give accurate numbers that reflect how your SharePoint server will react to the new stress. Such information could be invaluable and nip potentially large problems in the bud.


Figure 12–8 Report view of completed test on Performance Monitor counters

Summary

This chapter discusses some of the key monitoring counters for SPS and outlines a matrix of counters to measure your server's health. Monitoring can provide you with crucial evidence and data from which to track trends and make predictive evaluations of specific scenarios.
