Why Smart IT Organizations Use Performance Baselines
Written by Kip Ng, Principal Premier Field Engineer
This is one of those topics (like “Do you have a well-tested Disaster Recovery plan?”) that many of you will look at and say, “Yes, yes, I know we need it.” Here are some of the answers I got from customers when I asked whether they have a performance baseline:
- “We thought of doing that when we first started implementing Exchange 3 years ago, but we never got to it.”
- “We don’t have the resources to do it.”
- “Microsoft System Center Operations Manager captures that, no?”
- “Do we need it?”
- “Hahaha… yeah… we know we should have it.”
Frankly, of the many customers I have visited, I would say roughly 20% have some sort of performance baseline established. So before going any further, let me keep it simple: what is a performance baseline?
What is a performance baseline?
A performance baseline is sometimes referred to as performance benchmarking. It’s about understanding and documenting how the systems or servers in your environment behave under normal circumstances. Those measurements then become the standard, or reference, that baseline monitoring uses to detect unexpected events. They can also be used to judge the performance of a new or unfamiliar server.
Do you need it?
Needless to say, yes. It is needed because you should:
1. Know your environment
Every environment has a different set of performance baselines because of different hardware, design, topology, installed applications, number of users, usage patterns and various other factors. As a result, every system may have a different performance profile. For that reason, you can’t just take a performance baseline from someone else and apply it to your environment.
You can use thresholds published by vendors as a guideline for threshold monitoring, but you should still have your own performance baseline so that you know your environment’s performance profile.
2. Know what is normal and what is abnormal
How can you tell whether your servers are behaving normally when you have no reference for what is considered normal? If the server is running at 80% CPU utilization during the day, is that normal? If your answer is “I think so,” it means you don’t know, and you should establish a performance baseline. Perhaps the norm is actually 45% CPU utilization rather than 80%. If you have a reference to compare against, you can tell when something’s amiss; perhaps a specific user is over-stressing the server by running a huge query.
What about a new server? You commission a new server, put 5,000 users on it, and it runs at 80% CPU utilization and uses 3 GB of memory during peak hours. Is that normal? It may well be for 5,000 users, in which case you don’t need to do anything, but without a baseline you have no way to know.
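To make the comparison concrete, here is a minimal Python sketch of checking a reading against a baseline. The sample values, the function names and the three-standard-deviation tolerance are all assumptions chosen for illustration; they are not figures from any real environment or vendor guidance.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize readings taken under normal load as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_abnormal(reading, baseline, tolerance=3.0):
    """Return True when a reading is more than `tolerance` standard
    deviations away from the baseline mean (tolerance is an assumption)."""
    avg, spread = baseline
    return abs(reading - avg) > tolerance * spread

# Hypothetical daytime CPU utilization (%) collected during normal operation.
normal_cpu = [42, 47, 44, 46, 45, 43, 48, 45]
baseline = build_baseline(normal_cpu)

print(is_abnormal(46, baseline))  # a reading close to the usual 45%
print(is_abnormal(80, baseline))  # a reading far above the baseline
```

With a baseline that averages 45%, a reading of 46% passes and a reading of 80% is flagged. The right tolerance depends on how much your own environment normally varies.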
3. Know when it’s time to scale
A smart IT team plans for the future, including adding new servers when needed. The question is: when is it time to add one? Is it purely based on the number of users? I can put 10,000 users on one server, but if only 50 of them use the service concurrently, why would I need another server? We need a performance baseline so that we have a reference point to compare against and know when it is time to scale.
Here is how Microsoft worded it and I think there is no better way to conclude this post, so I’m just going to quote it:
Baseline performance monitoring involves establishing a performance baseline for your system. A performance baseline includes a single performance chart accompanied by an interpretation of the results, based on your environment. Many elements of the chart, such as the timeframe, vary according to the environment. You can use System Monitor in Windows Server 2003 to establish your performance baseline. The System Monitor chart can be created in real time or based on a performance log file. It is recommended that you base your chart on a log file, because this allows you to record statistics for an extended period.
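As a rough illustration of basing a baseline on a log file, the Python sketch below summarizes one counter from a log that has been exported to CSV. The column names (timestamp, cpu_percent), the sample rows and the summarize_log helper are hypothetical; a real exported log will have its own counter names and should cover a much longer period.

```python
import csv
import io
from statistics import mean

def summarize_log(log_file, counter):
    """Return the average, peak and sample count for one counter column
    of a CSV performance log, skipping rows where the value is blank."""
    values = [
        float(row[counter])
        for row in csv.DictReader(log_file)
        if row[counter].strip()
    ]
    return {"average": mean(values), "peak": max(values), "samples": len(values)}

# Hypothetical log excerpt; a real log would span days or weeks.
sample_log = io.StringIO(
    "timestamp,cpu_percent\n"
    "09:00,41.0\n"
    "10:00,47.5\n"
    "11:00,52.0\n"
    "12:00,39.5\n"
)

print(summarize_log(sample_log, "cpu_percent"))
```

To run it against a real file, pass an open file handle in place of the in-memory sample, for example open("perf_log.csv", newline=""), where the file name is again only a placeholder.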