Performance – what do we mean with regard to SAP workload?

01/24/2010

Today we received an email from a Microsoft consultant who is working with an SAP customer that is moving its SAP landscape to Windows Server 2008. Along with the operating system upgrade, the customer added a newer model of application server to the SAP ERP system. All application servers except the new one were 4-socket Dual-Core AMD Opteron based servers, so we are looking at servers that are 3-4 years old. The new application server is a more recent 4-socket Quad-Core Intel based server, roughly 18 months old, with twice the number of cores of the other servers. The old servers run Windows Server 2003; the new one runs Windows Server 2008. The customer ran very diligent stress tests and then started to complain quite a bit to our consultant: the response times on the new server were disappointing. According to ST03, they were even a touch worse than on the older servers with Dual-Core processors. The customer's first assumption was that Windows Server 2008 was the root cause of the response times in ST03 not meeting expectations.
The explanation is a classic one. It has to do with the metrics of the standard benchmarks we execute. Those metrics always revolve around the number of transactions executed and the number of users, sometimes subject to a maximum response time. That is what usually gets published as ‘performance’. In reality, we are publishing throughput measures of a server. These describe only indirectly how fast a single processor core can execute a request. Simplified, the throughput in transactions or users is the product of the number of cores or CPUs and the speed at which each of them executes a request. In the ideal case, doubling the number of processor cores or CPUs doubles the throughput, assuming the processing speed per core or CPU stays the same. In practice, scalability losses usually bring the throughput increase down to a factor of about 1.8-1.9 at the same per-request processing speed. So much for the theory on the hardware side. What about the SAP application side?
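The relationship described above can be sketched as a small model. This is a simplified illustration, not a benchmark formula: the function name, the `scaling_efficiency` parameter and the per-core rate of 100 requests are assumptions made for this example. Only the core counts (8 and 16 for 4-socket Dual-Core and Quad-Core servers) and the 1.8-1.9 range come from the text.

```python
def throughput(cores: int, per_core_rate: float, scaling_efficiency: float = 1.0) -> float:
    """Requests per time unit a server can complete (simplified model)."""
    return cores * per_core_rate * scaling_efficiency


# Hypothetical per-core rate, assumed to be identical on both servers.
PER_CORE_RATE = 100.0

old_server = throughput(cores=8, per_core_rate=PER_CORE_RATE)
ideal_new = throughput(cores=16, per_core_rate=PER_CORE_RATE)
# Assumed 10% scalability loss when doubling the number of cores.
realistic_new = throughput(cores=16, per_core_rate=PER_CORE_RATE, scaling_efficiency=0.9)

print(f"ideal gain:     {ideal_new / old_server:.1f}x")
print(f"realistic gain: {realistic_new / old_server:.1f}x")
```

With these assumed numbers the model yields a gain of 2.0x in the ideal case and 1.8x once the scalability loss is applied, even though no single core became faster.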
Let’s take the payroll calculation of a single employee. It runs as a dialog step in an SAP dialog work process. The SAP work process executes this step single-threaded, which means that at any point in time the OS can schedule the process on only one processor core or CPU. This is the opposite of the parallel execution of a single query that database systems can provide, where a query gets partitioned and executed in parallel on multiple CPUs or processor cores. At the granularity of an SAP dialog step, this kind of parallel execution is not possible. Hence the response time of the dialog step depends solely on the speed of a single processor core or CPU: the slower the core processes the step, the higher the response time.
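A minimal sketch of why additional cores help throughput but not the response time of a single dialog step. It assumes independent dialog steps of identical cost and an idealized scheduler without any overhead; the function names and all numbers are hypothetical.

```python
import math


def step_response_time(work_units: float, core_speed: float) -> float:
    """Time for one single-threaded dialog step; the core count plays no role."""
    return work_units / core_speed


def batch_elapsed_time(steps: int, cores: int, work_units: float, core_speed: float) -> float:
    """Elapsed time for many independent steps, each occupying one core at a time."""
    waves = math.ceil(steps / cores)  # rounds of steps that run side by side
    return waves * step_response_time(work_units, core_speed)


for cores in (8, 16):
    single = step_response_time(work_units=2.0, core_speed=1.0)
    batch = batch_elapsed_time(steps=32, cores=cores, work_units=2.0, core_speed=1.0)
    print(f"{cores} cores: one step {single:.1f}s, 32 steps {batch:.1f}s")
```

In this idealized setup the batch of 32 steps finishes twice as fast on 16 cores, while a single step takes the same 2.0 seconds on either server because the per-core speed is unchanged.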
Let’s apply all this to the customer case above. The benchmark results showed that the newer Quad-Core server delivers just 80% more throughput than the old Dual-Core servers. Given twice the number of cores, we have to assume that the per-core processing speed of the two processors is at best on the same level, if not a touch slower on the newer Quad-Core processors. This is more or less what the customer found evidence of when reading the data out of ST03. What ST03 unfortunately did not show was that the overall CPU resource consumption on the new server was drastically lower than on the old servers. To capitalize on that spare capacity, the customer would have needed to put a higher load on the new server. Looking purely at the response times per dialog step did not reveal the better ‘performance’, aka throughput, of the new server. It all depends on how we define ‘performance’.
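The lower CPU consumption can be illustrated with a rough estimate. The sketch below assumes that CPU utilization grows linearly with load and shrinks in proportion to throughput capacity; the 60% utilization of the old server is a made-up figure, not a value from the customer's system.

```python
def utilization_at_same_load(old_utilization: float, throughput_ratio: float) -> float:
    """Estimated CPU utilization of the new server under the old server's load."""
    return old_utilization / throughput_ratio


old_util = 0.60  # hypothetical utilization of one of the old servers
ratio = 1.8      # the new server delivers 80% more throughput (from the text)

new_util = utilization_at_same_load(old_util, ratio)
print(f"new server utilization at the same load: {new_util:.0%}")
```

Under these assumptions the new server would sit at roughly a third of its CPU capacity while handling the same load, which is the headroom that response times per dialog step do not show.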
So if we want to predict the impact of new servers on response times, we always need to look up the benchmarks and relate the throughput to the number of CPUs or processor cores. This gives a good idea of whether we can expect better response times or just better throughput.
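One way to make this rule of thumb concrete is to divide the throughput ratio of two benchmark results by the ratio of their core counts. The helper below is a hypothetical sketch: the optional `scaling_efficiency` correction reflects the 1.8-1.9 range mentioned earlier, and the 5% tolerance is an arbitrary choice. Neither is part of any benchmark definition.

```python
def per_core_speed_ratio(throughput_ratio: float, core_ratio: float,
                         scaling_efficiency: float = 1.0) -> float:
    """Estimated per-core speed of the new server relative to the old one."""
    return throughput_ratio / (core_ratio * scaling_efficiency)


def expectation(ratio: float, tolerance: float = 0.05) -> str:
    """Rough verdict on response times; the tolerance is an assumption."""
    if ratio > 1.0 + tolerance:
        return "faster per core: expect better response times"
    if ratio < 1.0 - tolerance:
        return "slower per core: expect no response time gain"
    return "similar per core: expect similar response times"


# Customer case from the text: 1.8x throughput with 2x the cores.
raw = per_core_speed_ratio(1.8, 2.0)
adjusted = per_core_speed_ratio(1.8, 2.0, scaling_efficiency=0.9)
print(f"raw: {raw:.2f}, adjusted for scaling loss: {adjusted:.2f}")
print(expectation(raw))
print(expectation(adjusted))
```

For the customer case the estimate lands between 0.90 and 1.00 depending on how much scalability loss is assumed, which matches the observation that per-core speed is at best on the same level.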