TCP Chimney Offload – Possible Performance and Concurrency Impacts to SQL Server Workloads

TCP Chimney is enabled by default if you apply Windows Server 2003 Sp2.  This is an operating system feature that provides capability to offload TCP/IP packet processing from the processor to the network adapters and some other balancing options.  (For a full description of this feature see http://support.microsoft.com/kb/912222.)

TCP Chimney has been known to cause issues on SQL Server systems such as general network errors and working set trimming.  The following articles document these known issues:

http://support.microsoft.com/kb/942861

http://support.microsoft.com/kb/918483

We’ve also identified situations where TCP Chimney has impacted transaction throughput and caused delays between when a statement has been completed by the SQL engine and the time to receive the begin event of the next statement.  This impact can be significant especially in application workloads that have throughput requirements to execute a series of statements within a certain time boundary. 

For example, your application has a key transaction that consists of multiple statements.  Each individual statement on the engine side is optimized and has very short duration.  The overall duration of the transaction is short because each statement has low duration and the time in between the batches is short as well.  A profiler trace of this transaction typically shows a pattern like the following.  Note that there is very short time in between the complete of one batch and the start of the next batch:

image

However with TCP Chimney enabled, you notice there is a marked delay between a batch completed and the start of the next batch for the exact same series of statements and work.  In this example, note how there is approximately a 500 ms. delay in between the complete and start of the next batch:

image

In this scenario with the 500 ms. delay in between statements you would see the SPID spend most of its time awaiting command in sys.sysprocesses with a waittype of 0x000.

This type of delay can affect application throughput as well as concurrency.  For example if the above statements are all encompassed in an implicit transaction, with the added delay the overall duration of the implicit transaction is significantly increased, locks would then be held longer than normal and you may see unexpected blocking.  If you do a comparison test of the same implicit transaction between two systems, one with TCP Chimney enabled and the other with TCP Chimney disabled and you compare the sum of the duration of the individual statements vs. the total duration of the entire transaction, you may see that the overall transaction is significantly increased when TCP Chimney is enabled.  With TCP Chimney enabled, the delta between the sum of the statement duration from the overall transaction duration shows that the majority of time is spent awaiting the next batch/command. 

Here is an example comparison of the same workload with TCP Chimney enabled and disabled.  Note the significant increase in transaction duration and the large delta (difference between transaction duration vs. the sum duration of all statements within transaction) when TCP Chimney is enabled:

Implicit Transaction Summary TCP Chimney Enabled

spid TransactionID TranStart TranEnd TranDuration sum_batch_duration batch_count delta
------- -------------- ------------- ------------ ------------- -------------------- -------------- --------
57 916972 09:40:24.450 09:41:17.623 53173 601 516 52572
57 896243 09:39:31.620 09:40:01.840 30220 322 301 29898
57 877227 09:39:12.120 09:39:15.293 3173 306 161 2867
57 876313 09:38:58.590 09:38:58.603 13 0 1 13
57 895388 09:39:18.510 09:39:18.527 16 16 4 0
57 915675 09:40:02.653 09:40:02.670 16 16 4 0

Implicit Transaction Summary TCP Chimney Disabled

spid TransactionID TranStart TranEnd TranDuration sum_batch_duration batch_count delta
------- -------------- ------------ ------------ ------------- -------------------- -------------- --------
54 127910 11:13:47.287 11:13:52.490 5203 4060 516 1143
54 107344 11:13:23.380 11:13:24.427 1046 382 301 664
51 87187 11:12:50.067 11:12:50.550 483 0 1 483
54 88182 11:13:03.987 11:13:07.237 3250 2878 161 372
51 106432 11:13:10.487 11:13:10.487 0 0 1 0
54 126550 11:13:25.490 11:13:26.007 516 516 4 0

 

If you observe a similar pattern and suspect TCP Chimney, you may want to disable TCP Chimney to provide immediate relief.  Another option is to follow up with your network adapter vendor to see if they have an updated driver that will address the problem and allow for use of TCP Chimney.  For additional information see  http://support.microsoft.com/default.aspx?scid=kb;EN-US;948496

 

TCP Chimney is off by default in Windows Server 2008 - see http://support.microsoft.com/kb/951037.

Sarah Henwood | Microsoft SQL Server Escalation Services