Advances in Windows TCP/IP Networking

Significant advances have been made over the past few years, both in hardware and in software, that have enabled gigabit networking. From being unavailable just a few years back, it has become mainstream technology, with GigE NICs shipping in an increasing number of clients and servers today. The ever-increasing networking capacity and the need for high throughput over long-distance transfers make it imperative that we develop software that functions efficiently while taking advantage of all the available bandwidth. Several bottlenecks can prevent high-performance transfers end-to-end: the host systems on the sender and the receiver, the network, and of course the applications themselves. For the Windows networking stack to deliver high throughput in gigabit networks and beyond, we have to alleviate some or all of these bottlenecks.


Let’s try and analyze some of these bottlenecks.


CPU bottleneck: Sending and receiving data at gigabit speeds requires a tremendous amount of processing: computing checksums, forming TCP/IP headers, validating received packets, copying data across buffers, and so on. Doing all of this in software places heavy demands on the host; even with the fastest available off-the-shelf processors, most of the CPU can be consumed just sending and receiving data. Technologies like task offload have helped alleviate some of the CPU load by offloading tasks like checksum computation to the NIC. However, the CPU continues to be the bottleneck.
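
To see why checksum computation is such a natural candidate for offload, consider what the software path has to do. Below is a minimal sketch of the Internet checksum along the lines of RFC 1071 (illustrative only, not the stack’s actual code). Note that the CPU has to touch every byte of every packet:

    #include <stddef.h>
    #include <stdint.h>

    /* 16-bit one's-complement Internet checksum (RFC 1071 sketch).
       In software, every 16-bit word of every packet passes through
       the CPU; a NIC with checksum offload does this in hardware. */
    uint16_t internet_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;

        while (len > 1) {                  /* sum 16-bit words */
            sum  += (uint32_t)(data[0] << 8 | data[1]);
            data += 2;
            len  -= 2;
        }
        if (len == 1)                      /* pad a trailing odd byte */
            sum += (uint32_t)(data[0] << 8);

        while (sum >> 16)                  /* fold carries back in */
            sum = (sum & 0xFFFF) + (sum >> 16);

        return (uint16_t)~sum;
    }

At a gigabit per second, a loop like this has to chew through roughly 125MB of data every second, on top of header processing and buffer copies.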


Scalability bottleneck: With physical limits constraining how fast a single processor can get, scaling in processing power now comes from multiple processors and multi-core designs, and most servers ship with multiple processors these days. However, the networking stack has always processed received packets on a single processor only. This means that for servers with significant network load, performance is limited to the processing power of one CPU and does not scale even when more processors are added.
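
To illustrate the direction a fix has to take, here is a toy sketch (not the actual Windows mechanism) of spreading receive processing across processors by hashing each connection’s 4-tuple, so that all packets of one flow stay on one CPU while different flows scale out across CPUs:

    #include <stdint.h>

    /* Toy illustration: pick a CPU per flow by hashing the 4-tuple.
       Keeping one flow on one CPU preserves in-order processing for
       that connection; different flows spread across processors. */
    unsigned pick_cpu(uint32_t src_ip, uint32_t dst_ip,
                      uint16_t src_port, uint16_t dst_port,
                      unsigned num_cpus)
    {
        uint32_t h = src_ip ^ dst_ip ^
                     ((uint32_t)src_port << 16 | dst_port);
        h ^= h >> 16;   /* mix the high bits into the low bits */
        return h % num_cpus;
    }

A per-flow hash, rather than simple round-robin, matters here: processing one connection’s packets on different CPUs would reorder them, and TCP performs poorly in the face of reordering.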


TCP congestion control: The TCP protocol was designed in the late 1980s, when bandwidths were fairly low. The principles of congestion control were developed to ensure that end systems cooperate and do not cause congestion collapse on the network. TCP has scaled amazingly well, to several hundred Mbps of bandwidth and to millions of hosts using it across the Internet. However, as bandwidth continues to increase, inherent limitations in TCP are becoming evident. TCP was designed to be quite conservative in probing for spare bandwidth but quite aggressive in responding to any congestion indication. Congestion indications are derived from packet losses, which imposes a theoretical limit on the throughput of a TCP connection for a given loss rate. This is problematic for high-bandwidth scenarios: in data-intensive grids and networks for high-energy and nuclear physics, led by CERN, and in projects like Terraservice for astronomy research, there is a need to move huge amounts of data across high-bandwidth links. TCP faces challenges in scaling to such bandwidths.
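
The loss-rate limit can be made concrete with the well-known throughput model of Mathis et al.: rate <= (MSS / RTT) * (C / sqrt(p)), with C roughly 1.22. A small sketch, using illustrative parameter values:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Illustrative assumptions, not measurements. */
        double mss_bits = 1460 * 8;   /* typical Ethernet MSS, in bits */
        double rtt      = 0.1;        /* 100 ms round-trip time        */
        double p        = 1e-7;       /* packet loss rate              */

        /* Mathis et al. loss-limited TCP throughput model. */
        double rate = (mss_bits / rtt) * (1.22 / sqrt(p));
        printf("Loss-limited throughput: %.1f Mbps\n", rate / 1e6);
        return 0;
    }

Even a loss rate of one packet in ten million caps this 100 ms path at roughly 450 Mbps, and filling a 10 Gbps pipe at the same RTT would require a loss rate on the order of 10^-10.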


TCP receive window limitations: Applications that use TCP are limited in throughput by the maximum default receive window. Traditionally, implementations of TCP have used a maximum default receive window of 64KB. (The receive window field in the TCP header is a 16-bit field.) This effectively limits the amount of data the sender can have in flight, directly capping the maximum throughput achievable on the connection. Although it is possible to change this value in the registry on Windows, it is not easy to guess the correct value for a given connection. The optimal receive window size varies by connection: it depends on the bandwidth-delay product of the path and the application’s consumption capacity, neither of which can be pre-configured in the registry.
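
To make the sizing problem concrete, here is a hedged Winsock sketch of an application setting its own receive buffer from an assumed bandwidth-delay product. The bandwidth and RTT figures below are assumptions, and knowing them in advance is precisely the hard part:

    #include <winsock2.h>   /* link with ws2_32.lib */

    /* Sketch: size the socket receive buffer to the bandwidth-delay
       product of the path. In practice an application rarely knows
       the path bandwidth or RTT ahead of time. */
    void size_receive_buffer(SOCKET s)
    {
        double bandwidth_bps = 1e9;   /* assume a 1 Gbps path */
        double rtt_sec       = 0.1;   /* assume a 100 ms RTT  */

        int bdp_bytes = (int)(bandwidth_bps * rtt_sec / 8);  /* ~12.5MB */
        setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                   (const char *)&bdp_bytes, sizeof(bdp_bytes));
    }

Even when a static value like this is right for one path, it is wrong for the next, and actually using a window larger than 64KB also requires the TCP window scale option (RFC 1323) to be negotiated on the connection.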


Application bottlenecks: For gigabit throughputs, applications must be optimized to minimize the overhead of sending and receiving data. They must also ensure that there is always enough data queued to keep the pipe full, and enough buffers posted to receive incoming data in time. Today this requires a lot of manual tuning, which is clearly not sustainable as we go forward.
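
As an example of keeping enough buffers posted, here is a minimal Winsock sketch that keeps several overlapped receives outstanding so the stack always has an application buffer for arriving data. The buffer count and size are arbitrary assumptions, and error handling and completion processing are omitted:

    #include <winsock2.h>   /* link with ws2_32.lib */

    #define PENDING_RECVS 4        /* arbitrary: keep 4 receives posted */
    #define RECV_BUF_SIZE 65536    /* arbitrary: 64KB per buffer        */

    static char          g_bufs[PENDING_RECVS][RECV_BUF_SIZE];
    static WSAOVERLAPPED g_ovs[PENDING_RECVS];

    /* Post several asynchronous receives up front. Completions would
       be harvested elsewhere (for example, on an I/O completion port)
       and each buffer re-posted immediately after it is consumed. */
    void post_receives(SOCKET s)
    {
        for (int i = 0; i < PENDING_RECVS; i++) {
            WSABUF wb    = { RECV_BUF_SIZE, g_bufs[i] };
            DWORD  flags = 0;
            WSARecv(s, &wb, 1, NULL, &flags, &g_ovs[i], NULL);
        }
    }

With only a single receive pending, every completion opens a window in which incoming data has no application buffer waiting, forcing extra copies or stalls.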


In the coming days and weeks we will discuss the advances made in the networking stack in Windows Vista and Longhorn Server to address some of these problems.


Please stay tuned…


Murari Sridharan

TCP/IP Networking, Internet Protocols Team