How Network Load Balancing Technology Works

10/08/2009

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

In this section

Network Load Balancing Terms and Definitions
Network Load Balancing Architecture
Network Load Balancing Protocols
Application Compatibility with Network Load Balancing
Network Load Balancing for Scalability
Scaling Network Load Balancing Clusters
Performance Impact of Network Load Balancing

When Network Load Balancing is installed as a network driver on each of the member servers, or hosts, in a cluster, the cluster presents a virtual Internet Protocol (IP) address to client requests. The client requests go to all the hosts in the cluster, but only the host to which a given client request is mapped accepts and handles the request. All the other hosts drop the request. Depending on the configuration of each host in the cluster, the statistical mapping algorithm, which is present on all the cluster hosts, maps the client requests to particular hosts for processing.

Basic Diagram for Network Load Balancing Clusters

The following figure shows two connected Network Load Balancing clusters. The first cluster is a firewall cluster with two hosts and the second cluster is a Web server cluster with four hosts.

Two Connected Network Load Balancing Clusters


Network Load Balancing Terms and Definitions

Before you review the Network Load Balancing components and processes, it is helpful to understand specific terminology. The following terms are used to describe the components and processes of Network Load Balancing.

affinity

For Network Load Balancing, the method used to associate client requests to cluster hosts. When no affinity is specified, all network requests are load-balanced across the cluster without respect to their source. Affinity is implemented by directing all client requests from the same IP address to the same cluster host.

convergence

The process of stabilizing a system after changes occur in the network. For routing, if a route becomes unavailable, routers send update messages throughout the network, reestablishing information about preferred routes.

For Network Load Balancing, a process by which hosts exchange messages to determine a new, consistent state of the cluster and to elect the default host. During convergence, a new load distribution is determined for hosts that share the handling of network traffic for specific Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) ports.

dedicated IP address

The IP address of a Network Load Balancing host used for network traffic that is not associated with the Network Load Balancing cluster (for example, Telnet access to a specific host within the cluster). This IP address is used to individually address each host in the cluster and therefore is unique for each host.

default host

The host with the highest host priority for which a drainstop command is not in progress. After convergence, the default host handles all of the network traffic for Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) ports that are not otherwise covered by port rules.

failover

In server clusters, the process of taking resource groups offline on one node and bringing them online on another node. When failover occurs, all resources within a resource group fail over in a predefined order; resources that depend on other resources are taken offline before, and are brought back online after, the resources on which they depend.

heartbeat

A message that is sent at regular intervals by one computer on a Network Load Balancing cluster or server cluster to another computer within the cluster to detect communication failures.

Network Load Balancing initiates convergence when it fails to receive heartbeat messages from another host or when it receives a heartbeat message from a new host.

multicast media access control (MAC) address

A type of media access control address used by multiple, networked computers to receive the same incoming network frames concurrently. Network Load Balancing uses multicast MAC addresses to efficiently distribute incoming network traffic to cluster hosts.

multihomed computer

A computer that has multiple network adapters or that has been configured with multiple IP addresses for a single network adapter.

Network Load Balancing supports multihomed servers by allowing multiple virtual IP addresses to be assigned to the cluster adapter.

throughput

A performance measure defined as the number of client requests processed by a Network Load Balancing cluster per unit of time.

virtual cluster

A Network Load Balancing cluster that you create by assigning specific port rules to specific virtual IP addresses. With virtual clusters, you can use different port rules for different Web sites or applications hosted on the cluster, provided each Web site or application has a different virtual IP address.

virtual IP address

An IP address that is shared among the hosts of a Network Load Balancing cluster. A Network Load Balancing cluster might also use multiple virtual IP addresses, for example, in a cluster of multihomed Web servers.

Network Load Balancing Architecture

Network Load Balancing runs as a network driver logically beneath higher-level application protocols, such as HTTP and FTP. On each cluster host, the driver acts as a filter between the network adapter’s driver and the TCP/IP stack, allowing a portion of the incoming network traffic to be received by the host. This is how incoming client requests are partitioned and load-balanced among the cluster hosts. To maximize throughput and availability, Network Load Balancing uses a fully distributed software architecture in which an identical copy of the Network Load Balancing driver runs in parallel on each cluster host. The figure below shows the implementation of Network Load Balancing as an intermediate driver in the Windows Server 2003 network stack.

Typical Configuration of a Network Load Balancing Host


The following table describes the principal components related to the Network Load Balancing architecture.

Components of the Network Load Balancing Architecture

Component	Description
Nlb.exe	The Network Load Balancing control program. You can use Nlb.exe from the command line to start, stop, and administer Network Load Balancing, as well as to enable and disable ports and to query cluster status.
Nlbmgr.exe	The Network Load Balancing Manager control program. Use this command to start Network Load Balancing Manager.
Wlbs.exe	The former Network Load Balancing control program. This has been replaced by Nlb.exe. However, you can still use Wlbs.exe rather than Nlb.exe if necessary, for example, if you have existing scripts that reference Wlbs.exe.
Wlbsprov.dll	The Network Load Balancing WMI provider.
Nlbmprov.dll	The Network Load Balancing Manager WMI provider.
Wlbsctrl.dll	The Network Load Balancing API DLL.
Wlbs.sys	The Network Load Balancing device driver. Wlbs.sys is loaded onto each host in the cluster and includes the statistical mapping algorithm that the cluster hosts collectively use to determine which host handles each incoming request.

Application and Service Environment

When a Web server application maintains state information about a client session across multiple TCP connections, it is important that all TCP connections for the client are directed to the same cluster host.

Network Load Balancing can load-balance any application or service that uses TCP/IP as its network protocol and is associated with a specific TCP or UDP port. The distributed algorithm, which is used to determine which host responds to a TCP connection request or incoming UDP packet, can include the port rule in the decision. Including the port rule in the decision means that, for any client, different members of the Network Load Balancing cluster may service connection requests or packets that are covered by different port rules on the virtual IP address.

Note

While configuring a Network Load Balancing cluster, consider the type of application or service that the server is providing, and then select the appropriate configuration for Network Load Balancing hosts.

Client State

A discussion of Network Load Balancing clusters requires clarification of two kinds of client states, application data state and session state, because they are essential to configuring Network Load Balancing.

Application data state

In terms of application data, you must consider whether the server application makes changes to the data store and whether the changes are synchronized across instances of the application (the instances that are running on the Network Load Balancing cluster hosts). An example of an application that does not make changes to the data store is a static Web page supported by an IIS server.

Means must be provided to synchronize updates to data state that need to be shared across servers. One such means is use of a back-end database server that is shared by all instances of the application. An example would be an Active Server Pages (ASP) page that is supported by an IIS server and that can access a shared back-end database server, such as a SQL Server.

Session state

Session state (or intraclient state) refers to client data that is visible to a particular client for the duration of a session. Session state can span multiple TCP connections, which can be either simultaneous or sequential. Network Load Balancing assists in preserving session state through client affinity settings. These settings direct all TCP connections from a given client address or class of client addresses to the same cluster host. This allows session state to be maintained by the server application in the host memory.

Client/server applications that embed session state within “cookies” or push it to a back-end database do not need client affinity to be maintained.

An example of an application that requires maintaining session state is an e-commerce application that maintains a shopping cart for each client.

Network Load Balancing Parameters

By setting port rules, cluster parameters, and host parameters, you gain flexibility in configuring the cluster, which enables you to customize the cluster according to the various hosts’ capacities and sources of client requests. Cluster parameters apply to the entire cluster, while the host parameters apply to a specific host.

Port rules

The Network Load Balancing driver uses port rules that describe which traffic to load-balance and which traffic to ignore. By default, the Network Load Balancing driver configures all ports for load balancing. You can modify the configuration of the Network Load Balancing driver that determines how incoming network traffic is load-balanced on a per-port basis by creating port rules for each group of ports or individual ports as required. Each port rule configures load balancing for client requests that use the port or ports covered by the port range parameter. How you load-balance your applications is mostly defined by how you add or modify port rules, which you create on each host for any particular port range.

Affinity

Affinity is the method used to associate client requests to cluster hosts. Network Load Balancing assists in preserving session state through client affinity settings for each port rule that Network Load Balancing creates. These settings direct all TCP connections from a given client address or class of client addresses to the same cluster host. Directing the connections to the same cluster host allows the server application to correctly maintain the session state in the memory of that host.

Running Network Load Balancing in an Optimal Environment

The sections in this document provide in-depth information about how Network Load Balancing works in an optimal environment. An optimal environment for Network Load Balancing is defined as follows:

Two or more network adapters in each cluster host are used.
The Transmission Control Protocol/Internet Protocol (TCP/IP) network protocol is the only protocol used on the cluster adapter. Other protocols, such as Internetwork Packet Exchange (IPX), should not be added to this adapter.
Within a given cluster, all cluster hosts must operate in either unicast or multicast mode, but not both.
Network Load Balancing should not be enabled on a computer that is part of a server cluster.
All hosts in a cluster must belong to the same subnet and the cluster’s clients must be able to access this subnet.
The appropriate hardware resources should be used. The goal in tuning Network Load Balancing and the applications it load-balances is to determine which hardware resource will experience the greatest demand, and then to adjust the configuration to relieve that demand and maximize total throughput.
Cluster parameters, port rules, and host parameters are correctly configured:
- Cluster parameters and port rules for each unique virtual IP address are identical across all hosts. Each unique virtual IP address must be configured with the same port rules across all hosts that service that virtual IP address. However, if you have multiple virtual IP addresses configured on a host, each of those virtual IP addresses can have a different set of port rules.
- Port rules are set for all ports used by the load-balanced application. For example, FTP uses port 20, port 21, and ports 1024–65535.
- The dedicated IP address is unique and the cluster IP address is added to each cluster host.
Network performance should be optimized by limiting switch port flooding.
Windows Internet Name Service (WINS), Dynamic Host Configuration Protocol (DHCP), and Domain Name System (DNS) services can be run on Network Load Balancing cluster hosts; however, the interface to which Network Load Balancing is bound cannot use a DHCP address.

Network Load Balancing Driver

The Network Load Balancing driver is installed on a computer configured with Transmission Control Protocol/Internet Protocol (TCP/IP) and is bound to a single network interface called the cluster adapter. The driver is configured with a single IP address, the cluster primary IP address, on a single subnet for all of the member servers (or hosts) on the cluster. Each host has an identical media access control (MAC) address that allows the hosts to concurrently receive incoming network traffic for the cluster’s primary IP address (and for additional IP addresses on multihomed hosts).

Incoming client requests are partitioned and load-balanced among the cluster hosts by the Network Load Balancing driver, which acts as a rule-based filter between the network adapter’s driver and the TCP/IP stack. Each host receives a designated portion of the incoming network traffic.

Distributed Architecture

Network Load Balancing is a distributed architecture, with an instance of the driver installed on each cluster host. Throughput is maximized to all cluster hosts by eliminating the need to route incoming packets to individual cluster hosts, through a process called filtering. Filtering out unwanted packets in each host improves throughput; this process is faster than routing packets, which involves receiving, examining, rewriting, and resending the packets.

Another key advantage to the fully distributed architecture of Network Load Balancing is the enhanced availability resulting from (n-1) way failover in a cluster with n hosts. In contrast, dispatcher-based solutions create an inherent single point of failure that you must eliminate by using a redundant dispatcher that provides only one-way failover. Dispatcher-based solutions offer a less robust failover solution than does a fully distributed architecture.

Load Balancing Algorithm

The Network Load Balancing driver uses a fully distributed filtering algorithm to statistically map incoming client requests to the cluster hosts, based upon their IP address, port, and other information.

When receiving an incoming packet, all hosts within the cluster simultaneously perform this mapping to determine which host should handle the packet. Those hosts not required to service the packet simply discard it. The mapping remains constant unless the number of cluster hosts changes or the filter processing rules change.

The filtering algorithm is much more efficient in its packet handling than centralized load balancers, which must modify and retransmit packets. Efficient packet handling allows for a much higher aggregate bandwidth to be achieved on industry standard hardware.

The distribution of client requests that the statistical mapping function effects is influenced by the following:

Host priorities
Multicast or unicast mode
Port rules
Affinity
Load percentage distribution
Client IP address
Client port number
Other internal load information

The statistical mapping function does not change the existing distribution of requests unless the membership of the cluster changes or you adjust the load percentage.
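The fully distributed filtering idea can be illustrated with a minimal Python sketch. It is not the actual Network Load Balancing algorithm, which this document does not specify. It assumes a SHA-256 hash over the client IP address and port as a stand-in mapping function, and it ignores host priorities, port rules, affinity, and load percentages. The function names owner_of and should_accept are hypothetical. The point it shows is that every host runs the same computation on the same packet, so exactly one host accepts and the others discard, with no communication between hosts for each packet.

```python
import hashlib


def owner_of(client_ip: str, client_port: int, host_ids: list[int]) -> int:
    """Return the ID of the one host that should handle this request.

    Stand-in mapping (an assumption, not the real NLB function): hash the
    client IP address and port, then pick a host by bucket.
    """
    ordered = sorted(host_ids)  # every host must use the same ordering
    key = f"{client_ip}:{client_port}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(ordered)
    return ordered[bucket]


def should_accept(my_id: int, client_ip: str, client_port: int,
                  host_ids: list[int]) -> bool:
    """Each host calls this locally; only the owner passes the packet up."""
    return owner_of(client_ip, client_port, host_ids) == my_id


hosts = [1, 2, 3, 4]
decisions = {h: should_accept(h, "192.0.2.10", 49152, hosts) for h in hosts}
# Exactly one value in `decisions` is True; the other hosts drop the packet.
```

In this simplified scheme the result for a given client changes only when the list of hosts changes, which mirrors the statement that the mapping stays constant until cluster membership changes.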

Unicast and Multicast Modes

Network Load Balancing requires at least one network adapter. Different hosts in a cluster can have a different number of adapters, but all hosts must use the same IP transmission mode, either unicast or multicast.

Unicast mode is the default mode, but you can configure the Network Load Balancing driver to operate in either mode.

When you enable unicast support, the unicast mode changes the cluster adapter’s MAC address to the cluster MAC address. This cluster address is the same MAC address that is used on all cluster hosts. When this change is made, clients can no longer address the cluster adapters by their original MAC addresses.

When you enable multicast support, Network Load Balancing adds a multicast MAC address to the cluster adapters on all of the cluster hosts. At the same time, the cluster adapters retain their original MAC addresses.

Note

The Network Load Balancing driver does not support a mixed unicast and multicast environment. All cluster hosts must be either multicast or unicast; otherwise, the cluster will not function properly.

If clients are accessing a Network Load Balancing cluster through a router when the cluster has been configured to operate in multicast mode, be sure that the router meets the following requirements:

Accepts an Address Resolution Protocol (ARP) reply that has one MAC address in the payload of the ARP structure but appears to arrive from a station with another MAC address, as judged by the Ethernet header.
Accepts an ARP reply that has a multicast MAC address in the payload of the ARP structure.

If your router does not meet these requirements, you can create a static ARP entry in the router. For example, some routers require a static ARP entry because they do not support the resolution of unicast IP addresses to multicast MAC addresses.

Subnet and Network Considerations

The Network Load Balancing architecture with a single MAC address for all cluster hosts maximizes use of the subnet’s hub and/or switch architecture to simultaneously deliver incoming network traffic to all cluster hosts.

Your network configuration will typically include routers, but may also include layer 2 switches (collapsed backbone) rather than the simpler hubs or repeaters that are available. Cluster configuration, when using hubs, is predictable because the hubs distribute IP traffic to all ports.

Note

If client-side network connections at the switch are significantly faster than server-side connections, incoming traffic can occupy a prohibitively large portion of server-side port bandwidth.

Network Adapters

Network Load Balancing requires only a single network adapter, but for optimum cluster performance, you should install a second network adapter on each Network Load Balancing host. In this configuration, one network adapter handles the network traffic that is addressed to the server as part of the cluster. The other network adapter carries all of the network traffic that is destined to the server as an individual computer on the network, including cluster communication between hosts.

Note

Network Load Balancing with a single network adapter can provide full functionality if you enable multicast support for this adapter.

Selecting an IP Transmission Mode

When you are implementing a Network Load Balancing solution, the Internet Protocol transmission mode that is selected and the number of network adapters that are required are dependent upon the following network requirements:

Layer 2 switches or hubs
Peer-to-peer communication between hosts
Maximized communication performance

For example, a cluster supporting a static Hypertext Markup Language (HTML) Web application can have a requirement to synchronize the Web site copies of a large number of cluster hosts. This scenario requires interhost peer-to-peer communications. You select the number of network adapters and the IP communications mode to meet this requirement.

There is no restriction on the number of network adapters, and different hosts can have a different number of adapters. You can configure Network Load Balancing to use one of four different models.

Single Network Adapter in Unicast Mode

The single network adapter in unicast mode is suitable for a cluster in which you do not require ordinary network communication among cluster hosts, and in which there is limited dedicated traffic from outside the cluster subnet to specific cluster hosts. In this model, the computer can also handle traffic from inside the subnet if the IP datagrams do not carry the same MAC address as the cluster adapter.

Single Network Adapter in Multicast Mode

This model is suitable for a cluster in which ordinary network communication among cluster hosts is necessary or desirable, but in which there is limited dedicated traffic from outside the cluster subnet to specific cluster hosts.

Multiple Network Adapter in Unicast Mode

This model is suitable for a cluster in which ordinary network communication among cluster hosts is necessary or desirable, and in which there is comparatively heavy dedicated traffic from outside the cluster subnet to specific cluster hosts.

This model is the preferred configuration used by most sites because a second network adapter may enhance overall network performance.

Multiple Network Adapter in Multicast Mode

This model is suitable for a cluster in which ordinary network communication among cluster hosts is necessary, and in which there is heavy dedicated traffic from outside the cluster subnet to specific cluster hosts.

Comparison of Modes

The advantages and disadvantages of each model are listed in the following table.

Comparison of Network Load Balancing Models

Adapter	Mode	Advantages	Disadvantages
Single	Unicast	Simple configuration	Poor overall performance
Single	Multicast	Medium performance	Complex configuration
Multiple	Unicast	Best balance	None
Multiple	Multicast	Best balance	Complex configuration

Network Load Balancing Addressing

The Network Load Balancing cluster is assigned a primary Internet Protocol (IP) address. This IP address represents a virtual IP address to which all of the cluster hosts respond, and the remote control program that is provided with Network Load Balancing uses this IP address to identify a target cluster.

Primary IP address

The primary IP address is the virtual IP address of the cluster and must be set identically for all hosts in the cluster. You can use the virtual IP address to address the cluster as a whole. The virtual IP address is also associated with the Internet name that you specify for the cluster.

Dedicated IP address

You can also assign each cluster host a dedicated IP address for network traffic that is designated for that particular host only. Network Load Balancing never load-balances the traffic for the dedicated IP addresses; it load-balances only incoming traffic for IP addresses other than the dedicated IP address.

The following figure shows how IP addresses are used to respond to client requests.

Network Load Balancing Cluster


Distribution of Cluster Traffic

When the virtual IP address is resolved to the station address (MAC address), this MAC address is common for all hosts in the cluster. You can enable client connections to only the required cluster host when more packets are sent. The responding host then substitutes a different MAC address for the inbound MAC address in the reply traffic. The substitute MAC address is referred to as the Source MAC address. The following table shows the MAC addresses that will be generated for a cluster adapter.

IP Mode	MAC Address	Explanation
Unicast inbound	02-BF-W-X-Y-Z	W-X-Y-Z = IP address; onboard MAC disabled
Multicast inbound	03-BF-W-X-Y-Z	W-X-Y-Z = IP address; onboard MAC enabled
Source outbound	02-P-W-X-Y-Z	W-X-Y-Z = IP address; P = host priority

In the unicast mode of operation, the Network Load Balancing driver disables the onboard MAC address for the cluster adapter. You cannot use the dedicated IP address for interhost communications because all of the hosts have the same MAC address.

In multicast mode of operation, the Network Load Balancing driver supports both the onboard and the multicast address. If your cluster configuration will require connections from one cluster host to another, for example, when making a NetBIOS connection to copy files, use multicast mode or install a second network adapter.

If the cluster hosts were attached to a switch instead of a hub, the use of a common MAC address would create a conflict because Layer 2 switches expect to see unique source MAC addresses on all switch ports. To avoid this problem, Network Load Balancing uniquely modifies the source MAC address for outgoing packets. For example, a cluster MAC address of 02-BF-1-2-3-4 is set to 02-p-1-2-3-4, where p is the host’s priority within the cluster.
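The address patterns in the preceding table can be expressed as a short Python sketch. It assumes that W-X-Y-Z are the four octets of the cluster IP address and that each byte is written as two hexadecimal digits, which is a formatting choice made here rather than something this document states. The function names cluster_mac and masked_source_mac are hypothetical.

```python
import ipaddress


def _octets(ip: str) -> list[int]:
    """Return the four octets of an IPv4 address as integers."""
    return list(ipaddress.IPv4Address(ip).packed)


def cluster_mac(cluster_ip: str, multicast: bool = False) -> str:
    """Inbound cluster MAC: 02-BF-W-X-Y-Z (unicast) or 03-BF-W-X-Y-Z (multicast)."""
    prefix = 0x03 if multicast else 0x02
    return "-".join(f"{b:02X}" for b in [prefix, 0xBF, *_octets(cluster_ip)])


def masked_source_mac(cluster_ip: str, host_priority: int) -> str:
    """Outbound source MAC in unicast mode: 02-P-W-X-Y-Z, P = host priority."""
    if not 1 <= host_priority <= 32:
        raise ValueError("host priority must be between 1 and 32")
    return "-".join(f"{b:02X}" for b in [0x02, host_priority, *_octets(cluster_ip)])


print(cluster_mac("10.0.0.1"))                  # 02-BF-0A-00-00-01
print(cluster_mac("10.0.0.1", multicast=True))  # 03-BF-0A-00-00-01
print(masked_source_mac("10.0.0.1", 3))         # 02-03-0A-00-00-01
```

Because each host has a unique priority, each masked source address is unique on the switch, while the shared inbound address never appears as a source.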

This technique prevents the switch from learning the cluster’s inbound MAC address, and as a result, incoming packets for the cluster are delivered to all of the switch ports. If the cluster hosts are connected to a hub instead of to a switch, you can disable masking of the source MAC address in unicast mode to avoid flooding upstream switches. You disable masking of the source MAC address by setting the Network Load Balancing registry parameter MaskSourceMAC to 0. The use of an upstream Layer 3 switch will also limit switch flooding.

The unicast mode of Network Load Balancing induces switch flooding to simultaneously deliver incoming network traffic to all of the cluster hosts. Also, when Network Load Balancing uses multicast mode, switches often flood all of the ports by default to deliver multicast traffic. However, the multicast mode of Network Load Balancing gives the system administrator the opportunity to limit switch flooding by configuring a virtual LAN within the switch for the ports corresponding to the cluster hosts.

Port Rules

Port rules are created for individual ports and for groups of ports that Network Load Balancing requires for particular applications and services. The filter setting then defines whether the Network Load Balancing driver will pass or block the traffic.

The Network Load Balancing driver controls the distribution and partitioning of Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic from connecting clients to selected hosts within a cluster by passing or blocking the incoming data stream for each host. Network Load Balancing does not control any incoming IP traffic other than TCP and UDP for ports that a port rule specifies.

You can add port rules or update parameters by taking each host out of the cluster in turn, updating its parameters, and then returning it to the cluster. The host joining the cluster handles no traffic until convergence is complete. The cluster does not converge to a consistent state until all of the hosts have the same number of rules. For example, if a rule is added, it does not take effect until all of the hosts have been updated and have rejoined the cluster.

Note

Internet Group Management Protocol (IGMP), ARP, or other IP protocols are passed unchanged to the TCP/IP protocol software on all of the hosts within the cluster.

Port rules define individual ports or groups of ports for which the driver has a defined action. You need to consider certain parameters when creating the port rules, such as the:

TCP or UDP port range for which you should apply this rule.
Protocols for which this rule should apply (TCP, UDP, or both).
Filtering mode chosen: multiple hosts, single host, or disabled.

When defining the port rules, it is important that the rules be exactly the same for each host in the cluster because if a host attempts to join the cluster with a different number of rules from the other hosts, the cluster will not converge. The rules that you enter for each host in the cluster must have matching port ranges, protocol types, and filtering modes.
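The requirement that every host carry matching rules can be checked before hosts rejoin the cluster. The following Python sketch is a hypothetical pre-flight check, not part of Network Load Balancing or its tools. The PortRule class, its field names, and the rules_match function are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRule:
    start_port: int
    end_port: int
    protocol: str        # "TCP", "UDP", or "Both"
    filtering_mode: str  # "multiple", "single", or "disabled"


def _sort_key(rule: PortRule) -> tuple:
    return (rule.start_port, rule.end_port, rule.protocol, rule.filtering_mode)


def rules_match(host_rules: dict[str, list[PortRule]]) -> bool:
    """Return True if all hosts define the same port ranges, protocols, and modes."""
    canonical = [sorted(rules, key=_sort_key) for rules in host_rules.values()]
    return all(rules == canonical[0] for rules in canonical)


web = PortRule(80, 80, "TCP", "multiple")
ftp_control = PortRule(21, 21, "TCP", "single")

consistent = rules_match({"host1": [web, ftp_control], "host2": [ftp_control, web]})
missing_rule = rules_match({"host1": [web, ftp_control], "host2": [web]})
# consistent is True; missing_rule is False because host2 has fewer rules.
```

The check compares rule content and ignores the order in which rules were entered, on the assumption that order is not significant.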

Filtering Modes

The filter defines for each port rule whether the incoming traffic is discarded, handled by only one host, or distributed across multiple hosts. The three possible filtering modes that you can apply to a port rule are:

Multiple hosts

With this filtering mode, clusters can equally distribute the load among the hosts or each host can handle a specified load weight.

Single host

This filtering mode provides fault tolerance for the handling of network traffic with the target host defined by its priority.

Disabled

This filtering mode lets you build a firewall against unwanted network access to a specific range of ports; the driver discards the unwanted packets.
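As a rough illustration of how the three filtering modes partition traffic, the following Python sketch looks up the rule that covers a destination port and reports how that traffic would be handled. The Rule and FilteringMode types and the describe_handling function are hypothetical names, and the returned strings are descriptive labels only. Traffic for a port that no rule covers is reported as going to the default host, following the definition of default host given earlier.

```python
from enum import Enum
from typing import NamedTuple


class FilteringMode(Enum):
    MULTIPLE_HOSTS = "multiple hosts"
    SINGLE_HOST = "single host"
    DISABLED = "disabled"


class Rule(NamedTuple):
    start_port: int
    end_port: int
    mode: FilteringMode


def describe_handling(port: int, rules: list[Rule]) -> str:
    """Describe what happens to traffic for `port` under the given rules."""
    for rule in rules:
        if rule.start_port <= port <= rule.end_port:
            if rule.mode is FilteringMode.DISABLED:
                return "discarded"
            if rule.mode is FilteringMode.SINGLE_HOST:
                return "handled by the highest-priority host"
            return "distributed across multiple hosts"
    return "handled by the default host"


rules = [
    Rule(80, 80, FilteringMode.MULTIPLE_HOSTS),
    Rule(20, 21, FilteringMode.SINGLE_HOST),
    Rule(135, 139, FilteringMode.DISABLED),
]
print(describe_handling(80, rules))    # distributed across multiple hosts
print(describe_handling(137, rules))   # discarded
print(describe_handling(8080, rules))  # handled by the default host
```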

Load Weighting

When the filter mode for a port rule is set to multiple hosts, the Load Weight parameter specifies the percentage of load-balanced network traffic that this host should handle for the associated rule. Allowed values range from 0 to 100.

Note

To prevent a host from handling any network traffic for a port rule, set the load weight to 0.

Because hosts can dynamically enter or leave the cluster, the sum of the load weights for all cluster hosts does not have to equal 100. The percentage of host traffic is computed as the local load percentage value divided by the load weight sum across the cluster.
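The share computation described above is simple arithmetic, sketched here in Python. The function name traffic_shares and the host names are hypothetical. The handling of a zero weight sum is an assumption made so that the sketch does not divide by zero.

```python
def traffic_shares(load_weights: dict[str, int]) -> dict[str, float]:
    """Return each host's fraction of traffic: its weight / sum of all weights."""
    for host, weight in load_weights.items():
        if not 0 <= weight <= 100:
            raise ValueError(f"load weight for {host} must be between 0 and 100")
    total = sum(load_weights.values())
    if total == 0:
        # Assumption: with no weight anywhere, no host takes traffic for the rule.
        return {host: 0.0 for host in load_weights}
    return {host: weight / total for host, weight in load_weights.items()}


# The weights do not have to add up to 100.
shares = traffic_shares({"web1": 50, "web2": 50, "web3": 100})
# web1 and web2 each handle 0.25 of the traffic; web3 handles 0.5.

# If web3 leaves the cluster, the remaining hosts split the traffic evenly.
shares_after = traffic_shares({"web1": 50, "web2": 50})
```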

To balance the load evenly across all of the hosts with this port rule, you can specify the equal load distribution parameter instead of specifying a load weight parameter.

Priority

When the filter mode for a port rule is set to single host, the priority parameter specifies the local host’s priority for handling network traffic for the associated port rule. The host with the highest handling priority for this rule among the current cluster members will handle all of the traffic.

The allowed values range from 1, the highest priority, to the maximum number of hosts allowed, 32. This value must be unique for all hosts in the cluster.
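A minimal Python sketch of single-host selection follows. It assumes that the handling host is the current cluster member with the lowest priority number, since 1 is the highest priority. The names validate_priorities and handling_host, and the host names, are hypothetical.

```python
from typing import Optional

MAX_HOSTS = 32  # maximum number of hosts, as stated above


def validate_priorities(priorities: dict[str, int]) -> None:
    """Raise ValueError if a priority is out of range or shared by two hosts."""
    values = list(priorities.values())
    if any(not 1 <= p <= MAX_HOSTS for p in values):
        raise ValueError(f"priorities must be between 1 and {MAX_HOSTS}")
    if len(set(values)) != len(values):
        raise ValueError("each host must have a unique priority")


def handling_host(priorities: dict[str, int],
                  active_hosts: set[str]) -> Optional[str]:
    """Return the active host with the highest priority (lowest number)."""
    candidates = [host for host in active_hosts if host in priorities]
    if not candidates:
        return None
    return min(candidates, key=lambda host: priorities[host])


priorities = {"web1": 1, "web2": 2, "web3": 3}
validate_priorities(priorities)

before = handling_host(priorities, {"web1", "web2", "web3"})  # "web1"
after = handling_host(priorities, {"web2", "web3"})           # "web2"
```

The second call shows the fault tolerance that single host filtering provides: when the highest-priority host leaves, the next host in priority order takes over all traffic for the rule.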

Supporting Multiple Client Connections

In a load-balanced multiserver environment, managing and resolving client, application, and session state for individual clients can be complex. By default, in a Network Load Balancing solution, different hosts in the cluster can service multiple client connections.

When a client creates an initial connection to a host in the cluster, the application running on this host holds the client state. If the same host does not service subsequent connections from the client, errors can occur if the application instances do not share the client state between hosts.

For example, application development for an ASP-based Web site can be more difficult if the application must share the client state among the multiple hosts in the cluster. If all of the client connections can be guaranteed to go to the same server, the application does not need to share the client state among host instances, and these difficulties are avoided.

By using a Network Load Balancing feature called affinity, you can ensure that the same cluster host handles all of the TCP connections from one client IP address. Affinity allows you to scale applications that manage session state spanning multiple client connections. In a Network Load Balancing cluster with affinity enabled, initial client connection requests are distributed according to the cluster configuration, but after the initial client request has been established, the same host services all of the subsequent requests from that client.

Affinity

Clients can have many TCP connections to a Network Load Balancing cluster; the load-balancing algorithm will potentially distribute these connections across multiple hosts in the cluster.

If server applications have client or connection state information, this state information must be made available on all of the cluster hosts to prevent errors. If you cannot make state information available on all of the cluster hosts, you can use client affinity to direct all of the TCP connections from one client IP address to the same cluster host. Directing all TCP connections from the same client IP address to the same host allows an application to maintain state information in the host memory.

For example, if a server application (such as a Web server) maintains state information about a client’s site navigation status that spans multiple TCP connections, it is critical that all of the TCP connections for this client state information be directed to the same cluster host to prevent errors.

Affinity defines a relationship between client requests from a single client address or from a Class C network of clients (where IP addresses range from 192.0.0.1 to 223.255.255.254) and one of the cluster hosts. Affinity ensures that requests from the specified clients are always handled by the same host. The relationship lasts until convergence occurs (namely, until the membership of the cluster changes) or until you change the affinity setting. There is no time-out — the relationship is based only on the client IP address.

You can distribute incoming client connections based on the algorithm as determined by the following client affinity settings:

No Affinity

This setting distributes client requests more evenly. When maintaining session state is not an issue, you can use this setting to speed up response time to requests. For example, because multiple requests from a particular client can go to more than one cluster host, clients that access Web pages can get different parts of a page or different pages from different hosts. This setting is used for most applications.

With this setting, the Network Load Balancing statistical mapping algorithm uses both the port number and entire IP address of the client to influence the distribution of client requests.

Single Affinity

When single affinity is used, the entire source IP address (but not the port number) is used to determine the distribution of client requests.

You typically set single affinity for intranet sites that need to maintain session state. This setting always returns each client’s traffic to the same server, thus assisting the application in maintaining client sessions and their associated session state.

Note that client sessions that span multiple TCP connections (such as ASP sessions) are maintained as long as the Network Load Balancing cluster membership does not change. If the membership changes by adding a new host, the distribution of client requests is recomputed, and you cannot depend on new TCP connections from existing client sessions ending up at the same server. If a host leaves the cluster, its clients are partitioned among the remaining cluster hosts when convergence completes, and other clients are unaffected.

Class C Affinity

When Class C affinity is used, only the upper 24 bits of the client’s IP address are used by the statistical-mapping algorithm. A Class C unicast IP address ranges from 192.0.0.1 to 223.255.255.254. The first three octets indicate the network, and the last octet indicates the host on the network. Class-based IP addressing has been superseded by Classless Interdomain Routing (CIDR). Network Load Balancing provides optional session support for Class C IP addresses (in addition to support for single IP addresses) to accommodate clients that make use of multiple proxy servers at the client site. This option is appropriate for server farms that serve the Internet. Client requests coming over the Internet might come from clients sitting behind proxy farms. In this case, client requests can come into the Network Load Balancing cluster from several source IP addresses during a single client session.

Class C affinity addresses this issue by directing all the client requests from a particular Class C network to a single Network Load Balancing host.

There is no guarantee, however, that all of the servers in a proxy farm are on the same Class C network. If the client’s proxy servers are on different Class C networks, then the affinity relationship between a client and the server ends when the client sends successive requests from different Class C network addresses.
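The three affinity settings differ only in which parts of the client address feed the mapping algorithm. The following Python sketch illustrates that difference. The actual Network Load Balancing statistical mapping algorithm is not described in this article, so the SHA-256 hash used in `map_to_host` is only a stand-in, and all function names, setting names, and host names are hypothetical.

```python
import hashlib
import ipaddress


def affinity_key(client_ip, client_port, affinity):
    """Return the part of the client address that influences host selection."""
    ip = ipaddress.IPv4Address(client_ip)
    if affinity == "none":
        # No affinity: entire IP address and port number.
        return f"{ip}:{client_port}"
    if affinity == "single":
        # Single affinity: entire IP address, port number ignored.
        return str(ip)
    if affinity == "class_c":
        # Class C affinity: only the upper 24 bits of the IP address.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        return str(network.network_address)
    raise ValueError(f"unknown affinity setting: {affinity}")


def map_to_host(key, hosts):
    """Stand-in mapping (an assumption): hash the key onto one of the hosts."""
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


hosts = ["host1", "host2", "host3", "host4"]

# Single affinity: two connections from one client reach the same host.
first = map_to_host(affinity_key("192.0.2.17", 50001, "single"), hosts)
second = map_to_host(affinity_key("192.0.2.17", 50002, "single"), hosts)

# Class C affinity: two proxies on the same network share one key.
proxy_a = affinity_key("192.0.2.17", 50001, "class_c")
proxy_b = affinity_key("192.0.2.200", 40000, "class_c")
```

With single affinity, `first` and `second` name the same host because the port is not part of the key. With Class C affinity, `proxy_a` and `proxy_b` are the same key, so both proxies are served by one host. A proxy on a different Class C network would produce a different key, which is the limitation described above.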

Heartbeats and Convergence

Network Load Balancing cluster hosts exchange heartbeat messages to maintain consistent data about the cluster’s membership. By default, when a host fails to send out heartbeat messages within five seconds, it is deemed to have failed. Once a host has failed, the remaining hosts in the cluster perform convergence and do the following:

Establish which hosts are still active members of the cluster.
Elect the host with the highest priority as the new default host.
Ensure that all new client requests are handled by the surviving hosts.

In convergence, surviving hosts look for consistent heartbeats. If the host that failed to send heartbeats once again provides heartbeats consistently, it rejoins the cluster in the course of convergence. When a new host attempts to join the cluster, it sends heartbeat messages that also trigger convergence. After all cluster hosts agree on the current cluster membership, the client load is redistributed to the remaining hosts, and convergence completes.
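The outcome of convergence can be sketched as follows. This is a simplified, single-process model and not the distributed protocol itself: it assumes the default five-second failure window mentioned above, and it assumes that a lower priority number means a higher priority, as with handling priorities. The function name `converge` and the host data are hypothetical.

```python
def converge(last_heartbeat, priorities, now, timeout_seconds=5.0):
    """Return (active_hosts, default_host) for a simplified convergence.

    last_heartbeat maps each host to the time (in seconds) its most recent
    heartbeat was seen; priorities maps each host to its unique priority.
    """
    # Establish which hosts are still active members of the cluster.
    active = sorted(
        host for host, seen in last_heartbeat.items()
        if now - seen <= timeout_seconds
    )
    # Elect the active host with the highest priority as the default host.
    default_host = min(active, key=lambda h: priorities[h]) if active else None
    return active, default_host


last_heartbeat = {"host1": 94.0, "host2": 99.5, "host3": 99.0, "host4": 99.8}
priorities = {"host1": 1, "host2": 2, "host3": 3, "host4": 4}
active, default_host = converge(last_heartbeat, priorities, now=100.0)
```

In this example host1 has been silent for six seconds, so it is dropped from the membership and host2 becomes the default host. New client requests would then be divided among host2, host3, and host4.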

The following figure shows how the client load is evenly distributed among four cluster hosts before convergence takes place:

Network Load Balancing Cluster Before Convergence

The following figure shows a failed host and how the client load is redistributed among the three remaining hosts after convergence.

Network Load Balancing Cluster After Convergence

Convergence generally takes only a few seconds, so interruption in client service by the cluster is minimal. During convergence, hosts that are still active continue handling client requests without affecting existing connections. Convergence ends when all hosts report a consistent view of the cluster membership and distribution map for several heartbeat periods.

By editing the registry, you can change both the number of missed messages required to start convergence and the period between heartbeats. However, be aware that making the period between heartbeats too short increases network overhead on the system. Also be aware that reducing the number of missed messages increases the risk of erroneous host evictions from the cluster.

Note

Incorrectly editing the registry may severely damage your system. Before making changes to the registry, you should back up any valued data on the computer.
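The trade-off between the heartbeat period and the number of missed messages can be illustrated with a short calculation. The sketch below assumes a default period of 1,000 milliseconds and a default of five missed messages, which is consistent with the five-second default described above but is not stated in this article. The function names are hypothetical and do not correspond to registry value names.

```python
def failure_detection_seconds(heartbeat_period_ms, missed_messages):
    """Time before a silent host is deemed to have failed."""
    return heartbeat_period_ms * missed_messages / 1000.0


def heartbeats_per_second(heartbeat_period_ms, host_count):
    """Heartbeat messages sent per second across the whole cluster."""
    return host_count * 1000.0 / heartbeat_period_ms


# Assumed defaults: one heartbeat per second, five missed messages.
default_detection = failure_detection_seconds(1000, 5)

# Halving the period halves detection time but doubles heartbeat traffic.
faster_detection = failure_detection_seconds(500, 5)
faster_traffic = heartbeats_per_second(500, host_count=4)
```

Shortening the period speeds up failure detection at the cost of more heartbeat traffic, while lowering the number of missed messages speeds up detection at the cost of a higher risk of evicting a host that was only briefly delayed.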

Network Load Balancing Protocols

Network Load Balancing can load-balance any application or service that uses TCP/IP as its network protocol and is associated with a specific Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) port. For applications to work with Network Load Balancing, they must use TCP connections or UDP data streams.

You can also enable Internet Group Management Protocol (IGMP) support on the cluster hosts to control switch flooding when operating in multicast mode. IGMP is a protocol used by Internet Protocol version 4 (IPv4) hosts to report their multicast group memberships to any immediately neighboring multicast routers.

Application Compatibility with Network Load Balancing

In general, Network Load Balancing can load-balance any application or service that uses TCP/IP as its network protocol and is associated with a specific TCP or UDP port.

Compatible Applications

Applications must meet the following requirements in order to work with Network Load Balancing:

TCP connections or UDP data streams.
Synchronization of data. If client data that resides on one of the hosts changes as a result of a client transaction, applications must provide a means of synchronizing data updates if that data is shared on multiple instances across the cluster.
Single or Class C affinity. If session state is an issue, applications must use single or Class C affinity or provide a means (such as a client cookie or reference to a back-end database) of maintaining session state in order to be uniformly accessible across the cluster.

As highlighted in the previous two bullets, before using Network Load Balancing, you must consider an application’s requirements regarding client state. Network Load Balancing can accommodate both stateless and stateful connections. However, certain factors must be considered for a stateful connection.

Application servers maintain two kinds of stateful connections:

Interclient state. A state whose data updates must be synchronized with transactions that are performed for other clients, such as merchandise inventory at an e-commerce site. Because any host within the cluster can service client traffic, information that must be persistent across multiple requests or that must be shared among clients needs to be stored in a location that is accessible from all cluster hosts. Updates to information that is shared among the hosts need to be synchronized, for example by using a back-end database server.
Intraclient state. A state that must be maintained for a given client throughout a session (that can span multiple connections), such as a shopping cart process at an e-commerce site.

Network Load Balancing can be used with applications that maintain either type of state, although applications requiring interclient state are typically best accommodated with server clusters rather than Network Load Balancing. While intraclient state can be accommodated through the Network Load Balancing affinity settings, interclient state, on the other hand, requires additional components outside of Network Load Balancing.

Interclient State

Some applications require interclient state, that is, the applications make changes to data that must be synchronized across all instances of the application. For example, any type of application that uses Microsoft SQL Server for data storage, such as a user registration database, requires interclient state. When applications do share and change data (require interclient state), the changes need to be properly synchronized. Each host can use local, independent copies of databases that are merged offline as necessary. Alternatively, the clustered hosts can share access to a separate, networked database server.

A combination of these approaches can also be used. For example, static Web pages can be replicated among all clustered servers to ensure fast access and complete fault tolerance. However, database requests would be forwarded to a common database server that handles updates for the multiple cluster hosts.

Network Load Balancing is a viable solution for applications that require interclient state, provided that the shared data is synchronized by using one of these approaches.

Intraclient State

A second kind of state, intraclient state (also known as session state), refers to client data that is visible to a particular client for the duration of a session. Session state can span multiple TCP connections, which can be either simultaneous or sequential. Network Load Balancing assists in preserving session state through client affinity settings. These settings direct all TCP connections from a given client address or group of client addresses to the same cluster host. This allows session state to be maintained by the server application in the host memory.

Incompatible Applications

Applications that are not compatible with Network Load Balancing have one or more of the following characteristics:

They bind to actual computer names (examples of such applications are Exchange Server and Distributed File System).
They have files that must be continuously open for writing (for example, Exchange Server). In a Network Load Balancing cluster, multiple instances of an application (on different cluster hosts) should not have a file simultaneously opened for writing unless the application was designed to synchronize file updates. This is generally not the case.

Network Load Balancing for Scalability

Network Load Balancing scales the performance of a server-based program, such as a Web server, by distributing its client requests across multiple identical servers within the cluster; you can add more servers to the cluster as traffic increases. Up to 32 servers are possible in any one cluster. The following figure represents how additional servers can support more users.

How Servers Scale Out Across a Cluster

You can improve the performance of each individual host in a cluster by adding more or faster CPUs, network adapters and disks, and in some cases by adding more memory. These additions to the hosts in a Network Load Balancing cluster are called scaling up, and they require more intervention and careful planning than scaling out. Limitations of applications or the operating system configuration could mean that scaling up by adding more memory may not be as appropriate as scaling out.

You can handle additional IP traffic by simply adding computers to the Network Load Balancing cluster as necessary. Load balancing, in conjunction with the use of server clusters, is part of a scaling approach referred to as scaling out. The greater the number of computers involved in the load-balancing scenario, the higher the throughput of the overall server cluster.

Scaling Network Load Balancing Clusters

Network Load Balancing clusters have a maximum of 32 hosts, and all of the hosts must be on the same subnet. If a cluster cannot meet the performance requirements of a clustered application, such as a Web site, because of a host count or subnet throughput limitation, then you can use multiple clusters to scale out further.

Combining round robin DNS and Network Load Balancing results in a very scalable and highly available configuration. Configuring multiple Network Load Balancing clusters on different subnets and configuring DNS to sequentially distribute requests across those clusters evenly distributes the client load across several clusters. When multiple Network Load Balancing Web server clusters are configured with round robin DNS, the Web servers are made resilient to networking infrastructure failures.

When you use round robin DNS in conjunction with Network Load Balancing clusters, each cluster is identified in DNS by the cluster virtual IP. Because each cluster is automatically capable of both load balancing and fault tolerance, each DNS-issued IP address will function until all hosts in that particular cluster fail. Round robin DNS enables only a limited form of TCP/IP load balancing for IP-based servers when used without Network Load Balancing. When used with multiple individual hosts, such as Web servers, round robin DNS does not function effectively as a high-availability solution. If a host fails, round robin DNS continues to route requests to the failed server until the server is removed from DNS.
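The combination of round robin DNS and cluster virtual IP addresses can be sketched as follows. This is an illustrative model only: it does not query a real DNS server, the addresses come from ranges reserved for documentation, and the function names are hypothetical.

```python
import itertools


def round_robin_answers(cluster_vips, request_count):
    """Return the virtual IP handed out for each of request_count lookups."""
    rotation = itertools.cycle(cluster_vips)
    return [next(rotation) for _ in range(request_count)]


def vip_available(hosts_up):
    """A cluster virtual IP keeps working until every host in the cluster fails."""
    return any(hosts_up)


# Two clusters on different subnets, each registered in DNS by its virtual IP.
cluster_vips = ["192.0.2.10", "198.51.100.10"]
answers = round_robin_answers(cluster_vips, 4)

# One cluster has lost two of its three hosts but still answers on its virtual IP.
still_serving = vip_available([False, True, False])
```

Successive lookups alternate between the two clusters, and each address stays usable while at least one host in its cluster is running. If the DNS entries pointed at individual servers instead, a single failed server would continue to receive its share of requests until it was removed from DNS.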

Performance Impact of Network Load Balancing

The performance impact of Network Load Balancing can be measured in four key areas:

CPU overhead on the cluster hosts, which is the CPU percentage required to analyze and filter network packets (lower is better).
Response time to clients, which increases with the non-overlapped portion of CPU overhead, called latency (lower is better).
Throughput to clients, which increases with additional client traffic that the cluster can handle prior to saturating the cluster hosts (higher is better).
Switch occupancy, which increases with additional client traffic (lower is better) and must not adversely impact port bandwidth.

In addition, the scalability of Network Load Balancing determines how its performance improves as hosts are added to the cluster. Scalable performance requires that CPU overhead and latency not grow faster than the number of hosts.

CPU Overhead

All load-balancing solutions require system resources to examine incoming packets and make load-balancing decisions, and thus impose an overhead on network performance. As previously noted, dispatcher-based solutions examine, modify, and retransmit packets to particular cluster hosts. (They usually modify IP addresses to retarget packets from a virtual IP address to a particular host’s IP address.) In contrast, Network Load Balancing independently delivers incoming packets to all cluster hosts and applies a filtering algorithm that discards packets on all but the desired host. Filtering imposes less overhead on packet delivery than re-routing, which results in lower response time and higher overall throughput.

Throughput and Response Time

Network Load Balancing scales performance by increasing throughput and minimizing response time to clients. When the capacity of a cluster host is reached, it cannot deliver additional throughput, and response time grows non-linearly as clients awaiting service encounter queuing delays. Adding another cluster host enables throughput to continue to climb and reduces queuing delays, which minimizes response time. As customer demand for throughput continues to increase, more hosts are added until the network’s subnet becomes saturated. At that point, throughput can be further scaled by using multiple Network Load Balancing clusters and distributing traffic to them using Round Robin DNS.
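The non-linear growth in response time near capacity can be illustrated with a simple queuing model. The sketch below assumes each host behaves like an M/M/1 queue and that client requests are split evenly across hosts. Both are modeling assumptions introduced here for illustration and are not measurements or claims from this article.

```python
def mean_response_time(arrival_rate, service_rate, host_count):
    """Mean response time in seconds under an assumed M/M/1 model per host.

    arrival_rate is the total client request rate for the cluster and
    service_rate is the request rate one host can sustain (both per second).
    """
    per_host_rate = arrival_rate / host_count
    if per_host_rate >= service_rate:
        # The host is saturated, so queuing delay grows without bound.
        return float("inf")
    return 1.0 / (service_rate - per_host_rate)


# One host running at 90 percent of its capacity.
one_host = mean_response_time(90, 100, host_count=1)

# The same client load spread across two hosts.
two_hosts = mean_response_time(90, 100, host_count=2)
```

Under these assumptions, a single host close to capacity shows a sharply higher response time than two hosts sharing the same load, which matches the qualitative behavior described above.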

Switch Occupancy

The filtering architecture for Network Load Balancing relies on the broadcast subnet of the LAN to deliver client requests to all cluster hosts simultaneously. In small clusters, this can be achieved using a hub to interconnect cluster hosts. Each incoming client packet is automatically presented to all cluster hosts. Larger clusters use a switch to interconnect cluster hosts, and, by default, Network Load Balancing induces switch-flooding to deliver client requests to all hosts independently. It is important to ensure that switch-flooding does not use an excessive amount of switch capacity, especially when the switch is shared with computers outside the cluster. The percentage of switch bandwidth consumed by flooding of client requests is called its switch occupancy.
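Switch occupancy can be estimated with simple arithmetic. The sketch below assumes that flooding delivers the full stream of client requests to every switch port, so the occupancy of a port is the request rate divided by the port bandwidth. The function name and the example figures are hypothetical.

```python
def switch_occupancy_percent(client_request_mbps, port_bandwidth_mbps):
    """Percentage of a switch port's bandwidth consumed by flooded requests."""
    if port_bandwidth_mbps <= 0:
        raise ValueError("port bandwidth must be positive")
    return 100.0 * client_request_mbps / port_bandwidth_mbps


# Hypothetical figures: 10 Mbps of client requests flooded to 100 Mbps ports.
occupancy = switch_occupancy_percent(10, 100)
```

In this example the flooded client requests consume 10 percent of each port, including ports that connect computers outside the cluster.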
