Link State Propagation
To guarantee efficient and reliable message routing, Exchange servers must have up-to-date information in their link state tables. This information must accurately reflect the state of all bridgehead servers and messaging connectors. To propagate link state information to all servers in an Exchange organization, Exchange Server 2003 uses a propagation protocol known as the link state algorithm (LSA).
Propagating link state information among all servers has the following advantages:
Each Exchange server can select the optimum message route at the source instead of sending messages along a route on which a connector is unavailable.
Messages no longer bounce back and forth between servers, because each Exchange server has current information about whether alternate or redundant routes are available.
Message looping no longer occurs.
Link State Algorithm
The propagation of link state information differs within and between routing groups. Within routing groups, reliable TCP/IP connectivity is assumed, and servers communicate with each other over direct TCP/IP connections. Across routing groups, however, direct TCP/IP connections might not be possible, so Exchange Server 2003 propagates link state information through SMTP or X.400 instead.
Exchange Server 2003 propagates link state information as follows:
Intra-routing group LSA Within a routing group, the routing group master tracks link state information and propagates it to the remaining servers in the routing group. The remaining servers are also called member nodes or routing group members. When a member node starts and has initialized its routing table with information from Active Directory, it establishes a TCP/IP connection to port 691. It then authenticates with the routing group master and obtains the most recent information about the state of all links in the routing topology. All intra-routing group connections require two-way authentication. The connection remains open so that the master and subordinate nodes can communicate with each other whenever link state changes occur.
Master and subordinate in a routing group
Within a routing group, Exchange Server 2003 updates link state information as follows:
When the advanced queuing engine or the Exchange MTA determines a problem with a bridgehead or routing group connector, it informs the local routing engine, as explained in "Message Rerouting Based on Link State Information" in Exchange Server 2003 Message Routing.
The local routing engine, acting as a caching proxy between the routing group master and the advanced queuing engine or Exchange MTA, forwards the link state information to the routing group master over the link state connection to TCP port 691.
When the routing group master is informed of an update, it overwrites the link state table with the new information. Based on this new information, the routing group master creates a new MD5 hash, inserts it into the link state table, and then propagates the new information to all servers in the routing group. Again, communication takes place over TCP port 691.
An MD5 hash is a cryptographic block of data derived from a message by using a hashing algorithm that generates a 128-bit hash by processing the message in 512-bit blocks. The same message always produces the same hash value when the message is passed through the same hashing algorithm. Messages that differ by even one character can produce very different hash values.
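The digest properties described above can be demonstrated with a few lines of Python. This is a generic illustration of MD5 behavior, not Exchange code; the sample link state strings are invented:

```python
import hashlib

# The same input always yields the same 128-bit (16-byte) digest.
table_v1 = b"RG-A:UP;RG-B:UP;RG-C:UP"
assert hashlib.md5(table_v1).hexdigest() == hashlib.md5(table_v1).hexdigest()

# A small change produces a completely different digest, which is why
# comparing digests is enough to detect a changed link state table.
table_v2 = b"RG-A:UP;RG-B:UP;RG-C:DOWN"
assert hashlib.md5(table_v1).digest() != hashlib.md5(table_v2).digest()

# The digest is always 128 bits, regardless of the input length.
assert len(hashlib.md5(table_v1).digest()) == 16
```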
The routing group master sends the whole link state table (that is, the OrgInfo packet) to each routing group member. Each routing group member compares the MD5 hash of the new OrgInfo packet with the MD5 hash in its own link state table and determines if the local server has the most up-to-date information.
If the MD5 values are different, the routing group member processes the OrgInfo packet. After replacing the link state table in memory, the routing group member sends a short reply to the routing group master, now also referencing the new MD5 hash value.
The routing group master processes this information, discovers that the routing group member is updated, and sends a short acknowledgment to the routing group member.
Every five minutes thereafter, the routing group member polls the master to query for up-to-date routing information. Master and member node compare their MD5 hash values to determine if changes occurred.
All servers within a routing group must communicate with the routing group master through a reliable TCP/IP connection.
Inter-routing group LSA Link state information is communicated indirectly between routing groups, using bridgehead servers and routing group connectors. To send link state information to another routing group, the routing group master communicates the link state information in the form of an OrgInfo packet, which it sends to the routing group's bridgehead server over TCP port 691. The bridgehead server then forwards this information to all the bridgehead servers in other routing groups to which it connects, using the various routing group connectors it hosts.
If the communication between routing groups is SMTP-based (that is, Routing Group Connector or SMTP connector), link state information is exchanged before regular message transfer by using the extended SMTP command, X-LINK2STATE, as follows:
The source bridgehead server establishes a TCP/IP connection to the destination bridgehead over TCP port 25.
The bridgehead servers authenticate each other using the X-EXPS GSSAPI command.
After connecting and authenticating, link state communication begins using the X-LINK2STATE command.
To detect any changes to link state information, the bridgehead servers first compare their MD5 hashes: the local bridgehead server uses the DIGEST_QUERY verb to request the MD5 hash of the remote bridgehead server. The DIGEST_QUERY verb contains the GUID of the Exchange organization and the MD5 hash of the local bridgehead server.
The remote bridgehead server now compares its MD5 hash to the MD5 hash received through the DIGEST_QUERY verb. If the hashes are the same, the remote bridgehead server sends a DONE_RESPONSE verb to indicate that the link state table does not require updating. Otherwise, the remote bridgehead server sends its entire OrgInfo packet.
After receiving the OrgInfo packet, the remote and local bridgehead servers reverse roles and the local bridgehead server sends its own OrgInfo packet to the remote bridgehead server. Both bridgehead servers transfer the received OrgInfo packet to their routing group masters. The routing group master determines whether to update the link state table with the information from the OrgInfo packet. A higher version number indicates a more recent OrgInfo packet.
Routing group masters never accept information about their local routing group from a routing group master in a remote routing group.
After the exchange of OrgInfo packets, the local bridgehead server starts transferring e-mail messages, or issues a QUIT command to end the SMTP connection.
For details about SMTP communication between servers running Exchange Server 2003, see SMTP Transport Architecture.
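The decision that the remote bridgehead server makes when it receives a DIGEST_QUERY can be sketched as follows. This is an illustrative Python sketch of the logic only; the function and verb-handling shape are invented, and the real exchange is carried inside SMTP by the X-LINK2STATE command:

```python
def handle_digest_query(local_digest, local_orginfo, query_digest):
    """Remote bridgehead logic after receiving a DIGEST_QUERY.
    If the hashes match, reply DONE_RESPONSE (no update needed);
    otherwise send the entire OrgInfo packet."""
    if query_digest == local_digest:
        return ("DONE_RESPONSE", None)
    return ("ORGINFO", local_orginfo)

# Hashes match: the link state table does not require updating.
verb, payload = handle_digest_query("abc123", b"<orginfo>", "abc123")
assert verb == "DONE_RESPONSE" and payload is None

# Hashes differ: the full OrgInfo packet is sent.
verb, payload = handle_digest_query("abc123", b"<orginfo>", "ffff00")
assert verb == "ORGINFO" and payload == b"<orginfo>"
```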
When you link routing groups by means of an X.400 connector, link state information is exchanged between the MTAs as part of typical message transmission. A binary object, called the OrgInfo packet, is sent in a system message to the receiving MTA before interpersonal messages are transferred. The receiving MTA then transfers the OrgInfo packet to the local routing engine, which passes the information to the routing group master.
An LSA Example
The following figure illustrates how the link state algorithm works in an Exchange organization that contains multiple routing groups. In this environment, a bridgehead server in routing group E is unavailable, and the bridgehead servers in the other routing groups have not yet received the information that there is a routing problem.
An organization with an unavailable bridgehead server, before link state changes
Exchange Server 2003 discovers the routing problem in the following way:
A user in routing group A sends a message to a recipient in routing group E.
The routing engine chooses the path shown in Figure 5.9. Therefore, the message is transferred to the bridgehead server in routing group B.
The bridgehead server in routing group B attempts a direct transfer to the bridgehead server in routing group E. Because the remote bridgehead is unavailable, the attempt fails. After three consecutive failed connection attempts, the routing group connector's local bridgehead server is marked as CONN_NOT_AVAIL. Because there are no more bridgeheads in the connector configuration, the connector is marked as STATE DOWN.
First connector down
The bridgehead server in routing group B connects to its routing group master through TCP port 691 and transmits the new link state information. The master incorporates the information into the link state table and notifies all servers in the routing group about the change.
The link state change causes a rerouting event in routing group B. Exchange Server 2003 can select from two paths with the same cost values. In this example, the message is sent to routing group C, because the routing engine randomly chooses this transfer path.
Before the actual message is transferred, the bridgehead servers in routing group B and routing group C compare their MD5 hashes. Because the MD5 hashes do not match, the servers exchange link state information. The bridgehead server in routing group B informs its remaining adjacent remote bridgehead servers (routing groups A, C, and D) about the link state changes.
The bridgehead server in routing group C connects to its routing group master through TCP port 691 and transmits the new link state information. The routing group master incorporates the information in the link state table and notifies all servers in the routing group about the change. All servers in routing groups B and C now know that the routing group connector between routing group B and routing group E is unavailable.
The bridgehead server in routing group C attempts a direct transfer to the bridgehead server in routing group E. Because the remote bridgehead is unavailable, the connection attempt fails. After three failed connection attempts, the connector is marked as STATE DOWN.
Second connector down
The bridgehead server in routing group C connects to its routing group master through TCP port 691 and transmits new link state information. The routing group master incorporates the information in the link state table and notifies all other servers in the routing group about the change.
The link state change causes a rerouting event in routing group C. The message is now sent to routing group D, because the routing engine still sees an available transfer path from routing group D to routing group E. The bridgehead server in routing group C informs its remaining adjacent remote bridgehead servers (routing groups A, B, and D) about the link state changes.
The message is transferred to routing group D, but before the actual message transfer, the bridgehead servers in routing groups C and D compare their MD5 hashes and exchange link state information.
The bridgehead server in routing group D connects to its routing group master through TCP port 691 and transmits new link state information. The routing group master incorporates the information into the link state table and notifies all servers in the routing group about the change. All servers in routing group D now know that the routing group connectors between routing groups B and E and routing groups C and E are unavailable.
The bridgehead server in routing group D attempts a direct transfer to the bridgehead server in routing group E, but because the remote bridgehead is unavailable, the connection attempt fails. After three failed connection attempts, the connector is marked as STATE DOWN.
Third connector down
The bridgehead server in routing group D connects to its routing group master through TCP port 691 and transmits new link state information. The master incorporates the information into the link state table and notifies all servers in the routing group about the change.
The link state change causes a rerouting event in routing group D. Because no additional transfer paths are available to routing group E, the message remains in routing group D until at least one transfer path is available. The message is transferred to routing group E as soon as the bridgehead server in routing group E is available again.
The bridgehead server in routing group D informs its remaining adjacent remote bridgehead servers (routing groups B and C) about the link state changes. These routing groups then propagate the link state changes to routing group A.
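The retry and mark-down behavior that recurs throughout the steps above can be sketched as follows. This is an illustrative Python sketch: the state names CONN_NOT_AVAIL, STATE UP, and STATE DOWN come from the document, but the class, fields, and threshold handling are invented:

```python
class ConnectorState:
    """Sketch of connector mark-down after repeated connection failures."""
    MAX_ATTEMPTS = 3

    def __init__(self, bridgeheads):
        self.failures = {bh: 0 for bh in bridgeheads}
        self.bridgehead_state = {bh: "AVAILABLE" for bh in bridgeheads}
        self.state = "STATE UP"

    def record_failure(self, bridgehead):
        self.failures[bridgehead] += 1
        # Three consecutive failures mark the bridgehead unavailable.
        if self.failures[bridgehead] >= self.MAX_ATTEMPTS:
            self.bridgehead_state[bridgehead] = "CONN_NOT_AVAIL"
        # With no remaining available bridgeheads, the connector goes down.
        if all(s == "CONN_NOT_AVAIL" for s in self.bridgehead_state.values()):
            self.state = "STATE DOWN"

conn = ConnectorState(["BH-E1"])           # single bridgehead configured
for _ in range(3):
    conn.record_failure("BH-E1")           # three consecutive failures
assert conn.bridgehead_state["BH-E1"] == "CONN_NOT_AVAIL"
assert conn.state == "STATE DOWN"
```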
Link State Changes and Link State Propagation
The link state table contains version information for each routing group in the form of major, minor, and user version numbers. Major version changes have the highest priority, followed by minor version changes and then user version changes.
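This precedence can be expressed as an ordered comparison of the three version numbers. The following Python sketch is illustrative only; the tuple representation and comparison semantics are an assumption based on the priority order described above:

```python
def newer(a, b):
    """Compare two (major, minor, user) link state versions.
    Lexicographic tuple comparison gives major changes precedence over
    minor changes, and minor changes precedence over user changes."""
    return a > b

assert newer((2, 0, 0), (1, 9, 7))   # a major change outranks the rest
assert newer((1, 3, 0), (1, 2, 9))   # then minor
assert newer((1, 2, 5), (1, 2, 4))   # then user
assert not newer((1, 2, 4), (1, 2, 4))
```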
Exchange Server 2003 detects link state changes in the following way:
Major version number Major changes are actual physical changes in the routing topology. For example, you create a major change when you add a new connector to the routing group or change a connector configuration. To receive notification of major changes to its routing group in the routing topology, the routing group master registers with Active Directory for change notifications using DSAccess. The configuration domain controller sends these notifications to the Exchange server, according to the standard Lightweight Directory Access Protocol (LDAP) change notification process. When a routing group master receives an update to the routing topology from the configuration domain controller, it sends the updated information to all member servers in its routing group. It also notifies all bridgehead servers in remote routing groups, as explained earlier in the section "Link State Algorithm." For more information about the role of DSAccess and the configuration domain controller on Exchange 2003, see Exchange Server 2003 and Active Directory.
Minor version number Minor changes are changes in link state information, such as a connector changing from STATE UP to STATE DOWN. Unreliable network connections, however, can lead to a situation in which connectors are frequently marked up and down, causing extra link state updates across the Exchange organization. A substantial increase in processing overhead can occur because of the extra route resets and message rerouting. By default, Exchange Server 2003 mitigates oscillating connectors by delaying link state changes for a period of ten minutes. During this period, changes that occur are consolidated and then replicated across the organization in one batch. However, an oscillating connection can still generate link state traffic if changes continue over extended periods of time.
You can increase or decrease the update window through the following registry parameter.
Interval in seconds between link state updates. Default is ten minutes (600 seconds). The maximum is seven days. Setting this parameter to 0 can be useful when troubleshooting connection failures, because failures are then immediately reflected in connector states.
You can also prevent the routing group master from marking down its connectors by setting the following registry key. This can be helpful, especially in hub-and-spoke routed scenarios in which each destination can be reached only through a single connector, because message rerouting cannot occur when no alternate connectors are available.
A value of 0x1 disables link state changes.
User version number User updates include minimal changes, such as when the routing group master changes, when services are started or stopped on an Exchange server, when another server is added to the routing group, or when a member server loses connectivity to the routing group master.
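The delay-and-consolidate behavior described above for minor version changes can be sketched as follows. This is an illustrative Python sketch with invented class and method names; the 600-second default mirrors the ten-minute window, and time is passed in explicitly to keep the example deterministic:

```python
class ChangeBatcher:
    """Sketch of consolidating link state changes within a delay window."""
    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.pending = {}                  # connector -> latest state
        self.window_start = None

    def record(self, connector, state, now):
        if self.window_start is None:
            self.window_start = now        # first change opens the window
        self.pending[connector] = state    # later changes overwrite earlier

    def flush_if_due(self, now):
        """Replicate all consolidated changes as one batch once the window closes."""
        if self.window_start is not None and now - self.window_start >= self.window:
            batch, self.pending, self.window_start = self.pending, {}, None
            return batch
        return None

b = ChangeBatcher()
b.record("RG-B->RG-E", "STATE DOWN", now=0)
b.record("RG-B->RG-E", "STATE UP", now=30)    # oscillation is consolidated
b.record("RG-B->RG-E", "STATE DOWN", now=90)
assert b.flush_if_due(now=300) is None         # window still open
assert b.flush_if_due(now=600) == {"RG-B->RG-E": "STATE DOWN"}
```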
Changing the Routing Group Master
The first server installed in the routing group is automatically designated as the routing group master. If this server fails or is taken offline, link state information is no longer propagated in the routing group. All servers in the routing group continue to operate on the earlier information. When the routing group master is available again, it reconstructs its link state information. The routing group master begins with all servers and connectors marked as unavailable. It then discovers which servers and connectors are available and updates the members of the routing group.
If you shut down a routing group master for more than a brief time, you should nominate a different routing group master to avoid inefficient message routing. In Exchange System Manager, expand the desired routing group and select the Members container. In the details pane, right-click the server that you want to promote to the routing group master, and then select Set as Master.
Changing the routing group master represents a major link state change: the updated information must be propagated across the organization, and all Exchange servers must reroute their messages. Therefore, do not change the routing group master frequently.
Conflicts Between Routing Group Masters
Only one server in a routing group is recognized as the routing group master. This configuration is enforced by an algorithm in which (N/2) + 1 servers in the routing group, where N denotes the number of servers, must agree on and acknowledge the master. To acknowledge the master, member nodes send link state ATTACH data to it.
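The (N/2) + 1 rule amounts to a simple majority quorum. The following Python sketch illustrates the calculation, assuming integer division (an assumption; the document does not state how fractional results are handled):

```python
def quorum(n_servers):
    """Number of servers that must agree on and acknowledge the routing
    group master, per the (N/2) + 1 rule (integer division assumed)."""
    return n_servers // 2 + 1

assert quorum(1) == 1    # a single server is trivially the master
assert quorum(4) == 3    # 4 servers: 3 must acknowledge the master
assert quorum(5) == 3    # 5 servers: 3 must acknowledge the master
assert quorum(10) == 6
```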
Sometimes, two or more servers recognize the wrong server as the routing group master. For example, if a routing group master is moved or deleted without another routing group master being chosen, msExchRoutingMasterDN, the Active Directory attribute that designates the routing group master, might point to a deleted server, because the attribute is not linked.
This situation can also occur when an old routing group master refuses to detach as master, or a rogue routing group master continues to send link state ATTACH information to an old routing group master. In Exchange Server 2003, if msExchRoutingMasterDN points to a deleted object, the routing group master relinquishes its role as master and initiates a shutdown of the master role.
Take the following steps to resolve this issue:
Check for healthy link state propagation in the routing group on port 691. Verify that a firewall or SMTP filter is not blocking communication.
Verify that no Exchange service is stopped.
Check Active Directory for replication latencies, using the Active Directory Replication Monitor tool (Replmon.exe), which is included in Microsoft Windows Server 2003.
Check for network problems and network communication latencies.
Check for deleted routing group masters or servers that no longer exist. In these instances, a transport event 958 is logged in the application log of Event Viewer. This event states that a routing group master no longer exists. Verify this information by using a directory access tool, such as LDP (ldp.exe) or ADSI Edit (adsiEdit.msc). These applications are included in the Windows Server 2003 support tools.