What is KeepAliveTime used for with regard to Exchange?

This one setting has caused more debate, more confusion, and by extension more cases than perhaps anything else in our Exchange Client Connectivity queues since I started working at Microsoft. There is a lot of confusion about what it is actually used for, so I am going to try to clear that up here.

First of all, where is it?

The KeepAliveTime is a setting that lives in the registry of your Windows Servers at the following location:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

It is a DWORD value, and its data (in decimal) is expressed in milliseconds. If it is not present under that key, the default applies, which is 7,200,000 ms, or 2 hours.
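To make the units concrete, here is a minimal Python sketch of the "missing value means default" rule and the millisecond arithmetic. The dictionary stands in for the contents of the registry key, and the function names are hypothetical, chosen only for illustration.

```python
DEFAULT_KEEPALIVE_MS = 7_200_000  # default stated above: 2 hours


def effective_keepalive_ms(parameters: dict) -> int:
    """Return the KeepAliveTime in effect; fall back to the default when absent."""
    return parameters.get("KeepAliveTime", DEFAULT_KEEPALIVE_MS)


def ms_to_minutes(ms: int) -> float:
    """Convert milliseconds to minutes."""
    return ms / 60_000


# Value not present under the key: the 2-hour default applies.
print(ms_to_minutes(effective_keepalive_ms({})))  # 120.0

# Value explicitly set to 1,800,000 ms.
print(ms_to_minutes(effective_keepalive_ms({"KeepAliveTime": 1_800_000})))  # 30.0
```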

What does it do?

KeepAliveTime does nothing on its own. If a process uses it, however, a heartbeat will be sent over TCP to the connected client at its configured interval to ensure the client is still there and responding. If the client does not respond, the connection is closed. If the client does respond, the server waits the pre-set amount of time and then does it again.
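What "a process uses it" looks like in practice is not spelled out here. One common way for an application to opt in is the standard SO_KEEPALIVE socket option, after which the operating system sends the keep-alive probes on the application's behalf. The Python sketch below shows that general mechanism only; whether Exchange opts in exactly this way is an assumption, and the helper name is hypothetical.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with keep-alive probing switched on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Opt this socket in to TCP keep-alive. The probe timing then comes from
    # the operating system's keep-alive settings unless overridden per socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
# Read the option back to confirm it is enabled (non-zero means on).
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keep-alive enabled:", enabled)
sock.close()
```

A socket that never sets this option sends no keep-alive probes, regardless of how the interval is configured.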

What is the purpose of that?

There are a finite number of connections available to Exchange. If every connected client held its connections open indefinitely, Exchange would run out of connections. So when Exchange detects that a client has closed down or dropped from the network, it is preferable for Exchange to close that connection and free it up for another client.

It also serves another purpose. When an Exchange client, like Outlook or a mobile device, opens a connection to Exchange, it is basically saying, "Hey, I am open on this port and I will remain open on this port until something closes it". Exchange will do the same, saying "OK, I will use a port on my side and reserve it for your system to talk to mine". So both of those ports remain open and available to each other until Exchange no longer needs them, or the client disconnects. There is a misconception that there is an actual virtual tunnel there, but in reality it is just two open ports reserved for each other's use.

The client will then use that connection to talk to a Client Access Server in Exchange. It fully expects that everything it does on Exchange will be done over that connection. Now, there are things between the client and Exchange that have the ability to close the connection as well, such as firewalls, MDM solutions, and load balancers. Each of these will have a timeout setting with an ominous name like TCP Timeout, TCP TTL (time to live), or Idle Session Timeout. I will be calling all of these TCP Timeout for simplicity's sake. They monitor the TCP connections established through them, and if nothing happens on a connection for a certain amount of time, they terminate it. Exchange does not like it when that happens to one of its connections.

Exchange expects that connection to remain open until it is done with it. If it sends a process to the client, or if the client sends one to it, it can take some time for a response, depending on the process. If a network appliance times out the connection before the response comes in, it causes all kinds of inconvenient issues such as Outlook authentication prompts, iPhone full resyncs, and delayed responses. Basically, Exchange and the client have to reestablish their connection before they can continue, which means you have to authenticate again, or in the case of the iPhone, all of your emails disappear and then slowly start syncing back to the device.

KeepAliveTime, when configured properly, prevents those timeouts from occurring so long as the application is set up to make use of it. Outlook, iPhone, and many other applications come configured that way out of the box. The idea is that the KeepAliveTime heartbeat will interrupt the idle TCP session, making it no longer idle for a few seconds, and thereby force any TCP timeouts to start their countdown over again.

So if your firewall has a 30-minute TCP Timeout, but your KeepAliveTime heartbeat fires off every 15 minutes, the TCP Timeout will never trigger, because the session is never idle for more than 15 minutes. Exchange will be able to hold that connection open until either the client goes offline or the Exchange server finishes using it.

However, if your KeepAliveTime is at the default of 2 hours, and the firewall has a TCP Timeout of 30 minutes, your KeepAliveTime heartbeat will never have the opportunity to fire, as the session will have been closed an hour and a half earlier.
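The two scenarios above boil down to a single comparison. The Python sketch below works through both; the function names are hypothetical, and treating equal values as unsafe is a conservative assumption rather than something stated here.

```python
def heartbeat_beats_timeout(keepalive_min: int, idle_timeout_min: int) -> bool:
    """True when the heartbeat fires before the appliance's idle timer expires."""
    return keepalive_min < idle_timeout_min


def minutes_closed_early(keepalive_min: int, idle_timeout_min: int) -> int:
    """How long before the first heartbeat an idle session is closed (0 if never)."""
    return max(0, keepalive_min - idle_timeout_min)


# 15-minute heartbeat behind a 30-minute firewall timeout: the timer keeps resetting.
print(heartbeat_beats_timeout(15, 30))   # True

# Default 2-hour heartbeat behind the same firewall: the session is closed first.
print(heartbeat_beats_timeout(120, 30))  # False
print(minutes_closed_early(120, 30))     # 90
```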

Our recommendation is that you set the KeepAliveTime to 30 minutes, or 1,800,000 ms.
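One way to apply a value like this is to import a .reg file. The Python sketch below only builds the text of such a file for a given number of minutes; it does not touch the registry. The function name is hypothetical, and the output should be reviewed before being imported on a server.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def keepalive_reg_file(minutes: int) -> str:
    """Build .reg file text that sets KeepAliveTime to the given number of minutes."""
    ms = minutes * 60 * 1000
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        # .reg files store DWORD data as 8 hexadecimal digits, not decimal.
        f'"KeepAliveTime"=dword:{ms:08x}\n'
    )


print(keepalive_reg_file(30))
```

For 30 minutes, the DWORD line reads "KeepAliveTime"=dword:001b7740, which is 1,800,000 in hexadecimal.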

Any network appliances such as firewalls, MDM solutions, and load balancers should have their timeout values set higher than 30 minutes, preferably progressively higher from the Exchange server out to the egress point. So for instance, if your client connects through a firewall, then through an MDM solution, then a load balancer, and finally to the Exchange Client Access Server, you would want it to look something like this:

Firewall (1-hour TCP Timeout) -> MDM Solution (50-minute TCP Timeout) -> Load Balancer (40-minute TCP Timeout) -> CAS (30-minute KeepAliveTime)

With this setup, the connections will remain open as long as they are being used and will not be interrupted prematurely, with the possible exception of a network outage or hardware failure.
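The "progressively higher" rule is easy to check mechanically. The Python sketch below walks a chain of devices from the Exchange server outward and reports any hop whose timeout is not higher than the one before it. The function name and the message format are hypothetical; the values are the ones from the example chain.

```python
def check_timeout_chain(keepalive_min, hops):
    """Return a list of problems; hops are (name, timeout_min) from Exchange outward."""
    problems = []
    previous_name, previous = "KeepAliveTime", keepalive_min
    for name, timeout in hops:
        # Each hop must time out later than everything closer to Exchange.
        if timeout <= previous:
            problems.append(
                f"{name} ({timeout} min) should be higher than "
                f"{previous_name} ({previous} min)"
            )
        previous_name, previous = name, timeout
    return problems


# Ordered from the Client Access Server out to the egress point.
hops = [("Load Balancer", 40), ("MDM Solution", 50), ("Firewall", 60)]

print(check_timeout_chain(30, hops))   # [] : the recommended setup passes
print(check_timeout_chain(120, hops))  # default KeepAliveTime: load balancer flagged
```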

If I change it, do I need to reboot?

Absolutely. The KeepAliveTime setting is not an Exchange setting. It is a network stack setting in the OS, so the setting will not take effect until a reboot is completed.

Is KeepAliveTime case sensitive?

This is a big question and there are various opinions on it. Until I can get the F5 load balancer configured in my lab (which may be a while), I am going to suggest you do it the way it is documented in all of the Microsoft KB articles that mention it. None of them tells you that it is case sensitive, but all of them present it with an upper case K, A and T: KeepAliveTime. If you enter it that way, it will work whether or not a given application turns out to be case sensitive, and I have yet to run into an application that rejected it when it was set that way. However, I have worked cases where we seemed to get different behavior when it was in the correct case than when it wasn't. Full disclosure though, there may have been other contributing factors, which is why I ultimately want to prove or disprove it in my lab. I will update this blog should that ever happen.

That's pretty much everything I know about the KeepAliveTime value. I hope you find this information useful.