Thinking about Timeout and Retry Policy

I kind of glossed over the subject of timeouts before, although I did call them out as an especially tricky part of the object model.  Timeouts don't just help with performance; they also play a role in the usability and security of channels.  That makes it important to understand how timeouts are used and how to choose a good timeout value.

The first rule of timeouts is that any operation that is potentially lengthy must have a timeout.  Methods that don't take a timeout should delegate to versions that do take a timeout.  Methods that don't take a timeout and can't delegate to another version simply must never, ever do any work that can block.  Otherwise, you have a weak point that is going to lower the reliability of your application and can make you vulnerable to Denial of Service attacks.

In CommunicationObject, the Open() and Close() methods use timeout values that come from the appropriately named DefaultOpenTimeout and DefaultCloseTimeout properties.  Open() simply calls Open(DefaultOpenTimeout), and Close() does the same with DefaultCloseTimeout.  The OnOpening() and OnOpened() methods have no equivalent with a timeout.  That means they must not block!  Do anything that could potentially block in OnOpen() instead, since that method does receive the timeout.
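
To make the delegation pattern concrete, here is a minimal sketch in Python.  It is an analogy and not the actual CommunicationObject code: the class name SketchChannel, the snake_case method names, the events list, and the 60 second default are all assumptions made for illustration.

```python
from datetime import timedelta


class SketchChannel:
    """Hypothetical stand-in for a communication object (not WCF code)."""

    # Assumed default value, chosen only for this sketch.
    default_open_timeout = timedelta(seconds=60)

    def __init__(self):
        # Records the order of calls so the flow is easy to inspect.
        self.events = []

    def open(self, timeout=None):
        # The form without a timeout delegates to the timed form.
        if timeout is None:
            timeout = self.default_open_timeout
        self.on_opening()      # takes no timeout, so it must not block
        self.on_open(timeout)  # the only step that is allowed to block
        self.on_opened()       # takes no timeout, so it must not block

    def on_opening(self):
        self.events.append("opening")

    def on_open(self, timeout):
        # Potentially lengthy work belongs here, bounded by `timeout`.
        self.events.append(("open", timeout.total_seconds()))

    def on_opened(self):
        self.events.append("opened")
```

Calling open() with no argument and calling open(timedelta(seconds=5)) both end up in on_open() with a concrete timeout, which is the whole point: there is exactly one place where blocking work can happen, and it always knows its limit.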

The second rule of timeouts is that they are the total limit for an operation's run time inclusive of any retry attempts.

Bad scenario: Your Open() operation is called with a timeout of 60 seconds.  You pass that value to your OpenSocket function, which you'll then retry 4 more times in a loop.  Don't do this!  If each call to OpenSocket gets its own 60 second timeout, then the 5 attempts together could keep you in this function for 5 minutes or more.

Worse scenario: Someone retries your Open function 3 more times in a loop.  The retries multiply together (4 attempts at 5 minutes each) and now the caller may have to wait 20 minutes.
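
The arithmetic behind these two scenarios can be checked with a few lines of Python.  The helper name worst_case_seconds is made up for this sketch, and it assumes every attempt runs for its full timeout and ignores any overhead between attempts.

```python
def worst_case_seconds(per_attempt_timeout, attempts_per_level):
    """Worst-case total when every retry level hands out a fresh timeout."""
    total = per_attempt_timeout
    for attempts in attempts_per_level:
        # Each enclosing retry loop multiplies the time spent below it.
        total *= attempts
    return total


# Bad scenario: 1 call plus 4 retries of OpenSocket, 60 seconds each.
bad = worst_case_seconds(60, [5])

# Worse scenario: 1 call plus 3 retries of Open wrapped around that.
worse = worst_case_seconds(60, [5, 4])
```

Here bad works out to 300 seconds (5 minutes) and worse to 1200 seconds (20 minutes), even though the caller only ever asked for 60.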

Just right scenario: You measure that the first call to OpenSocket failed after 25 seconds.  You then retry OpenSocket, but this time you give it a timeout of 60 - 25 = 35 seconds.  Eventually, you'll either hit your maximum number of retries or you'll run your remaining timeout down to 0 seconds.  Even if retries are nested within retries, you have a tight bound on the total operation time.
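
Here is one way the shrinking timeout could look, sketched in Python rather than in channel code.  The names open_with_retries and open_socket, the max_attempts default, the injectable clock parameter, and the choice of OSError as the retryable failure are all assumptions for illustration.

```python
import time


def open_with_retries(open_socket, timeout, max_attempts=5, clock=time.monotonic):
    """Retry open_socket(remaining) without ever exceeding the total timeout."""
    # Fix the deadline once; every attempt is measured against it.
    deadline = clock() + timeout
    last_error = None
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # the budget is spent, so stop retrying
        try:
            # Each attempt only gets whatever time is left.
            return open_socket(remaining)
        except OSError as error:
            last_error = error
    if last_error is not None and deadline - clock() > 0:
        # Out of attempts while time remains: report the real failure.
        raise last_error
    raise TimeoutError(f"open did not finish within {timeout} seconds") from last_error
```

If the first attempt fails after 25 seconds of a 60 second budget, the second attempt is handed 35 seconds, just as in the scenario above.  Because open_socket receives only the remaining time, it can run its own retry loop the same way and the outer bound still holds.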

This leads to a very easy rule for using timeouts: just worry about the time you can afford for high-level operations.  Everyone else will simply spend whatever time they have left.

Next time: The Ties that Bind Us, Part 1: BindingElement