6: Transient Fault Handling

Retired Content

This content and the technology it describes are outdated and are no longer maintained. For more information, see Transient Fault Handling.



What Are Transient Faults?

When cloud-based applications use other cloud-based services, errors can occur because of temporary conditions such as intermittent service, infrastructure-level faults, or network issues. Very often, if you retry the operation a short time later (perhaps only a few milliseconds later), the operation succeeds. These types of error conditions are referred to as transient faults. Transient faults typically occur very infrequently, and in most cases only a few retries are necessary for the operation to succeed.

Unfortunately, there is no easy way to distinguish transient from non-transient faults; both typically surface as exceptions in your application. If you retry an operation that failed because of a non-transient fault (for example, a "file not found" error), you will most likely get the same exception again.

In general, you can only distinguish the two reliably if the developer of the service has explicitly isolated transient faults into a specific subset of exception types or error codes.

For example, when you use the SQL Azure™ technology platform, one important consideration is how to handle client connections, because SQL Azure can throttle clients that attempt to establish connections to a database or run queries against it. SQL Azure throttles the number of database connections for a variety of reasons, such as excessive resource usage, long-running transactions, and possible failover and load-balancing actions. This can lead to the termination of existing client sessions or the temporary inability to establish new connections while the transient conditions persist. SQL Azure can also drop database connections for a variety of reasons related to network connectivity between the client and the remote data center, such as poor network quality, intermittent network faults in the client's LAN or WAN infrastructure, and other transient technical issues.

Bharath Says:
Throttling can occur with Microsoft Azure™ technology platform storage if your client exceeds the scalability targets. For more information, see "Azure Storage Abstractions and their Scalability Targets."
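To illustrate what becomes possible when a service isolates its transient faults into specific exception types, the following C# sketch classifies exceptions for a hypothetical service that signals temporary conditions with TimeoutException and treats everything else, such as FileNotFoundException, as non-transient. The IsTransient function and this mapping are assumptions for illustration only; they are not the detection logic of any real service or of the application block.

```csharp
using System;
using System.IO;

// Minimal detection-strategy sketch. Assumption: a hypothetical service
// signals temporary conditions with TimeoutException, and every other
// exception type indicates a permanent (non-transient) fault.
static bool IsTransient(Exception ex)
{
    // Client libraries often wrap the underlying fault, so check the outer
    // exception and every inner exception in the chain.
    for (var current = ex; current != null; current = current.InnerException)
    {
        if (current is TimeoutException)
        {
            return true;
        }
    }

    // Anything else, such as FileNotFoundException, is treated as
    // non-transient: retrying would most likely raise the same exception.
    return false;
}

Console.WriteLine(IsTransient(new TimeoutException()));                            // True
Console.WriteLine(IsTransient(new FileNotFoundException("data.txt")));             // False
Console.WriteLine(IsTransient(new IOException("wrapped", new TimeoutException()))); // True
```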

What Is the Transient Fault Handling Application Block?

The Transient Fault Handling Application Block makes your application more robust by providing the logic for handling transient faults. It does this in two ways.

First, the block includes logic to identify transient faults for a number of common cloud-based services in the form of detection strategies. These detection strategies contain built-in knowledge that is capable of identifying whether a particular exception is likely to be caused by a transient fault condition.

Bharath Says:
Determining which exceptions are the result of transient faults for a service requires detailed knowledge of and experience using the service. The block encapsulates this kind of knowledge and experience for you.

The block includes detection strategies for the following services:

  • SQL Azure
  • Azure Service Bus
  • Azure Storage Service
  • Azure Caching Service

Second, the application block enables you to define your retry strategies so that you can follow a consistent approach to handling transient faults in your applications. The specific retry strategy you use will depend on several factors; for example, how aggressively you want your application to perform retries, and how the service typically behaves when you perform retries. Some services can further throttle or even block client applications that retry too aggressively. A retry strategy defines how many retries you want to make before you decide that the fault is not transient, and what the intervals should be between the retries.

Jana Says:
This kind of retry logic is also known as "conditional retry" logic.

The built-in retry strategies allow you to specify that retries should happen at fixed intervals, at intervals that increase by the same amount each time, and at intervals that increase exponentially but with some random variation. The following table shows examples of all three strategies.

| Retry strategy | Example (intervals between retries, in seconds) |
| --- | --- |
| Fixed interval | 2, 2, 2, 2, 2, 2 |
| Incremental intervals | 2, 4, 6, 8, 10, 12 |
| Random exponential back-off intervals | 2, 3.755, 9.176, 14.306, 31.895 |
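The three kinds of schedule in the table can be sketched in C# as interval generators. This is an illustrative approximation rather than the block's implementation: the function names and parameters are hypothetical, and the exponential formula (a growth term that roughly doubles on each retry, scaled by a delta value randomized between 80 and 120 percent, added to a minimum and capped at a maximum) is an assumed, commonly used formulation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fixed interval: every retry waits the same amount of time.
static IEnumerable<TimeSpan> FixedIntervals(int retryCount, TimeSpan interval)
{
    for (int i = 0; i < retryCount; i++)
    {
        yield return interval;
    }
}

// Incremental: each interval is `increment` longer than the one before it.
static IEnumerable<TimeSpan> IncrementalIntervals(int retryCount, TimeSpan initial, TimeSpan increment)
{
    for (int i = 0; i < retryCount; i++)
    {
        yield return initial + TimeSpan.FromTicks(increment.Ticks * i);
    }
}

// Random exponential back-off (assumed formula): (2^retry - 1) times a delta
// randomized to 80-120 percent, added to minBackoff and capped at maxBackoff.
static IEnumerable<TimeSpan> ExponentialIntervals(int retryCount, TimeSpan minBackoff,
    TimeSpan maxBackoff, TimeSpan deltaBackoff, Random random)
{
    for (int retry = 1; retry <= retryCount; retry++)
    {
        double randomizedDelta = deltaBackoff.TotalMilliseconds * (0.8 + 0.4 * random.NextDouble());
        double growth = (Math.Pow(2, retry) - 1) * randomizedDelta;
        double milliseconds = Math.Min(minBackoff.TotalMilliseconds + growth, maxBackoff.TotalMilliseconds);
        yield return TimeSpan.FromMilliseconds(milliseconds);
    }
}

var two = TimeSpan.FromSeconds(2);
Console.WriteLine(string.Join(", ", FixedIntervals(6, two).Select(t => t.TotalSeconds)));           // 2, 2, 2, 2, 2, 2
Console.WriteLine(string.Join(", ", IncrementalIntervals(6, two, two).Select(t => t.TotalSeconds))); // 2, 4, 6, 8, 10, 12
Console.WriteLine(string.Join(", ", ExponentialIntervals(5, TimeSpan.FromSeconds(1),
    TimeSpan.FromSeconds(30), two, new Random()).Select(t => Math.Round(t.TotalSeconds, 3))));
```

Because the random factor stays between 80 and 120 percent, each exponential interval in this sketch is still longer than (or, once capped, equal to) the previous one, so the randomization spreads out clients without undoing the back-off.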

Note

All retry strategies specify a maximum number of retries after which the original exception is allowed to bubble up to your application.

In many cases, you should use the random exponential back-off strategy so that your application gracefully reduces the load it places on the service. This is especially true if the service is throttling client requests.

Bharath Says:
High-throughput applications should typically use an exponential back-off strategy. However, for user-facing applications such as websites, you may want to consider a linear back-off strategy to maintain the responsiveness of the UI.

You can define your own custom detection strategies if the built-in detection strategies included with the application block do not meet your requirements. The application block also allows you to define your own custom retry strategies that implement other patterns of retry intervals.

Markus Says:
In many cases, retrying immediately may succeed without the need to wait. By default, the block performs the first retry immediately before using the retry intervals defined by the strategy.

Figure 1 illustrates how the key elements of the Transient Fault Handling Application Block work together to enable you to add the retry logic to your application.


Figure 1

The Transient Fault Handling Application Block

A retry policy combines a detection strategy with a retry strategy. You can then use one of the overloaded versions of the ExecuteAction method to wrap the call that your application makes to one of the services.

Jana Says:
You must select the appropriate detection strategy for the service whose method you are calling from your Azure application.

Historical Note

The Transient Fault Handling Application Block is a product of the collaboration between the Microsoft patterns & practices team and the Azure Customer Advisory Team. It is based on the initial detection and retry strategies, and the data access support, from the "Transient Fault Handling Framework for SQL Azure, Azure Storage, Service Bus & Cache." The new application block adds enhanced configuration support and enhanced support for wrapping asynchronous calls, integrates the application block's retry strategies with the Azure storage retry mechanism, and works with the Enterprise Library dependency injection container. The new Transient Fault Handling Application Block supersedes the Transient Fault Handling Framework and is now the recommended approach to handling transient faults in Azure applications.

Using the Transient Fault Handling Application Block

This section describes, at a high level, how to use the Transient Fault Handling Application Block. It is divided into the following main subsections. The order of these sections reflects the order in which you would typically perform the associated tasks.

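  • Adding the Transient Fault Handling Application Block to your Visual Studio Project. This section describes how you can prepare your Microsoft Visual Studio® development system solution to use the block.
  • Instantiating the Transient Fault Handling Application Block objects. This section describes the approaches you can use to create the objects that the block provides.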
  • Defining a retry strategy. This section describes the ways that you can define a retry strategy in your application.
  • Defining a retry policy. This section describes how you can define a retry policy in your application.
  • Executing an operation with a retry policy. This section describes how to execute actions with a retry policy to handle any transient faults.

Note

A retry policy is the combination of a retry strategy and a detection strategy. You use a retry policy when you execute an operation that may be affected by transient faults.

For more examples of how you can use the Transient Fault Handling Application Block in your Azure application, see Chapter 7, "Making Tailspin Surveys More Resilient."

For detailed information about configuring the Transient Fault Handling Application Block and writing code that uses the Transient Fault Handling Application Block, see the topic "The Transient Fault Handling Application Block" on MSDN®.

Adding the Transient Fault Handling Application Block to Your Visual Studio Project

As a developer, before you can write any code that uses the Transient Fault Handling Application Block, you must configure your Visual Studio project with the assemblies, references, and other resources that the block requires. For information about how you can use NuGet to prepare your Visual Studio project to work with the Transient Fault Handling Application Block, see the topic "Adding the Transient Fault Handling Application Block to your Solution" on MSDN.

Markus Says:
NuGet makes it very easy for you to configure your project with all of the prerequisites for using the Transient Fault Handling Application Block.

Instantiating the Transient Fault Handling Application Block Objects

There are two basic approaches to instantiating the objects from the application block that your application requires. In the first approach, you can explicitly instantiate all the objects in code, as shown in the following code snippet:

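// Up to 5 retries, with an initial interval of 1 second and an
// increment of 2 seconds.
var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1),
  TimeSpan.FromSeconds(2));

// Combine the retry strategy with the detection strategy for Azure storage.
var retryPolicy =
  new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);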

Note

If you instantiate the RetryPolicy object using new, you cannot use the default strategies defined in the configuration.

In the second approach, you can use the Enterprise Library infrastructure to instantiate and manage the objects for you as shown in the following code snippet:

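// Let the Enterprise Library container create the RetryManager, which
// uses the retry strategies defined in the configuration.
var retryManager = EnterpriseLibraryContainer.Current.GetInstance<RetryManager>();

// Combine the storage detection strategy with the retry strategy named
// "Incremental Retry Strategy" in the configuration.
var retryPolicy = retryManager.GetRetryPolicy<StorageTransientErrorDetectionStrategy>(
  "Incremental Retry Strategy");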

A third approach, which uses the RetryPolicyFactory class, is provided for backward compatibility with the "Transient Fault Handling Framework":

// Backward-compatible alternative to RetryManager.GetRetryPolicy.
var retryPolicy = RetryPolicyFactory.GetRetryPolicy<StorageTransientErrorDetectionStrategy>(
  "Incremental Retry Strategy");

Defining a Retry Strategy

There are three considerations in defining retry strategies for your application: which retry strategy to use, where to define the retry strategy, and whether to use default retry strategies.

In most cases, you should use one of the built-in retry strategies: fixed interval, incremental, or random exponential back off. You configure each of these strategies using custom sets of parameters to meet your application's requirements; the parameters specify when the strategy should stop retrying an operation, and what the intervals between the retries should be. The choice of retry strategy will be largely determined by the specific requirements of your application. For more details about the parameters for each retry strategy, see the topic "Source Schema for the Transient Fault Handling Application Block" on MSDN.

You can define your own custom retry strategy. For more information, see the topic "Implementing a Custom Retry Strategy" on MSDN.
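A custom retry strategy ultimately answers one question after each failure: should the operation be retried and, if so, after what delay? The following C# sketch answers it from an explicit, hypothetical schedule of delays. The ShouldRetry signature and the schedule values are assumptions for illustration and are not the block's extensibility API; see "Implementing a Custom Retry Strategy" for that.

```csharp
using System;

// Hypothetical custom retry strategy: an explicit schedule of delays.
TimeSpan[] schedule =
{
    TimeSpan.Zero,                   // first retry: immediately
    TimeSpan.FromMilliseconds(500),
    TimeSpan.FromSeconds(2),
    TimeSpan.FromSeconds(10),
};

// retryCount is the number of retries already made. Returns false once the
// schedule is exhausted, so the caller stops retrying. lastException is not
// used here, but a custom strategy could use it to vary delays by fault type.
bool ShouldRetry(int retryCount, Exception lastException, out TimeSpan delay)
{
    if (retryCount < schedule.Length)
    {
        delay = schedule[retryCount];
        return true;
    }

    delay = TimeSpan.Zero;
    return false;
}

int attempt = 0;
TimeSpan wait;
while (ShouldRetry(attempt, new TimeoutException(), out wait))
{
    Console.WriteLine($"Retry {attempt + 1} after {wait.TotalMilliseconds} ms");
    attempt++;
}
```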

You can define your retry strategies either in code or in the application configuration file. Defining your retry strategies in code is most appropriate for small applications with a limited number of calls that require retry logic. Defining them in configuration is more useful if you have a large number of operations that require retry logic, because it makes the strategies easier to maintain and modify.

For more information about how to define your retry strategy in code, see the topic "Specifying Retry Strategies in Code" on MSDN.

For more information about how to define your retry strategies in a configuration file, see the topic "Specifying Retry Strategies in the Configuration" on MSDN.

If you define your retry strategies in the configuration file for the application, you can also define default retry strategies. The block allows you to specify default retry strategies at two levels. You can specify a default retry strategy for each of the following operation categories: SQL connection operations, SQL command operations, Azure Service Bus operations, Azure caching, and Azure storage operations. You can also specify a global default retry strategy.
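The two-level lookup for default retry strategies can be sketched as follows: use the default configured for the operation's category if there is one, and otherwise fall back to the global default. The category keys, the strategy names, and the ResolveDefaultStrategy function are hypothetical; in the block, these defaults come from the application configuration file rather than from code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical configuration data: one default strategy per operation
// category, plus a global default. Names are illustrative only.
var categoryDefaults = new Dictionary<string, string>
{
    ["SqlConnection"] = "Fixed Interval Retry Strategy",
    ["AzureStorage"] = "Exponential Backoff Retry Strategy",
};
const string GlobalDefault = "Incremental Retry Strategy";

// Use the category's default if one is configured; otherwise fall back to
// the global default.
string ResolveDefaultStrategy(string category) =>
    categoryDefaults.TryGetValue(category, out var name) ? name : GlobalDefault;

Console.WriteLine(ResolveDefaultStrategy("AzureStorage")); // Exponential Backoff Retry Strategy
Console.WriteLine(ResolveDefaultStrategy("ServiceBus"));   // Incremental Retry Strategy
```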

Defining a Retry Policy

A retry policy is the combination of a retry strategy and a detection strategy that you use when you execute an operation that may be affected by transient faults. The RetryManager class includes methods that enable you to create retry policies by explicitly identifying the retry strategy and detection strategy, or by using default retry strategies defined in the configuration file.

Markus Says:
If you are using Azure storage and you are already using the retry policy mechanism in the Microsoft.WindowsAzure.StorageClient namespace, then you can use retry strategies from the application block and configure the Azure storage client API to take advantage of the extensible retry functionality provided by the application block.

For more information about using the retry policies, see the topic "Key Scenarios" on MSDN.

For more information about the RetryPolicy delegate in the Microsoft.WindowsAzure.StorageClient namespace, see the blog post "Overview of Retry Policies in the Azure Storage Client Library."

Executing an Operation with a Retry Policy

The RetryPolicy class includes several overloaded versions of the ExecuteAction method. You use the ExecuteAction method to wrap the calls in your application that may be affected by transient faults. The different overloaded versions enable you to wrap the following types of calls to a service.

  • Synchronous calls that do not return a value.
  • Synchronous calls that return a value.
  • Asynchronous calls that do not return a value.
  • Asynchronous calls that return a value.
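The synchronous overloads can be sketched in C# as a generic retry loop plus a thin wrapper for calls that do not return a value. This is a hypothetical sketch, not the block's ExecuteAction implementation: ExecuteWithRetry, ExecuteActionWithRetry, and their parameters are illustrative names. The isTransient predicate stands in for the detection strategy, the delays list for the retry strategy, fastFirstRetry for the immediate first retry described earlier in this chapter, and onRetrying for a Retrying-style notification.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical ExecuteAction-style retry loop (not the block's API).
static T ExecuteWithRetry<T>(
    Func<T> operation,
    Func<Exception, bool> isTransient,
    IReadOnlyList<TimeSpan> delays,
    bool fastFirstRetry = true,
    Action<int, TimeSpan, Exception> onRetrying = null)
{
    for (int retryCount = 0; ; retryCount++)
    {
        try
        {
            return operation();
        }
        // Retry only transient faults, and only while retries remain. Any other
        // exception (or the last transient one) is not caught and propagates.
        catch (Exception ex) when (isTransient(ex) && retryCount < delays.Count)
        {
            // With fastFirstRetry, the first retry happens without waiting.
            TimeSpan delay = (retryCount == 0 && fastFirstRetry) ? TimeSpan.Zero : delays[retryCount];
            onRetrying?.Invoke(retryCount + 1, delay, ex);
            Thread.Sleep(delay);
        }
    }
}

// Overload for calls that do not return a value.
static void ExecuteActionWithRetry(Action action, Func<Exception, bool> isTransient,
    IReadOnlyList<TimeSpan> delays, bool fastFirstRetry = true)
{
    ExecuteWithRetry<object>(() => { action(); return null; }, isTransient, delays, fastFirstRetry);
}

int calls = 0;
string result = ExecuteWithRetry(
    () =>
    {
        calls++;
        if (calls < 3) throw new TimeoutException("service busy");
        return "ok";
    },
    ex => ex is TimeoutException,
    new[] { TimeSpan.FromMilliseconds(100), TimeSpan.FromMilliseconds(200), TimeSpan.FromMilliseconds(400) },
    onRetrying: (retry, delay, ex) => Console.WriteLine($"Retry {retry} after {delay.TotalMilliseconds} ms: {ex.Message}"));
Console.WriteLine($"{result} after {calls} calls");
```

Using an exception filter (catch ... when) means that non-transient exceptions, and the final transient exception once the retries are exhausted, propagate with their original stack traces, so the caller still has to handle them.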

The ExecuteAction method automatically applies the configured retry strategy and detection strategy when it invokes the specified action. If no transient fault occurs during the invocation, your application continues as normal, as if there were nothing between your code and the action being invoked. If a transient fault does occur, the block initiates recovery by invoking the specified action again, as many times and at the intervals defined in the retry strategy. As soon as a retry attempt succeeds, your application continues as normal. An exception that the detection strategy does not identify as transient is not retried; it is passed straight back to your application. If the block does not succeed in executing the operation within the number of retries specified by the retry strategy, it rethrows the exception to your application. Your application must still handle this exception properly.
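For asynchronous calls, the same retry behavior can be sketched with the Task-based async/await pattern. That pattern is an assumption of this sketch and may differ from how the block's own asynchronous overloads are shaped; ExecuteWithRetryAsync and its parameters are hypothetical names. Waiting with Task.Delay, rather than sleeping, avoids blocking a thread between attempts.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical Task-based retry loop for asynchronous operations.
static async Task<T> ExecuteWithRetryAsync<T>(
    Func<Task<T>> operation,
    Func<Exception, bool> isTransient,
    IReadOnlyList<TimeSpan> delays,
    CancellationToken cancellationToken = default)
{
    for (int retryCount = 0; ; retryCount++)
    {
        try
        {
            return await operation().ConfigureAwait(false);
        }
        // Non-transient faults, and the last transient fault once the delays
        // are used up, are not caught and propagate to the caller.
        catch (Exception ex) when (isTransient(ex) && retryCount < delays.Count)
        {
            await Task.Delay(delays[retryCount], cancellationToken).ConfigureAwait(false);
        }
    }
}

int attempts = 0;
int answer = await ExecuteWithRetryAsync(
    async () =>
    {
        attempts++;
        await Task.Yield();
        if (attempts < 2) throw new TimeoutException();
        return 42;
    },
    ex => ex is TimeoutException,
    new[] { TimeSpan.FromMilliseconds(100) });
Console.WriteLine($"{answer} after {attempts} attempts");
```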

Note

The Transient Fault Handling Application Block is not a substitute for proper exception handling. Your application must still handle any exceptions that are thrown by the service you are using.

Markus Says:
You can use the Retrying event to receive notifications in your application about the retry operations that the block performs.

In addition, the application block includes classes that wrap many common SQL Azure operations with a retry policy for you. Using these classes minimizes the amount of code you need to write.

Markus Says:
If you are working with SQL Azure, the application block includes classes that provide direct support for SQL Azure, such as the ReliableSqlConnection class. These classes will help you reduce the amount of code you need to write.

For more information about executing an operation with a retry policy, see the topic "Key Scenarios" on MSDN.

When Should You Use the Transient Fault Handling Application Block?

This section describes two scenarios in which you should consider using the Transient Fault Handling Application Block in your Azure solution.

You Are Using an Azure Service

If your application uses any of the Azure services supported by the Transient Fault Handling Application Block (SQL Azure, Azure Storage, Azure Caching, or Azure Service Bus), then you can make your application more robust by using the application block. Any Azure application that uses these services may occasionally encounter transient faults. Although you could add your own detection logic to your application, the application block's built-in detection strategies recognize a wider range of transient faults. It is also quicker and easier to use the application block than to develop your own solution.

The Azure storage client API already includes support for custom retry policies, and you can use retry strategies from the application block with it. Using retry strategies from the Transient Fault Handling Application Block with the Azure retry mechanism lets you use the block's built-in and custom retry strategies, and define your retry strategies in the application configuration.

Note

Using Transient Fault Handling Application Block retry policies instead of the built-in Azure storage retry policies enables you to take advantage of the customizable and extensible retry logic in the application block.

For more information about retries in Azure storage, see "Overview of Retry Policies in the Azure Storage Client Library."

You Are Using a Custom Service

If your application uses a custom service, it can still benefit from using the Transient Fault Handling Application Block. You can author a custom detection strategy for your service that encapsulates your knowledge of which transient exceptions may result from a service invocation. The Transient Fault Handling Application Block then provides you with the framework for defining retry policies and for wrapping your method calls so that the application block applies the retry logic.
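For example, if your custom service is HTTP-based and documents which status codes signal temporary conditions, the core of a custom detection strategy might be a lookup like the following C# sketch. The service, the particular status codes treated as transient, and the IsTransientStatus name are assumptions for illustration; in a real detection strategy you would extract the status code from whatever exception your client library raises.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical detection logic for a custom HTTP-based service. The set of
// status codes treated as transient is an illustrative assumption.
var transientStatusCodes = new HashSet<HttpStatusCode>
{
    HttpStatusCode.RequestTimeout,     // 408
    (HttpStatusCode)429,               // 429 Too Many Requests (throttling)
    HttpStatusCode.BadGateway,         // 502
    HttpStatusCode.ServiceUnavailable, // 503
    HttpStatusCode.GatewayTimeout,     // 504
};

// Status codes outside the set (such as 404 Not Found) are treated as
// non-transient, because retrying would most likely fail the same way.
bool IsTransientStatus(HttpStatusCode status) => transientStatusCodes.Contains(status);

Console.WriteLine(IsTransientStatus(HttpStatusCode.ServiceUnavailable)); // True
Console.WriteLine(IsTransientStatus(HttpStatusCode.NotFound));           // False
```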

More Information

For more examples of how you can use the Transient Fault Handling Application Block in your Azure application, see Chapter 7, "Making Tailspin Surveys More Resilient."

For detailed information about configuring the Transient Fault Handling Application Block and writing code that uses the Transient Fault Handling Application Block, see the topic "The Transient Fault Handling Application Block" on MSDN:
https://msdn.microsoft.com/en-us/library/hh680934(v=PandP.50).aspx

For more information about throttling in Azure, see "Azure Storage Abstractions and their Scalability Targets" on MSDN:
https://go.microsoft.com/fwlink/?LinkID=234633

For information about how you can use NuGet to prepare your Visual Studio project to work with the Transient Fault Handling Application Block, see the topic "Adding the Transient Fault Handling Application Block to your Solution" on MSDN:
https://msdn.microsoft.com/en-us/library/hh680891(v=PandP.50).aspx

The RetryPolicyFactory class is provided for backward compatibility with the "Transient Fault Handling Framework." For more information about the framework, see:
http://windowsazurecat.com/2011/02/transient-fault-handling-framework/

For more details about the parameters for each retry strategy, see the topic "Source Schema for the Transient Fault Handling Application Block" on MSDN:
https://msdn.microsoft.com/en-us/library/hh680941(v=PandP.50).aspx

You can define your own custom retry strategy. For more information, see the topic "Implementing a Custom Retry Strategy" on MSDN:
https://msdn.microsoft.com/en-us/library/hh680943(v=PandP.50).aspx

For more information about how to define your retry strategy in code, see the topic "Specifying Retry Strategies in Code" on MSDN:
https://msdn.microsoft.com/en-us/library/hh680927(v=PandP.50).aspx

For more information about how to define your retry strategies in a configuration file, see the topic "Specifying Retry Strategies in the Configuration" on MSDN:
https://msdn.microsoft.com/en-us/library/hh680900(v=PandP.50).aspx

For more information about using the retry policies, see the topic "Key Scenarios" on MSDN:
https://msdn.microsoft.com/en-us/library/hh680948(v=PandP.50).aspx

For more information about the RetryPolicy delegate in the Microsoft.WindowsAzure.StorageClient namespace, see the blog post "Overview of Retry Policies in the Azure Storage Client Library":
https://go.microsoft.com/fwlink/?LinkID=234630

For more information about retries in Azure storage, see "Overview of Retry Policies in the Azure Storage Client Library":
https://go.microsoft.com/fwlink/?LinkID=234630

The Transient Fault Handling Application Block is a product of the collaboration between the Microsoft patterns & practices team (https://msdn.microsoft.com/practices) and the Azure Customer Advisory Team (http://windowsazurecat.com/index.php). It is based on the initial detection and retry strategies, and the data access support from the "Transient Fault Handling Framework for SQL Azure, Azure Storage, Service Bus & Cache" on MSDN:
http://windowsazurecat.com/2011/02/transient-fault-handling-framework/


Last built: June 7, 2012