Azure AppFabric Caching - ErrorCode:SubStatus : What to do?

ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.)

You might encounter this error at some point while developing, deploying, or testing an application that uses AppFabric Caching. Sadly, the message is very generic and doesn't tell you much about what went wrong or who the culprit is. Fortunately :-) there is a way to get at the inner exception, which might give us some clue as to what happened. Before initializing the DataCacheFactory, set DataCacheFactoryConfiguration.ChannelOpenTimeout on the DataCacheFactoryConfiguration instance you are using to a larger value, such as 2 minutes. (This has to be done in code: the configuration file can't be used to set the value, since 20 seconds is the allowed limit there.) Note that this recommendation is for debugging purposes only and ideally should not be needed in production environments.

The longer timeout gives the underlying layer enough time to respond with the actual error that occurred. Now, if you check the inner exception, you should see what caused the request to fail.
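As a rough sketch of the debugging setup described above, the following C# raises ChannelOpenTimeout in code and then prints the chain of inner exceptions when a cache call fails. It assumes the cache endpoint and security settings are already configured elsewhere (they are omitted here), and the class name, key, and value are hypothetical placeholders, not part of any real application.

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

class CacheDebugSketch
{
    static void Main()
    {
        // Assumption: server endpoints and security settings are supplied
        // elsewhere (for example, through the application's configuration).
        var config = new DataCacheFactoryConfiguration();

        // Debugging only: allow up to 2 minutes so the underlying layer
        // has time to report the real failure.
        config.ChannelOpenTimeout = TimeSpan.FromMinutes(2);

        var factory = new DataCacheFactory(config);
        try
        {
            DataCache cache = factory.GetDefaultCache();
            cache.Put("probe-key", "probe-value"); // hypothetical key and value
        }
        catch (DataCacheException ex)
        {
            // The outer message is the generic ErrorCode/SubStatus text.
            Console.WriteLine(ex.Message);

            // Walk the inner exceptions to find the actual cause.
            for (Exception inner = ex.InnerException; inner != null; inner = inner.InnerException)
            {
                Console.WriteLine(inner);
            }
        }
    }
}
```

Remember to restore the normal timeout once the root cause is found, since the larger value is meant for debugging only.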

The same approach also works for the Connection Terminated error: Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0028>:SubStatus<ES0001>:The connection was terminated possibly due to server or network problems. Result of the request is unknown.

If the inner exception on a Get() or a Put() (this does not apply to GetDefaultCache()) says "System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host", it actually means that your connections are idling out. Basically, in Azure, a connection that remains idle for more than a minute is dropped. Sometimes the channel layer does not receive the notification for the dropped connection, so the next cache request fails because the connection is stale. We can avoid this by setting DataCacheFactoryConfiguration.TransportProperties.ReceiveTimeout to a value less than 1 minute (say, 45 seconds). This proactively refreshes the connection before Azure closes it. This recommendation can be used in production environments when cache usage is intermittent or sporadic (with pauses of more than a minute or so).
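The ReceiveTimeout setting described above could look roughly like this in C#. This is a minimal sketch: it assumes the rest of the client configuration (endpoint, security) is supplied elsewhere, and the class name is a hypothetical placeholder. The null check is purely defensive, in case TransportProperties has not been populated yet.

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

class IdleConnectionSketch
{
    static void Main()
    {
        // Assumption: server endpoints and security settings are supplied elsewhere.
        var config = new DataCacheFactoryConfiguration();

        // Reuse the existing transport settings if present, otherwise start from defaults.
        DataCacheTransportProperties transport =
            config.TransportProperties ?? new DataCacheTransportProperties();

        // Stay under the roughly one-minute idle limit so the connection
        // is refreshed before Azure drops it.
        transport.ReceiveTimeout = TimeSpan.FromSeconds(45);
        config.TransportProperties = transport;

        // Create the factory only after the configuration is complete.
        var factory = new DataCacheFactory(config);
        DataCache cache = factory.GetDefaultCache();
    }
}
```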



 

PS:

1. For this to work, ensure that the version of both Microsoft.ApplicationServer.Caching.Core.dll and Microsoft.ApplicationServer.Caching.Client.dll is at least 1.0.4009.0.

2. This is specific to Azure AppFabric Cache only.