Troubleshoot Azure Cache for Redis client-side issues

This section discusses troubleshooting issues that occur because of a condition on the Redis client that your application uses.

Memory pressure on Redis client

Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of responses from the cache. When memory pressure hits, the system may page data to disk. This page faulting causes the system to slow down significantly.

To detect memory pressure on the client:

  • Monitor memory usage on the machine to make sure that it doesn't exceed available memory.
  • Monitor the client's Page Faults/Sec performance counter. During normal operation, most systems have some page faults. Spikes in page faults corresponding with request timeouts can indicate memory pressure; a monitoring sketch follows this list.
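
As an illustration, a minimal sketch of such monitoring on a Windows client might use the System.Diagnostics performance counter API (the counter and category names assume a Windows machine; on .NET Core this API needs the System.Diagnostics.PerformanceCounter package):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class PageFaultMonitor
    {
        static void Main()
        {
            // "Page Faults/sec" and "Available MBytes" are standard Windows
            // counters in the Memory category.
            using var pageFaults = new PerformanceCounter("Memory", "Page Faults/sec");
            using var availableMemory = new PerformanceCounter("Memory", "Available MBytes");

            while (true)
            {
                // NextValue returns 0 on the very first call, so sample on an interval.
                Console.WriteLine(
                    $"{DateTime.UtcNow:O} PageFaults/sec={pageFaults.NextValue():F0} " +
                    $"AvailableMBytes={availableMemory.NextValue():F0}");
                Thread.Sleep(TimeSpan.FromSeconds(5));
            }
        }
    }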

High memory pressure on the client can be mitigated in several ways:

  • Dig into your memory usage patterns to reduce memory consumption on the client.
  • Upgrade your client VM to a larger size with more memory.

Traffic burst

Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.

Monitor how your ThreadPool statistics change over time using an example ThreadPoolLogger. You can use TimeoutException messages from StackExchange.Redis like the following to investigate further:

    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
    IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)

The preceding exception shows several details of interest:

  • Notice that in the IOCP section and the WORKER section you have a Busy value that is greater than the Min value. This difference means your ThreadPool settings need adjusting.
  • You can also see in: 64221. This value indicates that 64,221 bytes have been received at the client's kernel socket layer but haven't been read by the application. This difference typically means that your application (for example, StackExchange.Redis) isn't reading data from the network as quickly as the server is sending it to you.
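
The example ThreadPoolLogger mentioned earlier isn't reproduced in this article; a minimal sketch, using only the standard System.Threading APIs, might look like the following. The Busy values are derived as Max minus Free, matching the shape of the timeout message above:

    using System;
    using System.Threading;

    static class ThreadPoolLogger
    {
        // Logs IOCP and WORKER statistics in the same shape as the
        // StackExchange.Redis timeout message shown earlier.
        public static void LogCurrentStats()
        {
            ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
            ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
            ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIocp);

            Console.WriteLine(
                $"IOCP: (Busy={maxIocp - freeIocp},Free={freeIocp},Min={minIocp},Max={maxIocp}) " +
                $"WORKER: (Busy={maxWorker - freeWorker},Free={freeWorker},Min={minWorker},Max={maxWorker})");
        }
    }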

You can configure your ThreadPool settings to make sure that your thread pool scales up quickly under burst scenarios.
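
As an illustration, one way to raise the minimum thread counts once at application startup (the value 200 here is a placeholder, not a recommendation; measure your own workload):

    using System;
    using System.Threading;

    // Run once at application startup, before the burst arrives.
    // Raising the minimums lets the pool grow immediately under load
    // instead of throttling new thread injection.
    ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
    ThreadPool.SetMinThreads(Math.Max(minWorker, 200), Math.Max(minIocp, 200));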

High client CPU usage

High client CPU usage indicates the system can't keep up with the work it's been asked to do. Even though the cache sent the response quickly, the client may fail to process the response in a timely fashion.

Monitor the client's system-wide CPU usage using metrics available in the Azure portal or through performance counters on the machine. Be careful not to monitor process CPU, because a single process can have low CPU usage while system-wide CPU is high. Watch for spikes in CPU usage that correspond with timeouts. High CPU can also cause high in: XXX values in TimeoutException error messages, as described in the Traffic burst section.
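
A minimal sketch of system-wide CPU monitoring on a Windows client, again using performance counters (the _Total instance is the system-wide figure the preceding paragraph recommends; the 80% threshold is illustrative):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class CpuMonitor
    {
        static void Main()
        {
            // The "_Total" instance reports system-wide CPU, not per-process CPU.
            using var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");

            cpu.NextValue(); // The first sample always reads 0; prime the counter.
            while (true)
            {
                Thread.Sleep(TimeSpan.FromSeconds(5));
                float usage = cpu.NextValue();
                if (usage > 80)
                {
                    Console.WriteLine($"{DateTime.UtcNow:O} WARNING: system CPU at {usage:F0}%");
                }
            }
        }
    }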

Note

StackExchange.Redis 1.1.603 and later includes the local-cpu metric in TimeoutException error messages. Ensure you're using the latest version of the StackExchange.Redis NuGet package. Bugs are regularly fixed in the code to make it more robust to timeouts, so having the latest version is important.

To mitigate a client's high CPU usage:

  • Investigate what is causing CPU spikes.
  • Upgrade your client to a larger VM size with more CPU capacity.

Client-side bandwidth limitation

Depending on their architecture, client machines may have limitations on how much network bandwidth is available. If the client exceeds the available bandwidth, data isn't processed on the client side as quickly as the server is sending it. This situation can lead to timeouts.

Monitor how your bandwidth usage changes over time using an example BandwidthLogger. This code may not run successfully in some environments with restricted permissions (like Azure websites).
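
The example BandwidthLogger isn't reproduced in this article; a minimal sketch built on System.Net.NetworkInformation (one of the APIs that restricted environments typically block) might look like this:

    using System;
    using System.Linq;
    using System.Net.NetworkInformation;
    using System.Threading;

    class BandwidthLogger
    {
        static void Main()
        {
            long previous = TotalBytes();
            while (true)
            {
                Thread.Sleep(TimeSpan.FromSeconds(5));
                long current = TotalBytes();
                Console.WriteLine(
                    $"{DateTime.UtcNow:O} ~{(current - previous) / 5} bytes/sec across all interfaces");
                previous = current;
            }
        }

        // Total bytes sent and received across all operational network interfaces.
        static long TotalBytes() =>
            NetworkInterface.GetAllNetworkInterfaces()
                .Where(nic => nic.OperationalStatus == OperationalStatus.Up)
                .Select(nic => nic.GetIPv4Statistics())
                .Sum(s => s.BytesReceived + s.BytesSent);
    }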

To mitigate, reduce network bandwidth consumption or increase the client VM size to one with more network capacity.

High client connections

When client connections reach the maximum for the cache, client requests for connections beyond the maximum fail. High connection counts can also cause high server CPU usage on the cache because of the processing of repeated reconnection attempts.

High client connections may indicate a connection leak in client code. Connections may not be getting re-used or closed properly. Review client code for connection use.
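
One common way to avoid such leaks with StackExchange.Redis is to share a single ConnectionMultiplexer for the lifetime of the application; a minimal sketch follows (the cache host name and connection options are placeholders):

    using System;
    using StackExchange.Redis;

    public static class RedisConnection
    {
        // Share one ConnectionMultiplexer for the lifetime of the application.
        // Creating a new multiplexer per request leaks connections.
        private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
            new Lazy<ConnectionMultiplexer>(() =>
                ConnectionMultiplexer.Connect(
                    "contoso.redis.cache.windows.net:6380,password=<access-key>,ssl=True,abortConnect=False"));

        public static ConnectionMultiplexer Connection => LazyConnection.Value;
    }

Callers then obtain databases from the shared connection, for example RedisConnection.Connection.GetDatabase(), rather than creating a new connection per request.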

If the high connections are all legitimate and required client connections, upgrading your cache to a size with a higher connection limit may be required.

Additional information