Troubleshoot Azure Cache for Redis server-side issues

This section discusses how to troubleshoot issues that occur because of a condition on an Azure Cache for Redis instance or on the virtual machines hosting it.

Note

Several of the troubleshooting steps in this guide include instructions to run Redis commands and monitor various performance metrics. For more information and instructions, see the articles in the Additional information section.

Memory pressure on Redis server

Memory pressure on the server side can lead to a variety of performance problems that delay the processing of requests. When memory pressure hits, the system may page data to disk. This page faulting causes the system to slow down significantly. There are several possible causes of this memory pressure:

  • The cache is filled with data near its maximum capacity.
  • Redis is seeing high memory fragmentation. This fragmentation is most often caused by storing large objects since Redis is optimized for small objects.

Redis exposes two stats through the INFO command that can help you identify this issue: used_memory and used_memory_rss. When used_memory_rss is significantly higher than used_memory, memory is likely fragmented. You can view these metrics in the Azure portal.
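
As a quick illustration, the fragmentation ratio can be computed from the "# Memory" section of INFO output. This is a hypothetical sketch: the field names (used_memory, used_memory_rss) are the real INFO fields, but the sample payload below is invented.

```python
# Parse the "# Memory" section of Redis INFO output and compute the
# fragmentation ratio (used_memory_rss / used_memory). A ratio well
# above 1.0 suggests fragmentation.

def fragmentation_ratio(info_text: str) -> float:
    """Return used_memory_rss / used_memory parsed from INFO output."""
    fields = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])

# Invented sample INFO payload for illustration.
sample = """# Memory
used_memory:1048576
used_memory_rss:1572864
"""
print(round(fragmentation_ratio(sample), 2))  # 1.5
```

In practice you would feed this the text returned by running INFO against your cache, or simply read both metrics from the portal.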

There are several possible changes you can make to help keep memory usage healthy:

  • Configure an eviction policy (maxmemory-policy) and set expiration times on your keys. Eviction alone may not be sufficient if memory fragmentation is the cause.
  • Configure a maxmemory-reserved value that is large enough to compensate for memory fragmentation.
  • Break up your large cached objects into smaller related objects.
  • Create alerts on metrics like used memory to be notified early about potential impacts.
  • Scale to a larger cache size with more memory capacity.
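
The "break up your large cached objects" suggestion above can be sketched as follows. This is a hypothetical example: the key-naming scheme (suffixing `:0`, `:1`, ...) and the chunk size are illustrative assumptions, not a Redis convention.

```python
# Split one large serialized value into fixed-size chunks stored under
# related keys (large:0, large:1, ...), since Redis is optimized for
# small objects. Chunk size and key scheme are illustrative assumptions.

def chunk_value(key: str, value: bytes, chunk_size: int = 64 * 1024):
    """Yield (chunk_key, chunk_bytes) pairs small enough to store efficiently."""
    for i in range(0, len(value), chunk_size):
        yield f"{key}:{i // chunk_size}", value[i:i + chunk_size]

# A client would then SET each pair (ideally with a TTL) instead of one big SET.
pairs = list(chunk_value("large", b"x" * 150_000, chunk_size=64 * 1024))
print([k for k, _ in pairs])  # ['large:0', 'large:1', 'large:2']
```

When reading the object back, the client fetches the chunks (for example with MGET) and concatenates them in key order.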

High CPU usage or server load

A high server load or CPU usage means the server can't process requests in a timely fashion. The server may be slow to respond and unable to keep up with request rates.

Monitor metrics such as CPU or server load. Watch for spikes in CPU usage that correspond with timeouts.

There are several changes you can make to mitigate high server load:

  • Investigate what is causing CPU spikes, such as the long-running commands described below or page faulting because of high memory pressure.
  • Create alerts on metrics like CPU or server load to be notified early about potential impacts.
  • Scale to a larger cache size with more CPU capacity.

Long-running commands

Some Redis commands are more expensive to execute than others. The Redis commands documentation shows the time complexity of each command. Because Redis command processing is single-threaded, a command that takes time to run will block all others that come after it. You should review the commands that you're issuing to your Redis server to understand their performance impacts. For instance, the KEYS command is often used without knowing that it's an O(N) operation. You can avoid KEYS by using SCAN to reduce CPU spikes.
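
The SCAN pattern mentioned above can be sketched as follows. This is a hypothetical illustration: `FakeClient` stands in for a real Redis client (for example, redis-py's `scan` method), and its cursor scheme is simplified. The essential idea is real, though: SCAN returns a cursor plus a small batch of keys, and a returned cursor of 0 means the iteration is complete, so no single call blocks the server the way KEYS does.

```python
# Iterate keys incrementally with a cursor (the SCAN pattern) instead of
# one blocking O(N) KEYS call. FakeClient simulates cursor-based scanning;
# a real client would issue SCAN commands to the server.

class FakeClient:
    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, count=2):
        """Return (next_cursor, batch); a next_cursor of 0 ends the scan."""
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        return (0 if next_cursor >= len(self._keys) else next_cursor), batch

def scan_all(client):
    cursor, found = 0, []
    while True:
        cursor, batch = client.scan(cursor)
        found.extend(batch)
        if cursor == 0:  # scan complete
            return found

client = FakeClient(["a", "b", "c", "d", "e"])
print(scan_all(client))  # ['a', 'b', 'c', 'd', 'e']
```

Because each SCAN call returns only a small batch, other commands can be processed between calls, smoothing out CPU spikes on the single-threaded command processor.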

Using the SLOWLOG command, you can measure expensive commands being executed against the server.
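
To act on SLOWLOG output, you typically rank entries by execution time. The sketch below assumes the real shape of a SLOWLOG GET reply entry (id, unix timestamp, duration in microseconds, command arguments); the sample data itself is invented.

```python
# Rank SLOWLOG GET entries by execution time to find the most expensive
# commands. Entry shape mirrors the SLOWLOG reply:
#   (id, unix_timestamp, duration_microseconds, command_args)

def slowest(entries, top=2):
    """Return the top-N commands by duration in microseconds."""
    ranked = sorted(entries, key=lambda e: e[2], reverse=True)
    return [(" ".join(e[3]), e[2]) for e in ranked[:top]]

# Invented sample entries for illustration.
entries = [
    (12, 1700000000, 150_000, ["KEYS", "*"]),
    (13, 1700000005, 800, ["GET", "user:1"]),
    (14, 1700000010, 95_000, ["SMEMBERS", "bigset"]),
]
print(slowest(entries))  # [('KEYS *', 150000), ('SMEMBERS bigset', 95000)]
```

Here the O(N) commands (KEYS on the whole keyspace, SMEMBERS on a large set) dominate, which is the usual pattern when server load spikes.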

Server-side bandwidth limitation

Different cache sizes have different network bandwidth capacities. If the server exceeds the available bandwidth, data isn't sent to clients as quickly. Client requests can time out because the server can't push data to the client fast enough.

The "Cache Read" and "Cache Write" metrics can be used to see how much server-side bandwidth is being used. You can view these metrics in the portal.

To mitigate situations where network bandwidth usage is close to maximum capacity:

  • Change client call behavior to reduce network demand.
  • Create alerts on metrics like cache read or cache write to be notified early about potential impacts.
  • Scale to a larger cache size with more network bandwidth capacity.

Additional information