Troubleshoot Azure Cache for Redis server-side issues

This section discusses how to troubleshoot issues that occur because of a condition on an Azure Cache for Redis instance or on the virtual machines hosting it.

Note

Several of the troubleshooting steps in this guide include instructions to run Redis commands and monitor various performance metrics. For more information and instructions, see the articles in the Additional information section.

Memory pressure on Redis server

Memory pressure on the server side can lead to a variety of performance problems that delay the processing of requests. When memory pressure hits, the system may page data to disk. This page faulting causes the system to slow down significantly. There are several possible causes of this memory pressure:

  • The cache is filled with data near its maximum capacity.
  • Redis is seeing high memory fragmentation. This fragmentation is most often caused by storing large objects since Redis is optimized for small objects.

Redis exposes two stats through the INFO command that can help you identify this issue: used_memory and used_memory_rss. When used_memory_rss is significantly higher than used_memory, memory is likely fragmented. You can view these metrics in the Azure portal.
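
As a quick illustration, the fragmentation ratio can be computed from the "# Memory" section of INFO output. This is a hypothetical sketch: the field names (used_memory, used_memory_rss) are the real INFO fields, but the sample payload below is invented.

```python
# Parse the "# Memory" section of Redis INFO output and compute the
# fragmentation ratio (used_memory_rss / used_memory). A ratio well
# above 1.0 suggests fragmentation.

def fragmentation_ratio(info_text: str) -> float:
    """Return used_memory_rss / used_memory parsed from INFO output."""
    fields = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])

# Invented sample INFO payload for illustration.
sample = """# Memory
used_memory:1048576
used_memory_rss:1572864
"""
print(round(fragmentation_ratio(sample), 2))  # 1.5
```

In practice you would feed this the text returned by running INFO against your cache, or simply read both metrics from the portal.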

There are several possible changes you can make to help keep memory usage healthy:

  • Configure an eviction policy (maxmemory-policy) and set expiration times on your keys. Eviction alone may not be sufficient if memory fragmentation is the cause.
  • Configure a maxmemory-reserved value that is large enough to compensate for memory fragmentation.
  • Break up your large cached objects into smaller related objects.
  • Create alerts on metrics like used memory to be notified early about potential impacts.
  • Scale to a larger cache size with more memory capacity.
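
The "break up your large cached objects" suggestion above can be sketched as follows. This is a hypothetical example: the key-naming scheme (suffixing `:0`, `:1`, ...) and the chunk size are illustrative assumptions, not a Redis convention.

```python
# Split one large serialized value into fixed-size chunks stored under
# related keys (large:0, large:1, ...), since Redis is optimized for
# small objects. Chunk size and key scheme are illustrative assumptions.

def chunk_value(key: str, value: bytes, chunk_size: int = 64 * 1024):
    """Yield (chunk_key, chunk_bytes) pairs small enough to store efficiently."""
    for i in range(0, len(value), chunk_size):
        yield f"{key}:{i // chunk_size}", value[i:i + chunk_size]

# A client would then SET each pair (ideally with a TTL) instead of one big SET.
pairs = list(chunk_value("large", b"x" * 150_000, chunk_size=64 * 1024))
print([k for k, _ in pairs])  # ['large:0', 'large:1', 'large:2']
```

When reading the object back, the client fetches the chunks (for example with MGET) and concatenates them in key order.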

High CPU usage or server load

A high server load or CPU usage means the server can't process requests in a timely fashion. The server may be slow to respond and unable to keep up with request rates.

Monitor metrics such as CPU or server load. Watch for spikes in CPU usage that correspond with timeouts.

There are several changes you can make to mitigate high server load:

  • Investigate what is causing CPU spikes, such as the long-running commands described below or page faulting because of high memory pressure.
  • Create alerts on metrics like CPU or server load to be notified early about potential impacts.
  • Scale to a larger cache size with more CPU capacity.

Long-running commands

Some Redis commands are more expensive to execute than others. The Redis commands documentation shows the time complexity of each command. Because Redis command processing is single-threaded, a command that takes time to run will block all others that come after it. You should review the commands that you're issuing to your Redis server to understand their performance impacts. For instance, the KEYS command is often used without knowing that it's an O(N) operation. You can avoid KEYS by using SCAN to reduce CPU spikes.
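
The SCAN pattern mentioned above can be sketched as follows. This is a hypothetical illustration: `FakeClient` stands in for a real Redis client (for example, redis-py's `scan` method), and its cursor scheme is simplified. The essential idea is real, though: SCAN returns a cursor plus a small batch of keys, and a returned cursor of 0 means the iteration is complete, so no single call blocks the server the way KEYS does.

```python
# Iterate keys incrementally with a cursor (the SCAN pattern) instead of
# one blocking O(N) KEYS call. FakeClient simulates cursor-based scanning;
# a real client would issue SCAN commands to the server.

class FakeClient:
    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, count=2):
        """Return (next_cursor, batch); a next_cursor of 0 ends the scan."""
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        return (0 if next_cursor >= len(self._keys) else next_cursor), batch

def scan_all(client):
    cursor, found = 0, []
    while True:
        cursor, batch = client.scan(cursor)
        found.extend(batch)
        if cursor == 0:  # scan complete
            return found

client = FakeClient(["a", "b", "c", "d", "e"])
print(scan_all(client))  # ['a', 'b', 'c', 'd', 'e']
```

Because each SCAN call returns only a small batch, other commands can be processed between calls, smoothing out CPU spikes on the single-threaded command processor.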

Using the SLOWLOG command, you can measure expensive commands being executed against the server.
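
To act on SLOWLOG output, you typically rank entries by execution time. The sketch below assumes the real shape of a SLOWLOG GET reply entry (id, unix timestamp, duration in microseconds, command arguments); the sample data itself is invented.

```python
# Rank SLOWLOG GET entries by execution time to find the most expensive
# commands. Entry shape mirrors the SLOWLOG reply:
#   (id, unix_timestamp, duration_microseconds, command_args)

def slowest(entries, top=2):
    """Return the top-N commands by duration in microseconds."""
    ranked = sorted(entries, key=lambda e: e[2], reverse=True)
    return [(" ".join(e[3]), e[2]) for e in ranked[:top]]

# Invented sample entries for illustration.
entries = [
    (12, 1700000000, 150_000, ["KEYS", "*"]),
    (13, 1700000005, 800, ["GET", "user:1"]),
    (14, 1700000010, 95_000, ["SMEMBERS", "bigset"]),
]
print(slowest(entries))  # [('KEYS *', 150000), ('SMEMBERS bigset', 95000)]
```

Here the O(N) commands (KEYS on the whole keyspace, SMEMBERS on a large set) dominate, which is the usual pattern when server load spikes.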

Server-side bandwidth limitation

Different cache sizes have different network bandwidth capacities. If the server exceeds the available bandwidth, data isn't sent to clients as quickly. Client requests can time out because the server can't push data to the client fast enough.

The "Cache Read" and "Cache Write" metrics can be used to see how much server-side bandwidth is being used. You can view these metrics in the portal.

To mitigate situations where network bandwidth usage is close to maximum capacity:

  • Change client call behavior to reduce network demand.
  • Create alerts on metrics like cache read or cache write to be notified early about potential impacts.
  • Scale to a larger cache size with more network bandwidth capacity.

Additional information