Azure Key Vault throttling guidance

Throttling is a process you initiate that limits the number of concurrent calls to the Azure service to prevent overuse of resources. Azure Key Vault (AKV) is designed to handle a high volume of requests. If an overwhelming number of requests occurs, throttling your client's requests helps maintain optimal performance and reliability of the AKV service.

Throttling limits vary based on the scenario. For example, if you are performing a large volume of writes, you are more likely to be throttled than if you are only performing reads.

How does Key Vault handle its limits?

Service limits in Key Vault are there to prevent misuse of resources and ensure quality of service for all of Key Vault’s clients. When a service threshold is exceeded, Key Vault limits any further requests from that client for a period of time. When this happens, Key Vault returns HTTP status code 429 (Too Many Requests), and the requests fail. Failed requests that return a 429 also count towards the throttle limits tracked by Key Vault.

If you have a valid business case for higher throttle limits, please contact us.

How to throttle your app in response to service limits

The following are best practices you should implement when your service is throttled:

  • Reduce the number of operations per request.
  • Reduce the frequency of requests.
  • Avoid immediate retries.
    • All requests accrue against your usage limits.
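One way to reduce how many requests are in flight at once is to cap concurrency on the client. The following is a minimal sketch, not part of any Key Vault SDK: the `ConcurrencyLimiter` class and its `maxConcurrentCalls` parameter are hypothetical names chosen for illustration, built on the standard `SemaphoreSlim` type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: lets at most maxConcurrentCalls operations run at once.
public sealed class ConcurrencyLimiter : IDisposable
{
    private readonly SemaphoreSlim _slots;

    public ConcurrencyLimiter(int maxConcurrentCalls)
    {
        _slots = new SemaphoreSlim(maxConcurrentCalls, maxConcurrentCalls);
    }

    public async Task<T> RunAsync<T>(Func<Task<T>> call)
    {
        await _slots.WaitAsync();   // wait until a slot is free
        try
        {
            return await call();
        }
        finally
        {
            _slots.Release();       // free the slot even if the call threw
        }
    }

    public void Dispose() => _slots.Dispose();
}
```

Wrapping each service call in `limiter.RunAsync(...)` queues excess callers on the client instead of sending them all at once. The right cap depends on your workload and is an assumption you would need to tune.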

When you implement your app's error handling, use the HTTP error code 429 to detect the need for client-side throttling. If the request fails again with an HTTP 429 error code, you are still encountering an Azure service limit. Continue to use the recommended client-side throttling method, retrying the request until it succeeds.
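As a rough illustration of detecting the 429 status in error handling, the sketch below checks the response code before deciding whether to back off. The `ThrottleDetection` class and the `TryGetAsync` helper are hypothetical names, not part of the Key Vault client library; only standard `HttpClient` calls are used.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ThrottleDetection
{
    // True when the service answered with HTTP 429 (Too Many Requests).
    public static bool IsThrottled(HttpResponseMessage response)
    {
        return (int)response.StatusCode == 429;
    }

    // Hypothetical helper: sends a GET and reports throttling instead of throwing.
    // Returns the response body on success, or null when the request was throttled.
    public static async Task<string> TryGetAsync(HttpClient client, Uri uri)
    {
        using (HttpResponseMessage response = await client.GetAsync(uri))
        {
            if (IsThrottled(response))
            {
                return null; // caller should wait before retrying
            }
            response.EnsureSuccessStatusCode(); // other failures still throw
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Separating the throttled case from other failures lets the caller apply a delay only when the service is actually signalling a limit.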

Code that implements exponential backoff is shown below.

    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Threading.Tasks;

    public sealed class RetryWithExponentialBackoff
    {
        private readonly int maxRetries, delayMilliseconds, maxDelayMilliseconds;

        public RetryWithExponentialBackoff(int maxRetries = 50,
            int delayMilliseconds = 200,
            int maxDelayMilliseconds = 2000)
        {
            this.maxRetries = maxRetries;
            this.delayMilliseconds = delayMilliseconds;
            this.maxDelayMilliseconds = maxDelayMilliseconds;
        }

        // Runs func, retrying with a growing delay on transient failures.
        public async Task RunAsync(Func<Task> func)
        {
            ExponentialBackoff backoff = new ExponentialBackoff(this.maxRetries,
                this.delayMilliseconds, this.maxDelayMilliseconds);
            while (true)
            {
                try
                {
                    await func();
                    return;
                }
                catch (Exception ex) when (ex is TimeoutException ||
                    ex is HttpRequestException)
                {
                    Debug.WriteLine("Exception raised is: " +
                        ex.GetType().ToString() +
                        " -- Message: " + ex.Message +
                        " -- Inner Message: " +
                        ex.InnerException?.Message);
                    // Throws TimeoutException once maxRetries is reached.
                    await backoff.Delay();
                }
            }
        }
    }

    public struct ExponentialBackoff
    {
        private readonly int m_maxRetries, m_delayMilliseconds, m_maxDelayMilliseconds;
        private int m_retries, m_pow;

        public ExponentialBackoff(int maxRetries, int delayMilliseconds,
            int maxDelayMilliseconds)
        {
            m_maxRetries = maxRetries;
            m_delayMilliseconds = delayMilliseconds;
            m_maxDelayMilliseconds = maxDelayMilliseconds;
            m_retries = 0;
            m_pow = 1;
        }

        public Task Delay()
        {
            if (m_retries == m_maxRetries)
            {
                throw new TimeoutException("Max retry attempts exceeded.");
            }
            ++m_retries;
            if (m_retries < 31)
            {
                m_pow = m_pow << 1; // m_pow = Pow(2, m_retries)
            }
            // Multiply as long so the product cannot overflow int.
            long delay = Math.Min((long)m_delayMilliseconds * (m_pow - 1) / 2,
                m_maxDelayMilliseconds);
            return Task.Delay((int)delay);
        }
    }

Using this code in a client C# application is straightforward. The following example shows how, using the HttpClient class.

    public async Task<Cart> GetCartItems(int page)
    {
        // _apiClient, catalogUrl, and Cart are assumed to be defined
        // by the enclosing application (not shown here).
        _apiClient = new HttpClient();
        string dataString = null;
        // Using HttpClient with Retry and Exponential Backoff
        var retry = new RetryWithExponentialBackoff();
        await retry.RunAsync(async () =>
        {
            // work with HttpClient call
            dataString = await _apiClient.GetStringAsync(catalogUrl);
        });
        return JsonConvert.DeserializeObject<Cart>(dataString);
    }

Remember that this code is suitable only as a proof of concept.

On HTTP error code 429, begin throttling your client using an exponential backoff approach:

  1. Wait 1 second, retry request
  2. If still throttled wait 2 seconds, retry request
  3. If still throttled wait 4 seconds, retry request
  4. If still throttled wait 8 seconds, retry request
  5. If still throttled wait 16 seconds, retry request

At this point, you should not be getting HTTP 429 response codes.
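The 1, 2, 4, 8, 16 second schedule above can be sketched as a small helper. This is an illustration under assumptions rather than a definitive implementation: `BackoffSchedule`, `DelayFor`, and `SendWithBackoffAsync` are hypothetical names, `send` stands in for whatever call returns the HTTP status code, and the wait function is passed in so the loop can be exercised without real delays.

```csharp
using System;
using System.Threading.Tasks;

public static class BackoffSchedule
{
    public const int MaxRetries = 5;

    // Delay before retry number `attempt` (1-based): 1 s, 2 s, 4 s, 8 s, 16 s.
    public static TimeSpan DelayFor(int attempt)
    {
        if (attempt < 1 || attempt > MaxRetries)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt));
        }
        return TimeSpan.FromSeconds(1 << (attempt - 1));
    }

    // Calls `send` until it returns a status other than 429 or the five
    // waits are used up, and returns the last status code seen.
    public static async Task<int> SendWithBackoffAsync(
        Func<Task<int>> send, Func<TimeSpan, Task> wait)
    {
        int status = await send();
        for (int attempt = 1; status == 429 && attempt <= MaxRetries; attempt++)
        {
            await wait(DelayFor(attempt));
            status = await send();
        }
        return status;
    }
}
```

In an application you could pass `Task.Delay` as the `wait` argument. If the final status is still 429 after the fifth wait, the caller decides whether to give up or keep waiting.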

See also

For a deeper orientation on throttling in the Microsoft Cloud, see Throttling Pattern.