Azure OpenAI Service quotas and limits
This article contains a quick reference and a detailed description of the quotas and limits for Azure OpenAI in Azure AI services.
Quotas and limits reference
The following sections provide you with a quick guide to the default quotas and limits that apply to Azure OpenAI:
Limit Name | Limit Value |
---|---|
OpenAI resources per region per Azure subscription | 30 |
Default DALL-E 2 quota limits | 2 concurrent requests |
Default DALL-E 3 quota limits | 2 capacity units (6 requests per minute) |
Maximum prompt tokens per request | Varies per model. For more information, see Azure OpenAI Service models |
Max fine-tuned model deployments | 5 |
Total number of training jobs per resource | 100 |
Max simultaneous running training jobs per resource | 1 |
Max training jobs queued | 20 |
Max files per resource (fine-tuning) | 50 |
Total size of all files per resource (fine-tuning) | 1 GB |
Max training job time (job will fail if exceeded) | 720 hours |
Max training job size (tokens in training file) x (# of epochs) | 2 billion |
Max size of all files per upload (Azure OpenAI on your data) | 16 MB |
Max number of inputs in array with `/embeddings` | 2,048 |
Max number of `/chat/completions` messages | 2,048 |
Max number of `/chat/completions` functions | 128 |
Max number of `/chat/completions` tools | 128 |
Maximum number of Provisioned throughput units per deployment | 100,000 |
Max files per Assistant/thread | 20 |
Max file size for Assistants & fine-tuning | 512 MB |
Assistants token limit | 2,000,000 tokens |
Regional quota limits
The default quota for models varies by model and region. Default quota limits are subject to change.
Quota for standard deployments is described in terms of Tokens-Per-Minute (TPM).
Region | GPT-4 | GPT-4-32K | GPT-4-Turbo | GPT-4-Turbo-V | GPT-35-Turbo | GPT-35-Turbo-Instruct | Text-Embedding-Ada-002 | text-embedding-3-small | text-embedding-3-large | Babbage-002 | Babbage-002 - finetune | Davinci-002 | Davinci-002 - finetune | GPT-35-Turbo - finetune | GPT-35-Turbo-1106 - finetune | GPT-35-Turbo-0125 - finetune |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
australiaeast | 40 K | 80 K | 80 K | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
brazilsouth | - | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
canadaeast | 40 K | 80 K | 80 K | - | 300 K | - | 350 K | 350 K | 350 K | - | - | - | - | - | - | - |
eastus | - | - | 80 K | - | 240 K | 240 K | 240 K | 350 K | 350 K | - | - | - | - | - | - | - |
eastus2 | - | - | 80 K | - | 300 K | - | 350 K | 350 K | 350 K | - | - | - | - | 250 K | 250 K | 250 K |
francecentral | 20 K | 60 K | 80 K | - | 240 K | - | 240 K | - | - | - | - | - | - | - | - | - |
japaneast | - | - | - | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
northcentralus | - | - | 80 K | - | 300 K | - | 350 K | - | - | 240 K | 250 K | 240 K | 250 K | 250 K | 250 K | 250 K |
norwayeast | - | - | 150 K | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
southafricanorth | - | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
southcentralus | - | - | 80 K | - | 240 K | - | 240 K | - | - | - | - | - | - | - | - | - |
southindia | - | - | 150 K | - | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
swedencentral | 40 K | 80 K | 150 K | 30 K | 300 K | 240 K | 350 K | - | - | 240 K | 250 K | 240 K | 250 K | 250 K | 250 K | 250 K |
switzerlandnorth | 40 K | 80 K | - | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
switzerlandwest | - | - | - | - | - | - | - | - | - | - | 250 K | - | 250 K | 250 K | 250 K | 250 K |
uksouth | - | - | 80 K | - | 240 K | - | 350 K | - | - | - | - | - | - | - | - | - |
westeurope | - | - | - | - | 240 K | - | 240 K | - | - | - | - | - | - | - | - | - |
westus | - | - | 80 K | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
westus3 | - | - | 80 K | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
1 K = 1000 Tokens-Per-Minute (TPM). The relationship between TPM and Requests Per Minute (RPM) is currently defined as 6 RPM per 1000 TPM.
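The TPM-to-RPM relationship above can be applied directly to the table values; for example, a 40 K TPM allocation implies 240 RPM. A small illustrative helper (the function name is ours, not part of any SDK):

```python
RPM_PER_1000_TPM = 6  # current definition: 6 RPM per 1,000 TPM

def rpm_for_tpm(tpm: int) -> int:
    """Return the requests-per-minute allowance implied by a TPM quota."""
    return tpm // 1000 * RPM_PER_1000_TPM

print(rpm_for_tpm(40_000))   # 240
print(rpm_for_tpm(300_000))  # 1800
```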
General best practices to remain within rate limits
To minimize issues related to rate limits, it's a good idea to use the following techniques:
- Implement retry logic in your application.
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
- Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
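The retry-logic recommendation above is commonly implemented as exponential backoff with jitter on rate-limit (HTTP 429) errors. A framework-agnostic sketch, where `RateLimitError` and the `call` parameter are placeholders for whatever your client library raises and invokes (many SDKs also expose built-in retry settings):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the HTTP 429 error your client library raises."""

def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Retry `call` on rate-limit errors, doubling the delay each
    attempt and adding jitter so retries don't arrive in lockstep."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter
```

If the service returns a `Retry-After` header, honoring that value is generally preferable to a computed delay.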
How to request increases to the default quotas and limits
Quota increase requests can be submitted from the Quotas page of Azure OpenAI Studio. Because of high demand, quota increase requests are accepted and filled in the order they're received. Priority is given to customers whose traffic consumes their existing quota allocation, and your request may be denied if this condition isn't met.
For other rate limits, please submit a service request.
Next steps
Explore how to manage quota for your Azure OpenAI deployments. Learn more about the underlying models that power Azure OpenAI.