Service limits in Azure AI Search

Article
04/04/2024

Maximum limits on storage, workloads, and quantities of indexes and other objects depend on whether you create Azure AI Search at Free, Basic, Standard, or Storage Optimized pricing tiers.

Free is a multitenant shared service that comes with your Azure subscription.
Basic provides dedicated computing resources for production workloads at a smaller scale, but shares some networking infrastructure with other tenants.
Standard runs on dedicated machines with more storage and processing capacity at every level. Standard comes in four levels: S1, S2, S3, and S3 HD. S3 High Density (S3 HD) is engineered for multi-tenancy and large quantities of small indexes (3,000 indexes per service). S3 HD doesn't provide the indexer feature, so data ingestion must use APIs that push data from source to index.
Storage Optimized runs on dedicated machines with more total storage, storage bandwidth, and memory than Standard. This tier targets large, slow-changing indexes. Storage Optimized comes in two levels: L1 and L2.

Subscription limits

You can create multiple billable search services (Basic and higher), up to the maximum number of services allowed at each tier. For example, you could create up to 16 services at the Basic tier and another 16 services at the S1 tier within the same subscription. For more information about tiers, see Choose a tier (or SKU) for Azure AI Search.

Maximum service limits can be raised upon request. If you need more services within the same subscription, file a support request.

Resource	Free ¹	Basic	S1	S2	S3	S3 HD	L1	L2
Maximum services	1	16	16	8	6	6	6	6
Maximum search units (SU)²	N/A	3 SU	36 SU	36 SU	36 SU	36 SU	36 SU	36 SU

¹ You can have one free search service per Azure subscription. The free tier is based on infrastructure shared with other customers. Because the hardware isn't dedicated, scale-up isn't supported, and storage is limited to 50 MB.

² Search units (SU) are billing units, allocated as either a replica or a partition. You need both. To learn more about SU combinations, see Estimate and manage capacity of a search service.

Service limits

Search service limits for storage, partitions, and replicas vary by service creation date, with higher limits for newer services in supported regions.

A search service is constrained either by a maximum storage limit (partition size multiplied by the number of partitions) or by a hard limit on the maximum number of indexes or indexers, whichever comes first.

Service level agreements (SLAs) apply to billable services having two or more replicas for query workloads, or three or more replicas for query and indexing workloads. The number of partitions isn't an SLA consideration. For more information, see Reliability in Azure AI Search.
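As a minimal sketch of the SLA rule above, the following Python function maps a replica count to the workloads covered. The function name and the returned labels are illustrative assumptions, not part of any Azure SDK.

```python
def sla_coverage(replicas: int, billable: bool = True) -> str:
    """Return which workloads the SLA covers for a given replica count.

    Encodes the rule stated above: two or more replicas for query
    workloads, three or more for query and indexing workloads.
    Free (non-billable) services have no SLA.
    """
    if not billable or replicas < 2:
        return "none"
    if replicas == 2:
        return "query"
    return "query and indexing"


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "replica(s):", sla_coverage(count))
```

Partitions are deliberately not a parameter, because the number of partitions isn't an SLA consideration.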

Free services don't have fixed partitions or replicas and they share resources with other subscribers.

Before April 3, 2024

Resource	Free	Basic	S1	S2	S3	S3 HD	L1	L2
Service level agreement (SLA)	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Storage (partition size)	50 MB	2 GB	25 GB	100 GB	200 GB	200 GB	1 TB	2 TB
Partitions	N/A	1	12	12	12	3	12	12
Replicas	N/A	3	12	12	12	12	12	12

After April 3, 2024

For new services created after April 3, 2024:

Basic tier can have up to three partitions and three replicas, and a total of nine search units (SU).
Basic, S1, S2, and S3 have more storage per partition, roughly three to seven times more, depending on the tier.
Your new search service must be in a supported region to get the extra capacity for Basic and other tiers.

Currently, there's no in-place upgrade. You should create a new search service to benefit from the extra storage.

Resource	Free	Basic	S1	S2	S3	S3 HD	L1	L2
Service level agreement (SLA)	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Storage (partition size)	50 MB	15 GB	160 GB	350 GB	700 GB	700 GB	1 TB	2 TB
Partitions	N/A	3	12	12	12	3	12	12
Replicas	N/A	3	12	12	12	12	12	12
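The following sketch shows one way to check a planned configuration against the limits in the table above. It assumes that billed search units are the product of replicas and partitions, which is consistent with the Basic example (three partitions times three replicas equals nine SU), and that total storage is partition size multiplied by the number of partitions. The `TierLimits` class and `plan_capacity` function are hypothetical names, and only four tiers are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    partition_gb: int       # storage per partition, in GB
    max_partitions: int
    max_replicas: int
    max_search_units: int


# Values copied from the tables in this article for services created
# after April 3, 2024 in a supported region.
TIERS = {
    "Basic": TierLimits(15, 3, 3, 9),
    "S1": TierLimits(160, 12, 12, 36),
    "S2": TierLimits(350, 12, 12, 36),
    "S3": TierLimits(700, 12, 12, 36),
}


def plan_capacity(tier: str, partitions: int, replicas: int) -> tuple[int, int]:
    """Return (search_units, total_storage_gb) or raise ValueError."""
    limits = TIERS[tier]
    if not 1 <= partitions <= limits.max_partitions:
        raise ValueError(f"{tier} allows 1 to {limits.max_partitions} partitions")
    if not 1 <= replicas <= limits.max_replicas:
        raise ValueError(f"{tier} allows 1 to {limits.max_replicas} replicas")
    search_units = partitions * replicas  # assumption: SU = replicas x partitions
    if search_units > limits.max_search_units:
        raise ValueError(f"{search_units} SU exceeds {limits.max_search_units} SU")
    return search_units, partitions * limits.partition_gb


if __name__ == "__main__":
    print(plan_capacity("Basic", partitions=3, replicas=3))
    print(plan_capacity("S1", partitions=12, replicas=3))
```

Note that 12 partitions and 12 replicas are each allowed individually on Standard tiers, but not together, because the combination would exceed 36 SU.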

Supported regions with higher storage limits

Services created after April 3, 2024 must be in one of the following regions to get the extra storage. Watch for announcements in What's New in Azure AI Search for expansion to other regions.

Country	Regions providing extra capacity per partition
United States	East US, East US 2, Central US, North Central US, South Central US, West US, West US 2, West US 3, West Central US
United Kingdom	UK South, UK West
United Arab Emirates	UAE North
Switzerland	Switzerland West
Sweden	Sweden Central
Poland	Poland Central
Norway	Norway East
Korea	Korea Central, Korea South
Japan	Japan East, Japan West
Italy	Italy North
India	Central India, Jio India West
France	France Central
Europe	North Europe
Canada	Canada Central, Canada East
Brazil	Brazil South
Asia Pacific	East Asia, Southeast Asia
Australia	Australia East, Australia Southeast

Index limits

Resource	Free	Basic ¹	S1	S2	S3	S3 HD	L1	L2
Maximum indexes	3	5 or 15	50	200	200	1000 per partition or 3000 per service	10	10
Maximum simple fields per index ²	1000	100	1000	1000	1000	1000	1000	1000
Maximum dimensions per vector field	3072	3072	3072	3072	3072	3072	3072	3072
Maximum complex collections per index	40	40	40	40	40	40	40	40
Maximum elements across all complex collections per document ³	3000	3000	3000	3000	3000	3000	3000	3000
Maximum depth of complex fields	10	10	10	10	10	10	10	10
Maximum suggesters per index	1	1	1	1	1	1	1	1
Maximum scoring profiles per index	100	100	100	100	100	100	100	100
Maximum functions per profile	8	8	8	8	8	8	8	8
Maximum index size ⁴	N/A	N/A	N/A	1.92 TB	2.4 TB	100 GB	N/A	N/A

¹ Basic services created before December 2017 have lower limits (5 instead of 15) on indexes. Basic tier is the only tier with a lower limit of 100 fields per index.

² The upper limit on fields includes both first-level fields and nested subfields in a complex collection. For example, if an index contains 15 fields and has two complex collections with five subfields each, the field count of your index is 25. Indexes with a very large fields collection can be slow. Limit fields and attributes to just those you need, and run indexing and query tests to ensure performance is acceptable.
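The field count in footnote 2 can be sketched as a recursive walk over an index definition. This example assumes the JSON shape used by index definitions, where a complex field lists its subfields under a `fields` key, and it assumes that the two complex collections are themselves among the 15 first-level fields. The `count_fields` helper is a hypothetical name.

```python
def count_fields(fields: list[dict]) -> int:
    """Count every field plus all nested subfields, recursively."""
    total = 0
    for field in fields:
        total += 1
        # Simple fields have no "fields" key (or a null one).
        total += count_fields(field.get("fields") or [])
    return total


if __name__ == "__main__":
    # 13 simple fields plus 2 complex collections = 15 first-level fields.
    simple = [{"name": f"field{i}", "type": "Edm.String"} for i in range(13)]
    complex_collections = [
        {
            "name": f"collection{j}",
            "type": "Collection(Edm.ComplexType)",
            "fields": [
                {"name": f"sub{k}", "type": "Edm.String"} for k in range(5)
            ],
        }
        for j in range(2)
    ]
    print(count_fields(simple + complex_collections))  # 15 + (2 * 5) = 25
```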

³ An upper limit exists for elements because having a large number of them significantly increases the storage required for your index. An element of a complex collection is defined as a member of that collection. For example, in a Hotel document with a Rooms complex collection, each room in the Rooms collection is considered an element. During indexing, the indexing engine can safely process a maximum of 3,000 elements across the document as a whole. This limit was introduced in api-version=2019-05-06 and applies to complex collections only, and not to string collections or to complex fields.
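A pre-flight check for the element limit in footnote 3 might look like the following sketch. It treats every object inside a list as one element, skips string collections, and does not count a standalone complex field as an element. Counting elements of nested collections toward the same total is an assumption based on the phrase "across the document as a whole". The function name is hypothetical.

```python
MAX_COMPLEX_ELEMENTS = 3000


def count_complex_elements(value) -> int:
    """Count objects that are members of a list anywhere in the document."""
    if isinstance(value, dict):
        # A complex field is not an element itself, but may contain collections.
        return sum(count_complex_elements(v) for v in value.values())
    if isinstance(value, list):
        # Only dict members count; strings and numbers in a list are skipped.
        return sum(
            1 + count_complex_elements(item)
            for item in value
            if isinstance(item, dict)
        )
    return 0


if __name__ == "__main__":
    hotel = {
        "HotelName": "Example Hotel",
        "Tags": ["pool", "wifi"],              # string collection: not counted
        "Address": {"City": "Seattle"},        # complex field: not counted
        "Rooms": [{"Type": "Suite"}, {"Type": "Standard"}],
    }
    elements = count_complex_elements(hotel)
    print(elements, elements <= MAX_COMPLEX_ELEMENTS)
```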

⁴ On most tiers, maximum index size is all available storage on your search service. For S2, S3, and S3 HD, the maximum size of any index is the number provided in the table. Applies to search services created after April 3, 2024.

You might find some variation in maximum limits if your service happens to be provisioned on a more powerful cluster. The limits here represent the common denominator. Indexes built to the above specifications are portable across equivalent service tiers in any region.

Document limits

You can have approximately 24 billion documents per index on Basic, S1, S2, S3, L1, and L2 search services. For S3 HD, the limit is 2 billion documents per index. Each instance of a complex collection counts as a separate document in terms of these limits.

Document size limits per API call

The maximum document size when calling an Index API is approximately 16 megabytes.

Document size is actually a limit on the size of the Index API request body. Since you can pass a batch of multiple documents to the Index API at once, the size limit realistically depends on how many documents are in the batch. For a batch with a single document, the maximum document size is 16 MB of JSON.

When estimating document size, remember to consider only those fields that can be consumed by a search service. Any binary or image data in source documents should be omitted from your calculations.
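Because the limit applies to the request body rather than to a single document, a client can split documents into batches by estimated size. The sketch below also applies the limit of 1,000 documents per batch listed under API request limits. It assumes a request body of the form `{"value": [...]}`, uses a conservative 16,000,000 bytes for "approximately 16 MB", and estimates size with Python's default JSON formatting, which might differ slightly from what your client library actually sends. `batch_documents` is a hypothetical helper.

```python
import json

MAX_DOCS_PER_BATCH = 1000
MAX_BODY_BYTES = 16_000_000  # assumption: conservative reading of "about 16 MB"


def batch_documents(docs, max_docs=MAX_DOCS_PER_BATCH, max_bytes=MAX_BODY_BYTES):
    """Yield lists of documents that stay under both batch limits."""
    envelope = len(json.dumps({"value": []}).encode("utf-8"))
    batch, size = [], envelope
    for doc in docs:
        # +2 accounts for the ", " separator between array items.
        doc_bytes = len(json.dumps(doc, ensure_ascii=False).encode("utf-8")) + 2
        if envelope + doc_bytes > max_bytes:
            raise ValueError("a single document exceeds the request size limit")
        if batch and (len(batch) >= max_docs or size + doc_bytes > max_bytes):
            yield batch
            batch, size = [], envelope
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch


if __name__ == "__main__":
    documents = [{"id": str(i), "text": "example"} for i in range(2500)]
    print([len(b) for b in batch_documents(documents)])
```

The size estimate slightly overcounts separators, so batches err on the small side.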

Vector index size limits

When you index documents with vector fields, Azure AI Search constructs internal vector indexes using the algorithm parameters you provide. The size of these vector indexes is restricted by the memory reserved for vector search for your service's tier (or SKU).

The service enforces a vector index size quota for every partition in your search service. Each extra partition increases the available vector index size quota. This quota is a hard limit to ensure your service remains healthy, which means that further indexing attempts fail once the limit is exceeded. You can resume indexing once you free up quota by either deleting some vector documents or adding partitions.

The tables in this section describe the vector index size quota per partition across the service tiers. Each table includes:

Partition storage limits for each tier, repeated here for context.
Amount of each partition (in GB) available for vector indexes (created when you add vector fields to an index).
Approximate number of embeddings (floating point values) per partition.

Use the GET Service Statistics API to retrieve your vector index size quota, or review the Indexes page or Usage tab in the Azure portal.

Vector limits vary by service creation date and tier. To check the age of your search service and learn more about vector indexes, see Vector index size and staying under limits.

Vector limits on services created after April 3, 2024 in supported regions

The highest vector limits are available on search services created after April 3, 2024 in a supported region.

Tier	Storage quota (GB)	Vector quota per partition (GB)	Approx. floats per partition (assuming 15% overhead)
Basic	15	5	1,100 million
S1	160	35	8,200 million
S2	350	100	23,500 million
S3	700	200	47,000 million
L1	1,000	12	2,800 million
L2	2,000	36	8,400 million

Notice that L1 and L2 limits are unchanged in the April 3 rollout.
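The "approximate floats per partition" column can be roughly reproduced with a short calculation. The article doesn't state the formula, so this sketch makes three assumptions: each float is a 4-byte single-precision value, a GB of quota is 1024³ bytes, and the 15% overhead is applied by dividing by 1.15. With those assumptions the results land close to the table values (for example, about 2,800 million for a 12 GB quota), with some rows off by a few percent due to rounding in the table. Function names are hypothetical.

```python
BYTES_PER_FLOAT = 4      # assumption: single-precision (float32) embeddings
OVERHEAD = 0.15          # from the table header: "assuming 15% overhead"
BYTES_PER_GB = 1024 ** 3 # assumption: binary gigabytes


def approx_floats_per_partition(vector_quota_gb: float) -> int:
    """Estimate how many floats fit in a partition's vector quota."""
    raw_floats = vector_quota_gb * BYTES_PER_GB / BYTES_PER_FLOAT
    return int(raw_floats / (1 + OVERHEAD))


def approx_vectors_per_partition(vector_quota_gb: float, dimensions: int) -> int:
    """Estimate how many vectors of a given dimensionality fit per partition."""
    return approx_floats_per_partition(vector_quota_gb) // dimensions


if __name__ == "__main__":
    for tier, quota_gb in (("S1", 35), ("L1", 12), ("L2", 36)):
        millions = approx_floats_per_partition(quota_gb) / 1e6
        print(f"{tier}: about {millions:,.0f} million floats per partition")
    print(approx_vectors_per_partition(35, dimensions=1536), "vectors of 1,536 dimensions")
```

Dividing the float estimate by the number of dimensions in your embedding model gives a rough ceiling on vectors per partition. Treat it as a planning figure, and use the service statistics for actual usage.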

Vector limits on services created between July 1, 2023 and April 3, 2024

The following limits apply to new services created between July 1, 2023 and April 3, 2024. The exceptions are the following regions, which have the original limits from before July 1, 2023:

Germany West Central
West India
Qatar Central

All other regions have these limits:

Tier	Storage quota (GB)	Vector quota per partition (GB)	Approx. floats per partition (assuming 15% overhead)
Basic	2	1	235 million
S1	25	3	700 million
S2	100	12	2,800 million
S3	200	36	8,400 million
L1	1,000	12	2,800 million
L2	2,000	36	8,400 million

Vector limits on services created before July 1, 2023

Tier	Storage quota (GB)	Vector quota per partition (GB)	Approx. floats per partition (assuming 15% overhead)
Basic	2	0.5	115 million
S1	25	1	235 million
S2	100	6	1,400 million
S3	200	12	2,800 million
L1	1,000	12	2,800 million
L2	2,000	36	8,400 million

Indexer limits

Maximum running times exist to provide balance and stability to the service as a whole, but larger data sets might need more indexing time than the maximum allows. If an indexing job can't complete within the maximum time allowed, try running it on a schedule. The scheduler keeps track of indexing status. If a scheduled indexing job is interrupted for any reason, the indexer can pick up where it last left off at the next scheduled run.

Resource	Free ¹	Basic ²	S1	S2	S3	S3 HD ³	L1	L2
Maximum indexers	3	5 or 15	50	200	200	N/A	10	10
Maximum datasources	3	5 or 15	50	200	200	N/A	10	10
Maximum skillsets ⁴	3	5 or 15	50	200	200	N/A	10	10
Maximum indexing load per invocation	10,000 documents	Limited only by maximum documents	Limited only by maximum documents	Limited only by maximum documents	Limited only by maximum documents	N/A	No limit	No limit
Minimum schedule	5 minutes	5 minutes	5 minutes	5 minutes	5 minutes	5 minutes	5 minutes	5 minutes
Maximum running time ⁵	1-3 minutes	2 or 24 hours	2 or 24 hours	2 or 24 hours	2 or 24 hours	N/A	2 or 24 hours	2 or 24 hours
Maximum running time for indexers with a skillset ⁶	3-10 minutes	2 hours	2 hours	2 hours	2 hours	N/A	2 hours	2 hours
Blob indexer: maximum blob size, MB	16	16	128	256	256	N/A	256	256
Blob indexer: maximum characters of content extracted from a blob	32,000	64,000	4 million	8 million	16 million	N/A	4 million	4 million

¹ Free services have indexer maximum execution time of 3 minutes for blob sources and 1 minute for all other data sources. Indexer invocation is once every 180 seconds. For AI indexing that calls into Azure AI services, free services are limited to 20 free transactions per indexer per day, where a transaction is defined as a document that successfully passes through the enrichment pipeline (tip: you can reset an indexer to reset its count).

² Basic services created before December 2017 have lower limits (5 instead of 15) on indexers, data sources, and skillsets.

³ S3 HD services don't include indexer support.

⁴ Maximum of 30 skills per skillset.

⁵ Regarding the 2 or 24 hour maximum duration for indexers: a 2-hour maximum is the most common and it's what you should plan for. The 24-hour limit is from an older indexer implementation. If you have unscheduled indexers that run continuously for 24 hours, it's because those indexers couldn't be migrated to the newer infrastructure. As a general rule, for indexing jobs that can't finish within two hours, put the indexer on a 2-hour schedule. When the first 2-hour interval is complete, the indexer picks up where it left off when starting the next 2-hour interval.

⁶ Skillset execution, and image analysis in particular, are computationally intensive and consume disproportionate amounts of available processing power. Running time for these workloads has been shortened to give other jobs in the queue more opportunity to run.
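As a sketch of the scheduling advice in footnote 5, the following helper builds the schedule fragment of an indexer definition and enforces the 5-minute minimum from the table. It assumes the indexer definition accepts a `schedule` object whose `interval` is an ISO 8601 duration such as `PT2H`. The helper name is hypothetical, and the rest of the indexer definition is omitted.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=5)  # "Minimum schedule" row in the table


def indexer_schedule(interval: timedelta) -> dict:
    """Build a schedule fragment with an ISO 8601 duration interval."""
    if interval < MIN_INTERVAL:
        raise ValueError("interval must be at least 5 minutes")
    total_minutes, seconds = divmod(int(interval.total_seconds()), 60)
    if seconds:
        raise ValueError("this sketch only supports whole minutes")
    hours, minutes = divmod(total_minutes, 60)
    duration = "PT"
    if hours:
        duration += f"{hours}H"
    if minutes:
        duration += f"{minutes}M"
    return {"schedule": {"interval": duration}}


if __name__ == "__main__":
    # A 2-hour schedule lets a long job resume at each interval.
    print(indexer_schedule(timedelta(hours=2)))
```

The returned fragment would be merged into the body of a create or update indexer request.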

Note

As stated in Index limits, indexers also enforce the upper limit of 3,000 elements across all complex collections per document, starting with GA API version 2019-05-06, which supports complex types. If you created your indexer with an earlier API version, you aren't subject to this limit. To preserve maximum compatibility, an indexer that was created with an earlier API version and later updated with API version 2019-05-06 or newer is still excluded from the limit. Be aware of the adverse impact of very large complex collections (as stated previously). We highly recommend creating any new indexers with the latest GA API version.

Shared private link resource limits

Indexers can access other Azure resources over private endpoints managed via the shared private link resource API. This section describes the limits associated with this capability.

Resource	Free	Basic	S1	S2	S3	S3 HD	L1	L2
Private endpoint indexer support	No	Yes	Yes	Yes	Yes	No	Yes	Yes
Private endpoint support for indexers with a skillset¹	No	No	No	Yes	Yes	No	Yes	Yes
Maximum private endpoints	N/A	10 or 30	100	400	400	N/A	20	20
Maximum distinct resource types²	N/A	4	7	15	15	N/A	4	4

¹ AI enrichment and image analysis are computationally intensive and consume disproportionate amounts of available processing power. For this reason, private connections are disabled on lower tiers to ensure the performance and stability of the search service itself.

² The number of distinct resource types is computed as the number of unique groupId values used across all shared private link resources for a given search service, irrespective of the status of the resource.
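The count described in footnote 2 amounts to a set of `groupId` values. This sketch assumes each shared private link resource is represented as a dictionary with a `properties.groupId` value, as in the management API's JSON representation. The sample names and statuses are made up for illustration.

```python
def count_distinct_resource_types(shared_private_links: list[dict]) -> int:
    """Count unique groupId values, regardless of each resource's status."""
    return len({link["properties"]["groupId"] for link in shared_private_links})


if __name__ == "__main__":
    links = [
        {"name": "blob-1", "properties": {"groupId": "blob", "status": "Approved"}},
        {"name": "blob-2", "properties": {"groupId": "blob", "status": "Pending"}},
        {"name": "kv-1", "properties": {"groupId": "vault", "status": "Rejected"}},
    ]
    # Three resources, but only two distinct resource types.
    print(count_distinct_resource_types(links))
```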

Synonym limits

Maximum number of synonym maps varies by tier. Each rule can have up to 20 expansions, where an expansion is an equivalent term. For example, given "cat", association with "kitty", "feline", and "felis" (the genus for cats) would count as 3 expansions.

Resource	Free	Basic	S1	S2	S3	S3 HD	L1	L2
Maximum synonym maps	3	3	5	10	20	20	10	10
Maximum number of rules per map	5000	20000	20000	20000	20000	20000	20000	20000
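One possible way to count expansions before uploading a synonym map is sketched below. It assumes rules are written in the comma-separated equivalency form (`cat, kitty, feline, felis`) or the explicit mapping form with `=>`, and it interprets the cat example above as "every term other than the first" for equivalency rules and "every term on the right side" for explicit mappings. That interpretation is an assumption, and the function name is hypothetical.

```python
MAX_EXPANSIONS_PER_RULE = 20


def count_expansions(rule: str) -> int:
    """Count expansions in a single synonym rule (assumed interpretation)."""
    if "=>" in rule:
        # Explicit mapping: terms on the right side are the expansions.
        _, right = rule.split("=>", 1)
        return len([term for term in right.split(",") if term.strip()])
    # Equivalency rule: every term after the first counts as an expansion.
    terms = [term for term in rule.split(",") if term.strip()]
    return max(len(terms) - 1, 0)


if __name__ == "__main__":
    rule = "cat, kitty, feline, felis"
    count = count_expansions(rule)
    print(count, count <= MAX_EXPANSIONS_PER_RULE)
```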

Index alias limits

Maximum number of index aliases varies by tier. In all tiers, the maximum number of aliases is double the maximum number of indexes allowed.

Resource	Free	Basic	S1	S2	S3	S3 HD	L1	L2
Maximum aliases	6	10 or 30	100	400	400	2000 per partition or 6000 per service	20	20

Data limits (AI enrichment)

An AI enrichment pipeline that makes calls to an Azure AI Language resource for entity recognition, entity linking, key phrase extraction, sentiment analysis, language detection, and personal-information detection is subject to data limits. The maximum size of a record should be 50,000 characters as measured by String.Length. If you need to break up your data before sending it to the sentiment analyzer, use the Text Split skill.
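Because the limit is measured by String.Length, which counts UTF-16 code units, a length check written in another language needs to count the same way. Python's `len()` counts Unicode code points instead, so characters outside the Basic Multilingual Plane (many emoji, for example) would be undercounted. The following sketch shows a client-side check under that assumption. It doesn't replace the Text Split skill, and the function names are hypothetical.

```python
MAX_RECORD_CHARS = 50_000


def dotnet_string_length(text: str) -> int:
    """Length in UTF-16 code units, matching .NET's String.Length."""
    return len(text.encode("utf-16-le")) // 2


def fits_language_limit(text: str) -> bool:
    """True if the record is within the 50,000-character limit."""
    return dotnet_string_length(text) <= MAX_RECORD_CHARS


if __name__ == "__main__":
    sample = "caf\u00e9 \U0001F600"
    # Code points versus UTF-16 code units differ for the emoji.
    print(len(sample), dotnet_string_length(sample))
    print(fits_language_limit("a" * 50_000))
```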

Throttling limits

API requests are throttled as the system approaches peak capacity. Throttling behaves differently for different APIs. Query APIs (Search/Suggest/Autocomplete) and indexing APIs throttle dynamically based on the load on the service. Index APIs (operations on index definitions) and service operations APIs have static request rate limits.

Static request rate limits for operations related to an index:

List Indexes (GET /indexes): 3 per second per search unit
Get Index (GET /indexes/myindex): 10 per second per search unit
Create Index (POST /indexes): 12 per minute per search unit
Create or Update Index (PUT /indexes/myindex): 6 per second per search unit
Delete Index (DELETE /indexes/myindex): 12 per minute per search unit

Static request rate limits for operations related to a service:

Service Statistics (GET /servicestats): 4 per second per search unit
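The static limits above are stated per search unit, so a client-side rate limiter needs the service's SU count to derive a ceiling. The sketch below assumes the limits scale linearly with the number of search units, which is a reading of "per search unit" rather than something this article spells out. The operation keys are hypothetical labels.

```python
# (requests, per time unit) for one search unit, copied from the lists above.
STATIC_LIMITS = {
    "list_indexes": (3, "second"),
    "get_index": (10, "second"),
    "create_index": (12, "minute"),
    "create_or_update_index": (6, "second"),
    "delete_index": (12, "minute"),
    "service_statistics": (4, "second"),
}


def allowed_requests(operation: str, search_units: int) -> tuple[int, str]:
    """Return (max requests, time unit) for a service with N search units."""
    if search_units < 1:
        raise ValueError("search_units must be at least 1")
    per_unit, period = STATIC_LIMITS[operation]
    return per_unit * search_units, period  # assumption: linear scaling


if __name__ == "__main__":
    count, period = allowed_requests("create_index", search_units=2)
    print(f"Create Index: up to {count} per {period}")
```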

API request limits

Maximum of 16 MB per request ¹
Maximum 8-KB URL length
Maximum 1,000 documents per batch of index uploads, merges, or deletes
Maximum 32 fields in $orderby clause
Maximum 100,000 characters in a search clause
The maximum number of clauses in search (expressions separated by AND or OR) is 1024
Maximum search term size is 32,766 bytes (32 KB minus 2 bytes) of UTF-8 encoded text
Maximum search term size is 1,000 characters for prefix search and regex search
Wildcard search and Regular expression search are limited to a maximum of 1000 states when processed by Lucene.

¹ In Azure AI Search, the body of a request is subject to an upper limit of 16 MB, imposing a practical limit on the contents of individual fields or collections that aren't otherwise constrained by theoretical limits (see Supported data types for more information about field composition and restrictions).

Limits on query size and composition exist because unbounded queries can destabilize your search service. Typically, such queries are created programmatically. If your application generates search queries programmatically, we recommend designing it in such a way that it doesn't generate queries of unbounded size.
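For applications that generate queries programmatically, a guard like the following sketch can reject oversized queries before they reach the service. It checks three of the limits listed above: 100,000 characters in the search clause, 32,766 bytes of UTF-8 per search term, and 32 fields in $orderby. Splitting on whitespace is only an approximation of how terms are identified, and the function name is hypothetical.

```python
MAX_SEARCH_CHARS = 100_000
MAX_TERM_BYTES = 32_766
MAX_ORDERBY_FIELDS = 32


def check_query(search_text: str, orderby_fields=()) -> list[str]:
    """Return a list of limit violations; an empty list means none found."""
    problems = []
    if len(search_text) > MAX_SEARCH_CHARS:
        problems.append("search clause exceeds 100,000 characters")
    # Assumption: whitespace-separated tokens approximate search terms.
    for term in search_text.split():
        if len(term.encode("utf-8")) > MAX_TERM_BYTES:
            problems.append("a search term exceeds 32,766 bytes of UTF-8")
            break
    if len(orderby_fields) > MAX_ORDERBY_FIELDS:
        problems.append("$orderby has more than 32 fields")
    return problems


if __name__ == "__main__":
    print(check_query("hotel pool", orderby_fields=["Rating desc"]))
    print(check_query("a" * 40_000))
```

Term size is measured in bytes, so the check encodes each term as UTF-8 rather than counting characters.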

API response limits

Maximum 1,000 documents returned per page of search results
Maximum 100 suggestions returned per Suggest API request

API key limits

API keys are used for service authentication. There are two types. Admin keys are specified in the request header and grant full read-write access to the service. Query keys are read-only, specified on the URL, and typically distributed to client applications.

Maximum of 2 admin keys per service
Maximum of 50 query keys per service