Multi-fold increase in throughput for big gateway query compiles in SQL Server

There are some exciting new changes in the database engine that dramatically improves query compilation scalability in SQL Server 2014 and SQL Server 2016. These changes are yet another reason to upgrade your SQL Server versions.

When a query first starts and has to be compiled in SQL Server, it needs to acquire memory for various phases of compilation (parsing, algebrization, optimization etc.). This memory does not include the memory that is required to run (execute) the query. When multiple queries start compiling, there is no limit to number of queries that can concurrently compile. However, as the memory requirement grows and reaches a threshold, the query needs to acquire a semaphore to pass through the gateway. As you can see in the picture below, the number of concurrent compiles allowed decreases as queries pass from one gateway to another. Think of this as an inverted funnel with a very small opening. The throughput (flow rate ) is very low as you move towards the opening (big gateway). If the number of concurrent compiles has reached the limit allowed in a specific gateway, any new queries wanting to enter the gateway will wait on 'RESOURCE_SEMAPHORE_QUERY_COMPILE' and this will be seen in various diagnostics such as sys.dm_os_waiting_tasks, DBCC MEMORYSTATUS etc.


The number of queries allowed to compile concurrently in each gateway is shown on the left side of the picture above. This can also be seen in the output of DBCC MEMORYSTATUS. This output is taken from a system that has 48 logical cores.

Small Gateway (default) Value
---------------------------------------- -----------
Configured Units 192 // = 4 times number of cores
Available Units 192
Acquires 0
Waiters 0
Threshold Factor 380000
Threshold 380000

(6 row(s) affected)

Medium Gateway (default) Value
---------------------------------------- -----------
Configured Units 48 // = Number of cores
Available Units 48
Acquires 0
Waiters 0
Threshold Factor 12
Threshold -1

(6 row(s) affected)

Big Gateway (default) Value
---------------------------------------- -----------
Configured Units 1 // = only 1 query
Available Units 1
Acquires 0
Waiters 0
Threshold Factor 8
Threshold -1

This model has been in place for several versions of SQL Server and works very well and allows SQL Server to scale very well for large number of concurrent compiles for most workloads. However, with modern hardware (with several cores and large amounts of RAM) and new workloads pushing SQL Server to new boundaries, we often see several queries waiting for 'RESOURCE_SEMAPHORE_QUERY_COMPILE' in the Big Gateway. This is because of the restriction of only one large query compilation in the Big Gateway even when there is large amounts of memory available.

Now for the exciting announcement. We realized the need to improve the scalability of SQL Server for such workloads and have released an update which will dynamically adjust the number of concurrent compiles allowed in the Big Gateway.

What this means is SQL Server will automatically determine the number of concurrent queries that will be allowed in the Big Gateway depending on the Target Memory. Meaning, SQL Server will allow one query per 25GB of memory up to 80% of target. The upper limit is only enforced by the medium gateway limit since all queries in big gateway would have first passed through medium gateway. For example, on a system that has 24 physical/48 logical cores and 64GB RAM, the new behavior allows for two large 25GB query compilations up to 48 concurrent compilations (as allowed by the medium gateway) if queries do not need 25GB for compilation. This is a dramatic improvement in number of large compiles compared to earlier versions.

Note that this dynamic behavior is evaluated at the resource pool level. Hence each resource pool will benefit from allowing multiple compilations in the big gateway.

As you can see from the revised picture below (depicting the new improvement), several more queries can compile concurrently in the big gateway. The funnel now has a much wider opening and hence the throughput (flow rate) is dramatically increased.


To take advantage of this new policy change, install the following update for SQL Server 2014 and turn on trace flag 6498.

Large query compilation waits on RESOURCE_SEMAPHORE_QUERY_COMPILE in SQL Server 2014

In SQL Server 2016, (starting with CTP 2.4), you will not require the trace flag as the dynamic policy for Big Gateway is on by default. In addition, we have introduced a new DMV sys.dm_exec_query_optimizer_memory_gateways that will give both configuration and run time information on all the compile gateways for that instance.

In the sample output shown below taken from a machine that has 24 cores (48 logical with hyper threading) and 128GB RAM, you can see that up to 5 concurrent compiles will be allowed in the Big gateway:

pool_id name max_count active_count waiter_count threshold_factor threshold is_active
----------- ----------------- ----------- ------------ ------------ -------------------- ---------- ---------
1 Small Gateway 192 0 0 380000 380000 1
1 Medium Gateway 48 0 0 12 -1 0
1 Big Gateway 5 0 0 8 -1 0
2 Small Gateway 192 0 0 380000 380000 1
2 Medium Gateway 48 0 0 12 -1 0
2 Big Gateway 5 0 0 8 -1 0


An extended event query_optimizer_memory_gateway has also been added and if enabled will fire whenever the query enters medium or big gateway with the following information in the event data.

name description
------------------------------- --------------------------------------------
medium_gateway_active_acquires count of active acquires for medium gateway
medium_gateway_count_waiters count of waiters for medium gateway
medium_gateway_threshold Threshold for medium gateway
big_gateway_active_acquires count of active acquires for big gateway
big_gateway_count_waiters count of waiters for big gateway
big_gateway_threshold Threshold for big gateway
gateway_number Gateway number. used in predicates.
pool_name Name of the resource pool
gateway_name Name of the gateway trying to enter


To summarize the query compilation heuristics, using an example system that has 24 physical/48 logical cores, 128 GB RAM:


Ajay Jagannathan ( @ajayMSFT )

Principal Program Manager