DReplay Message: “Active connections exceed 8192, connection 8409 is waiting.”

Tracking down this message was an interesting dive into the DReplay session boundary logic, so I thought I would share it.

Internally, DReplay maintains a progressive session queue. This queue is limited to 8192 entries and is populated in connection replay order, based on the connect/disconnect boundaries. A background worker maintains the queue for the replay workers, adding new sessions and cleaning up completed sessions.

DReplay is designed to allow 8192 concurrent sessions to replay. During the capture, this means you must have 8192 or fewer entries in sys.dm_exec_sessions. Exceeding the limit can result in the message and the wait state.

If you actually have 8192+ sessions that require synchronization with the 8193rd, 8194th, … session(s), the replay can stall because the 8193rd, 8194th, … session won’t have a replay worker until one of the previous 8192 sessions completes. This does not mean the replay will be stuck forever. Query completion, session completion, query timeouts, kill scripts and other such options can all be used to achieve forward progress.

Why Do I Get This Message With Only 100 Concurrent Sessions Doing Make/Break?

You can encounter the message with fewer than 8192 concurrent sessions. DReplay prepares sessions ahead of their execution (read ahead, if you will). If, as a group, the current 8192 sessions take longer to replay than it takes to prepare new sessions, the background worker will reach the limit and sleep until a session slot is available.
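The read-ahead behavior can be illustrated with a small tick-based model. This is only a sketch of the idea, not DReplay's actual implementation: the function name, the tick model, and the sample numbers (10000 sessions, 100 completions per tick) are assumptions chosen for illustration. Only the 8192 limit comes from the message itself.

```python
def simulate_read_ahead(total_sessions, completions_per_tick, limit=8192):
    """Hypothetical model of a bounded read-ahead queue.

    Each tick the background worker prepares sessions until the queue
    holds `limit` entries, then the replay workers finish up to
    `completions_per_tick` sessions. Returns the connection numbers
    that would be reported as waiting.
    """
    if completions_per_tick <= 0:
        raise ValueError("completions_per_tick must be positive")

    queued = 0     # sessions prepared but not yet completed
    prepared = 0   # sessions handed to the queue so far
    waiting = []   # next connection number each time the limit is hit

    while prepared < total_sessions:
        # Background worker: read ahead until the queue is full.
        batch = min(limit - queued, total_sessions - prepared)
        prepared += batch
        queued += batch
        if prepared < total_sessions:
            # Queue is full and sessions remain, so the next one waits.
            waiting.append(prepared + 1)
        # Replay workers: completed sessions free their slots.
        queued -= min(queued, completions_per_tick)

    return waiting


waits = simulate_read_ahead(total_sessions=10000, completions_per_tick=100)
print(len(waits), waits[0], waits[-1])  # 19 8193 9993
```

Even though only 100 sessions complete per tick in this model, the preparer reaches the limit on every tick until the last sessions are queued, which is the pattern of repeated "is waiting" messages with a steadily increasing connection number.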

Here is an example where the background worker reaches the 8192 limit and waits. Three sessions complete their replay activities, the background worker prepares three new sessions, and it again reaches the 8192 limit. The messages show forward progress and that the “read ahead” limit has been reached. The background worker limits the queue size and waits in order to avoid potential memory and resource limitations.

2013-07-05 19:00:16:255 CRITICAL [Client Replay] Active connections exceed 8192, connection 8467 is waiting.
2013-07-05 19:00:16:255 INFORMATION [Client Replay] All events for spid=298 have been replayed
2013-07-05 19:00:16:255 INFORMATION [Client Replay] All events for spid=362 have been replayed
2013-07-05 19:00:16:255 INFORMATION [Client Replay] All events for spid=293 have been replayed
2013-07-05 19:00:16:255 CRITICAL [Client Replay] Active connections exceed 8192, connection 8470 is waiting

I would also point out that ‘connection #### is waiting’ is a nice progress indicator. Earlier in the log, DReplay reports the number of dispatched connections:

2013-07-05 18:59:47:956 OPERATIONAL [Client Replay] 35212 events are dispatched in 8800 connections.

From the messages above you can see DReplay has prepared sessions up to 8469 of the 8800 total to be replayed.
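If you want to track this progress without reading the log by eye, the two messages can be matched with regular expressions. The following is a minimal sketch: the function name `replay_progress` is hypothetical, and the patterns are based only on the log lines quoted in this post, so other DReplay versions may word the messages differently.

```python
import re

# Patterns taken from the log lines quoted in this post.
WAITING = re.compile(r"Active connections exceed (\d+), connection (\d+) is waiting")
DISPATCHED = re.compile(r"(\d+) events are dispatched in (\d+) connections")


def replay_progress(log_lines):
    """Return (prepared, total) from the most recent matching messages,
    or None if either message has not been seen yet."""
    total = None
    waiting = None
    for line in log_lines:
        match = DISPATCHED.search(line)
        if match:
            total = int(match.group(2))
            continue
        match = WAITING.search(line)
        if match:
            waiting = int(match.group(2))
    if total is None or waiting is None:
        return None
    # The waiting connection is the next one to prepare, so everything
    # before it has already been prepared.
    return waiting - 1, total


log = [
    "2013-07-05 18:59:47:956 OPERATIONAL [Client Replay] 35212 events are dispatched in 8800 connections.",
    "2013-07-05 19:00:16:255 CRITICAL [Client Replay] Active connections exceed 8192, connection 8467 is waiting.",
    "2013-07-05 19:00:16:255 INFORMATION [Client Replay] All events for spid=298 have been replayed",
    "2013-07-05 19:00:16:255 CRITICAL [Client Replay] Active connections exceed 8192, connection 8470 is waiting",
]

prepared, total = replay_progress(log)
print(f"prepared {prepared} of {total} connections ({prepared / total:.1%})")
# prepared 8469 of 8800 connections (96.2%)
```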

One reproduction of the message used 100 concurrent make/break connections, each repeating 160 times, for a total of 16000 sessions. DReplay sees these as 16000 unique sessions and sequences them accordingly. In doing this, DReplay will queue 8192 sessions, wait for sessions to complete, add a few more, and repeat the logic. The message in this case is simply showing that you have more than 8192 connect/disconnect boundaries (unique sessions) and that DReplay has reached the prepared depth limit.
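The arithmetic behind this reproduction is easy to check. In the sketch below the names are hypothetical; the numbers (100 connections, 160 repeats, the 8192 limit) are the ones from this post.

```python
REPLAY_QUEUE_LIMIT = 8192  # session queue limit described in this post


def make_break_sessions(concurrent_connections, repeats_per_connection):
    """Each connect/disconnect pair counts as one unique session."""
    return concurrent_connections * repeats_per_connection


total = make_break_sessions(100, 160)
# Sessions that cannot be queued in the first pass and must wait
# for earlier sessions to complete and free a slot.
beyond_limit = max(0, total - REPLAY_QUEUE_LIMIT)
print(total, beyond_limit)  # 16000 7808
```

So although only 100 connections are ever active at once during the capture, 7808 of the 16000 sessions fall beyond the first 8192 queue slots, and the message is expected.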

Bob Dorr - Principal SQL Server Escalation Engineer