A special GNE (General Network Error) messages when running SQL Server after installing service pack 1 for Windows Server 2003 and TCP registry key SynAttackProtect

The client error messages related to this post are the followings.

For MDAC client,

[DBNETLIB]General network error. Check your network documentation.

[DBNETLIB]ConnectionRead (recv()). For SNAC client,

For SNAC client,

TCP Provider: An existing connection was forcibly closed by the remote host”.

We found this issue after SQL Server 2005 Beta2 when w2k3sp1 (Windows 2003 server service pack 1) was one of our default OS for test. Lately, quite a few customers have reported seeing this issue during connection scalability test of SQL Server 2005 or on SQL Server 2000 production server, after upgrade base OS to w2k3sp1. This issue results from a subtle winsock behavior change caused by that w2k3sp1 server has enabled SynAttackProtect by default. This key is configurable through “HKEY_LOCAL_MACHINESYSTEMCurrentControlSetServicesTcpipParametersSynAttackProtect{DWORD}”. As documented in SQL Server 2005 Book-On-Line, one way to resolve this issue is to disable it by set SynAttackProtect value to 0. Note that, event though we document this solution in SQL Server 2005 BOL, it applies to both SQL Server 2005 and SQL Server 2000. In this post, I want to explain what is really happening under the cover and help you to understand the risk to turn this key off.

What is the Sync Flood Attack?

Sync-Flood-Attack is a network Deny-Of-Service attack. It takes advantage of TCP connection handshake protocol by sending large amount of TCP Sync packets to a victim server with spoofed IP source address. In standard TCP implementation, once a new sync packet is received, a half opened connection handle and its associated context is allocated and queued in corresponding TCP listening socket backlog queue. With combination of high frequency of sync packet arrivals and the default timeout value of half open connection entries, the backlog queue, max to 200 by default on w2k3 standard server, can easily be exhausted; as result, sync flood attack can cause server become non-responsive to normal connection requests. Here is some more info, http://www.microsoft.com/technet/community/columns/cableguy/cg1204.mspx.

A solution to Sync Flood Attack

To mitigate such issue, a solution is to hold the half opened connection handle in a cache and only allocate full TCP connection context and en-queue the handle when a connection is fully established after TCP three way handshakes. The cache implemented in Windows is based on a hash table that can virtually hold unlimited half-opened connections. The cache also uses smaller timeout value for connection entries that stay in the cache so that unresponsive half open connection entry will be evicted faster.

The side effect to the normal connections

For normal connections, without SynAttackProtect, as soon as the backlog queue is full, server side TCP will send TCP reset in response to new TCP sync packet. The client side winsock connect() call that send first TCP sync packet to the serverl will receive the TCP reset and fail with ERROR_CONNECTION_REFUSED. When SynAttackProtect is on, even the backlog queue is full, a new connection request arrived at the server will still be accepted because the half connection cache is visually unlimited. However, once the handshake succeeds, the connection can still be rejected because the server TCP find out that the backlog queue is actually full when moves its entry from the cache to the backlog queue. With the timing shifted to send TCP reset from during the connection establishment handshakes to after, the client succeeds when calling winsock connect() but fails with WSAECONNRESET when performing the first recv() or send() on established connections.

What is the impact to SQL Server Client Driver?

SQL Server client drivers, including MDAC and SNAC, have retry logic for failure when connect() fails, a logic to improve user experience when there is transient network failure. However, they do not retry when first recv() or send() fail because of a TDS protocol compatibility issue with older versions of SQL server such as 7.0. With SyncAttackProtect enabled, due to the timing shifted, more first recv() or send() are failing inside the SQL Server client driver; and thus, more GNEs are raised to applications.

Solutions to this GNE issue.

As mentioned above, server can disable the SynAttackProtect in w2k3sp1. But keep in mind this solution makes the server subject to sync flood attack if it is open to public access. Another solution is to add a retry-logic to the SQL server client application to alleviate the problem. I am glad some applications are already doing that. Database application developers also need to keep in mind that the retry logic should be designed in such a way not to overwhelm the server.

 

Do you know that you can post question w.r.t SQL Server data access, connectivty issues at http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=87&SiteID=1 ?

Nan Tu

Software Design Engineer, SQL Server Protocols

Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights