SQL Connection - error 35

Question

Hi,

Since at least last Monday (22/3), I've been getting the error below from a background task running in AKS on Linux. When I run the same task from Visual Studio (on Windows), the error does not occur.

The error occurs anywhere between 1 and 10 times per minute, across 30-50 pods running the task. If I reduce the scaling, it still happens.

The full error:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)  ---> System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Net.Sockets.Socket'.    

at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)   
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)    
at Calidos.Artemis.Services.RelinkerADTService.Relink(String hisPatientUid) in /src/Models/Calidos.Artemis.Services/MessageProcesses/RelinkerADTService.cs:line 84    
at Calidos.Artemis.QueueProcessor.RelinkADTAction.Run(RelinkMsg msg, CloudQueue theQueue) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/Actions/RelinkADTAction.cs:line 27    
at Calidos.Artemis.QueueProcessor.QueueWorker.ProcessMessage_Relinker(CloudQueue theQueue, Boolean msgOk, String[] msg, Object msgType) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/QueueWorker.cs:line 272
ClientConnectionId:de553492-0d5d-4937-957c-2a76dbb3e8ee
Routing Destination:d31d9ada42e9.tr2384.westeurope1-a.worker.database.windows.net,11047
at Calidos.Artemis.Services.RelinkerADTService.Relink(String hisPatientUid) in /src/Models/Calidos.Artemis.Services/MessageProcesses/RelinkerADTService.cs:line 84    
at Calidos.Artemis.QueueProcessor.RelinkADTAction.Run(RelinkMsg msg, CloudQueue theQueue) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/Actions/RelinkADTAction.cs:line 27    
at Calidos.Artemis.QueueProcessor.QueueWorker.ProcessMessage_Relinker(CloudQueue theQueue, Boolean msgOk, String[] msg, Object msgType) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/QueueWorker.cs:line 272

System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Net.Sockets.Socket'.    
at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)    
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)   
at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)    
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)

Occasionally, I also see this error 40:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)

In https://github.com/dotnet/SqlClient/issues/449, I was advised to contact Azure support. They forwarded me here.

Thanks in advance,
Tom Wuyts

Accepted Answer

I've found the cause. The database connection string used by our AKS pods had Pooling set to "False", so every operation opened a new connection, which added up to a huge number of connections (500-1000 per minute across 30 pods). Setting it to "True" reduced the number of connections to 30-40, and the error no longer appears!
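
For anyone landing here with the same symptom, here is a minimal sketch of what the fix can look like in code, assuming the Microsoft.Data.SqlClient NuGet package is referenced. The server name, database name, and the explicit Max Pool Size value are placeholders for illustration, not values from the original setup, and authentication settings are left out.

```csharp
using System;
using Microsoft.Data.SqlClient; // NuGet package: Microsoft.Data.SqlClient

public static class PooledConnectionExample
{
    // Builds a connection string with pooling explicitly enabled.
    // The server and database arguments are placeholders supplied by the caller.
    public static string BuildPooledConnectionString(string server, string database)
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = server,
            InitialCatalog = database,
            Pooling = true,   // the setting that resolved the issue (also the library default)
            MaxPoolSize = 100 // assumed per-process cap; 100 is the library default
        };
        return builder.ConnectionString;
    }

    public static void Main()
    {
        string connectionString = BuildPooledConnectionString(
            "tcp:example-server.database.windows.net,1433", // hypothetical server
            "ExampleDb");                                    // hypothetical database

        // Print the result so the Pooling setting can be checked by eye.
        Console.WriteLine(connectionString);
    }
}
```

With pooling enabled, disposing a SqlConnection returns the underlying connection to the pool instead of closing the socket, so each pod reuses a small set of connections rather than opening a new one per operation.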

Answer

Hello @Tom wuyts,
Do you still see the issue? If so, please let us know.
Can you SSH into the node on which the pod is running and check the latest logs in /var/log/syslog (or syslog.1) for errors like "eth0: Lost carrier"?

If you see those errors, you may be hitting the issue described here: https://github.com/Azure/aks-engine/issues/4341
It should be fixed as of today. Please let us know if you are still seeing the timeout errors.

Answer

Hi, I am facing the same error. If anyone has solved it, please help me. Thanks.

Regards, M.Asad
