SQL Connection - error 35

Tom wuyts 106 Reputation points
2021-03-27T07:27:16.91+00:00

Hi,

Since (at least) last Monday (22/3), I've been getting the error below from a background task that's running in AKS on Linux. When running the same task from Visual Studio (on Windows), this error does not occur.

The error occurs anywhere between 1 and 10 times per minute, across 30-50 pods running the task. If I reduce the scaling, it still happens.

The full error:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)  ---> System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Net.Sockets.Socket'.    

at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)   
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)    
at Calidos.Artemis.Services.RelinkerADTService.Relink(String hisPatientUid) in /src/Models/Calidos.Artemis.Services/MessageProcesses/RelinkerADTService.cs:line 84    
at Calidos.Artemis.QueueProcessor.RelinkADTAction.Run(RelinkMsg msg, CloudQueue theQueue) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/Actions/RelinkADTAction.cs:line 27    
at Calidos.Artemis.QueueProcessor.QueueWorker.ProcessMessage_Relinker(CloudQueue theQueue, Boolean msgOk, String[] msg, Object msgType) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/QueueWorker.cs:line 272 ClientConnectionId:de553492-0d5d-4937-957c-2a76dbb3e8ee Routing Destination:d31d9ada42e9.tr2384.westeurope1-a.worker.database.windows.net,11047   
at Calidos.Artemis.Services.RelinkerADTService.Relink(String hisPatientUid) in /src/Models/Calidos.Artemis.Services/MessageProcesses/RelinkerADTService.cs:line 84    
at Calidos.Artemis.QueueProcessor.RelinkADTAction.Run(RelinkMsg msg, CloudQueue theQueue) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/Actions/RelinkADTAction.cs:line 27    
at Calidos.Artemis.QueueProcessor.QueueWorker.ProcessMessage_Relinker(CloudQueue theQueue, Boolean msgOk, String[] msg, Object msgType) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/QueueWorker.cs:line 272

System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Net.Sockets.Socket'.    
at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)    
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)   
at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)    
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel) 

Occasionally, I also see this error 40:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server) 

Per https://github.com/dotnet/SqlClient/issues/449, I was told to contact Azure support. They forwarded me here.

Thanks in advance,
Tom Wuyts

Azure Kubernetes Service (AKS)

Accepted answer
  1. Tom wuyts 106 Reputation points
    2021-04-08T21:15:42.307+00:00

    I've found the cause. The database connection string used by our AKS pods had Pooling set to "False", which caused it to open a huge number of connections (500-1000/minute over 30 pods). Setting it to "True" reduced the number of connections to 30-40, and the error no longer appears!
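
A quick way to audit for this setting is sketched below. It is a minimal shell check, assuming the connection string is available as plain text (for example in an environment variable; the name `SQL_CONNECTION_STRING` in the usage note is hypothetical). It only flags an explicit `Pooling=False` or `Pooling=No`; when the keyword is absent, Microsoft.Data.SqlClient pools connections by default.

```shell
# Succeeds (exit status 0) if the given connection string explicitly disables pooling.
pooling_disabled() {
    # Connection-string keywords are case-insensitive, and whitespace
    # around '=' and ';' is tolerated, so the pattern allows for both.
    printf '%s' "$1" |
        grep -Eiq '(^|;)[[:space:]]*pooling[[:space:]]*=[[:space:]]*(false|no)[[:space:]]*(;|$)'
}
```

Possible usage: `pooling_disabled "$SQL_CONNECTION_STRING" && echo "pooling is disabled"`. Related keywords such as `Max Pool Size` do not match the pattern, so they are not reported.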

    1 person found this answer helpful.

2 additional answers

  1. shiva patpi 13,141 Reputation points Microsoft Employee
    2021-03-27T20:20:04.927+00:00

    Hello @Tom wuyts,
    Are you still seeing the issue? If so, please let us know.
    Can you SSH into the node on which the pod is running and check the latest logs in /var/log/syslog (or syslog.1) for errors like "eth0: Lost carrier"?

    If you see those errors, you may be hitting the issue described at https://github.com/Azure/aks-engine/issues/4341.
    It should be fixed as of today; please let us know if you are still seeing the timeout errors.
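
The log check described above could be scripted roughly as follows. This is a sketch under assumptions: the function name `count_lost_carrier` is made up for illustration, and it expects to be run on the node itself with read access to the syslog files.

```shell
# Print the number of lines containing "Lost carrier" across the given log files.
count_lost_carrier() {
    # -h drops filename prefixes, -s silences errors for missing or unreadable
    # files, -i ignores case; wc -l then counts the matching lines.
    grep -hsi "Lost carrier" "$@" | wc -l | tr -d ' '
}
```

Possible usage on the node: `count_lost_carrier /var/log/syslog /var/log/syslog.1`. A non-zero count suggests the interface dropped at some point; comparing the timestamps of those lines with the timestamps of the SQL errors would show whether the two are related.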

    1 person found this answer helpful.

  2. Muhammad hamid 11 Reputation points
    2022-04-27T05:49:11.877+00:00

    Hi, I am facing the same error. If someone has solved it, please help me. Thanks,

    Regards, M.Asad