Azure Update Management fails for servers behind Log Analytics Gateway

Jean-Michel Jacques 1 Reputation point
2020-10-26T15:05:42.39+00:00

Hello,
Since a few days we can't use Update Management anymore on our on-premise servers that they are using a Log analytic Gateway to connect to Azure. The Update Agent readiness shows "Disconnected" for those servers and any update deployment fails with message that hybrid worker is not available.
When checking the Operations Manager event log on those servers, we can regularly find this error:

A module of type "Microsoft.EnterpriseManagement.HealthService.AzureAutomation.HybridAgent" reported an exception Microsoft.EnterpriseManagement.HealthService.ModuleException: Unable to Register Machine for Patch Management, Registration Failed with Exception System.InvalidOperationException: System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
   at System.Net.TlsStream.EndWrite(IAsyncResult asyncResult)
   at System.Net.ConnectStream.WriteHeadersCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetRequestStream(IAsyncResult asyncResult, TransportContext& context)
   at System.Net.Http.HttpClientHandler.GetRequestStreamCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at AgentService.HybridRegistration.PowerShell.WebClient.AgentServiceClient.<Put>d__30`2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at AgentService.HybridRegistration.PowerShell.WebClient.AgentServiceClient.<RegisterV2>d__18.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at AgentService.OmsHybridRegistration.PowerShell.Commandlets.OmsHybridRunbookWorker.<RegisterWithService>d__62.MoveNext()

   at AgentService.OmsHybridRegistration.PowerShell.Commandlets.OmsHybridRunbookWorker.ExtractErrorMessageAndThrow(AggregateException exception)
   at AgentService.OmsHybridRegistration.PowerShell.Commandlets.OmsHybridRunbookWorker.RegisterGroupInDatabase()
   at AgentService.OmsHybridRegistration.PowerShell.Commandlets.OmsHybridRunbookWorker.Register() AgentServiceURI: https://cbff4b08-5c3a-4d34-8edf-7e75c597ca19.agentsvc.azure-automation.net/accounts/cbff4b08-5c3a-4d34-8edf-7e75c597ca19 which was running as part of rule "Microsoft.IntelligencePacks.AzureAutomation.HybridAgent.Init" running for instance "" with id:"{42DF9A9B-7F94-FCF8-BD0E-8BA9462687B1}" in management group "AOI-cbff4b08-5c3a-4d34-8edf-7e75c597ca19".

When checking the OMS Gateway log on the server acting as gateway, we find this event occurring constantly (event ID 406):

2020-10-26 13:11:00 [10] ERROR TcpConnection - Server certificate chain does not include a trusted root certificate. Cert count in chain: 3. Root cert: CN=DigiCert Global Root G2, OU=www.digicert.com, O=DigiCert Inc, C=US

Client servers and gateway are in the same AD on-premise domain so no certificate is defined for communications between all Microsoft Monitoring agents and the Log Analytics gateway.
Client servers run Windows Server 2012/2012 R2/2016/2019, they all have the problem
We tried to enable TLS 1.2 as recommended in
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/azure-monitor/platform/agent-windows.md
on a few clients but this did not solve the problem
Communication and updates are OK for the few servers that directly connect to Azure without using the gateway (including the server acting as gateway itself).
This problem began 3 days ago after Update Management has been successfully used for a long time on those servers.
Can someone help me identify the problem? Thank you!

Azure Monitor
Azure Monitor
An Azure service that is used to collect, analyze, and act on telemetry data from Azure and on-premises environments.
2,815 questions
Azure Automation
Azure Automation
An Azure service that is used to automate, configure, and install updates across hybrid environments.
1,131 questions
{count} votes

1 answer

Sort by: Most helpful
  1. tbgangav-MSFT 10,386 Reputation points
    2020-10-29T17:02:28.273+00:00

    Hi @Jean-Michel Jacques , It was determined that a recent SSL Certificate update to Automation caused connection issues to Automation endpoints. In order to mitigate impact, rollback of the certificates is done on all regions.

    0 comments No comments