DmitriyBorovkov-8329 asked DmitriyBorovkov-8329 commented

Microsoft HPC Pack 2016 HPC Management Service does not start

Hi there!
On our HPC cluster, the HPC Management Service on the head node fails to start. Its log shows the following exception:

e,04/06/2021 21:23:57.157, SrcFile="HpcManagement" SrcFunc="" SrcLine="0" Pid="7316" Tid="2740" TS="0x01d72b2b272d1285" String1="[HpcManagement] Exception:.System.ArgumentException: macAddress.. at Microsoft.ComputeCluster.Management.MacIpPair..ctor(String macAddress, String[] ipAddresses).. at Microsoft.ComputeCluster.Management.MachineIdentifier.AddIPPair(String macAddress, String[] IpAddresses).. at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.get_FullIdentifier().. at Microsoft.ComputeCluster.Management.HpcClusterManager.UpdateGpuNodesWithGroup(IdentifiableInstance cluster, ModelQuery query).. at Microsoft.ComputeCluster.Management.HpcClusterManager.PopulateComputeNodeList().. at Microsoft.ComputeCluster.Management.HpcClusterManager.<Initialize>d_25.MoveNext()"
e,04/06/2021 21:23:57.172, SrcFile="HpcManagement" SrcFunc="" SrcLine="0" Pid="7316" Tid="3432" TS="0x01d72b2b272f7514" String1="[HpcManagement] HPC Management service fails to start: System.AggregateException: One or more errors occurred. ---> System.ArgumentException: macAddress.. at Microsoft.ComputeCluster.Management.MacIpPair..ctor(String macAddress, String[] ipAddresses).. at Microsoft.ComputeCluster.Management.MachineIdentifier.AddIPPair(String macAddress, String[] IpAddresses).. at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.get_FullIdentifier().. at Microsoft.ComputeCluster.Management.HpcClusterManager.UpdateGpuNodesWithGroup(IdentifiableInstance cluster, ModelQuery query).. at Microsoft.ComputeCluster.Management.HpcClusterManager.PopulateComputeNodeList().. at Microsoft.ComputeCluster.Management.HpcClusterManager.<Initialize>d
25.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementHeadNodeService.<StartService>d7.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementServiceBase.<Start>d4.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementHeadNodeNtService.<StartService>d4.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementServiceBase.<Start>d4.MoveNext().. --- End of inner exception stack trace ---.. at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions).. at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken).. at System.Threading.Tasks.Task.Wait().. at Microsoft.ComputeCluster.Management.ManagementWinService.OnStart(String[] args)..---> (Inner Exception #0) System.ArgumentException: macAddress.. at Microsoft.ComputeCluster.Management.MacIpPair..ctor(String macAddress, String[] ipAddresses).. at Microsoft.ComputeCluster.Management.MachineIdentifier.AddIPPair(String macAddress, String[] IpAddresses).. 
at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.get_FullIdentifier().. at Microsoft.ComputeCluster.Management.HpcClusterManager.UpdateGpuNodesWithGroup(IdentifiableInstance cluster, ModelQuery query).. at Microsoft.ComputeCluster.Management.HpcClusterManager.PopulateComputeNodeList().. at Microsoft.ComputeCluster.Management.HpcClusterManager.<Initialize>d25.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementHeadNodeService.<StartService>d7.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task).. at Microsoft.ComputeCluster.Management.ManagementServiceBase.<Start>d_4.MoveNext()..--- End of stack trace from previous location where exception was thrown ---.. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw().. at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerN"

It started after a reboot of the head node that installed updates (KB5000803, if I remember correctly).
I have since removed that update, but that did not resolve the issue.

azure-hpc-pack

Hi Dmitriy,

What's the HPC Pack version? (HPC Cluster Manager -> Help -> About).

Would restarting the service or the head node help?

Regards,
Yutong Sun


@YutongSun-5052
The issue was the MAC address of one node in the cluster.
It's solved now.
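
For anyone hitting the same `System.ArgumentException: macAddress`, a quick way to look for a suspect node is to check the MAC addresses you have on record for each node. The sketch below is only an illustration: the exact validation done inside `MacIpPair` is not shown in this thread, so the accepted formats here (six hex octets separated by `:` or `-`, or 12 bare hex digits) are an assumption, and the node names and the `find_invalid_macs` helper are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumed MAC formats: AA:BB:CC:DD:EE:FF, AA-BB-CC-DD-EE-FF, or AABBCCDDEEFF.
# This is NOT taken from HPC Pack; it is a common-sense check only.
_MAC_RE = re.compile(
    r"^[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$"
    r"|^[0-9A-Fa-f]{12}$"
)


def is_valid_mac(mac: Optional[str]) -> bool:
    """Return True if mac is a non-empty string in one of the assumed formats."""
    if not mac:
        return False
    return _MAC_RE.match(mac.strip()) is not None


def find_invalid_macs(nodes: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of nodes whose recorded MAC is missing or malformed."""
    return [name for name, mac in nodes.items() if not is_valid_mac(mac)]


if __name__ == "__main__":
    # Hypothetical inventory; replace with the values recorded for your cluster.
    inventory = {
        "NODE01": "00:15:5D:01:02:03",
        "NODE02": "",                   # empty MAC
        "NODE03": "00-15-5D-0A-0B-0C",
        "NODE04": "00:15:5D:01:02",     # only five octets
    }
    for name in find_invalid_macs(inventory):
        print(f"{name}: suspicious MAC {inventory[name]!r}")
```

Any node this flags is only a candidate; how to correct the stored value depends on your cluster setup and is not covered in this thread.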


0 Answers