Apache Ambari heartbeat issues in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

Scenario: High CPU utilization


The Ambari agent has high CPU utilization, which causes the Ambari UI to raise alerts that the Ambari agent heartbeat is lost for some nodes. The heartbeat-lost alert is usually transient.


Due to various ambari-agent bugs, in rare cases the ambari-agent process can reach close to 100 percent CPU utilization.


  1. Identify process ID (pid) of ambari-agent:

    ps -ef | grep ambari_agent
  2. Then run the following command to show CPU utilization:

    top -p <ambari-agent-pid>
  3. Restart ambari-agent to mitigate issue:

    service ambari-agent restart
  4. If the restart doesn't resolve the issue, kill the ambari-agent process and then start it again:

    kill -9 <ambari-agent-pid>
    service ambari-agent start
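
The steps above can be combined into a small helper script. The following is only a sketch under stated assumptions: the function names and the 90 percent threshold are illustrative and don't come from HDInsight or Ambari, and `ps` reports CPU averaged over the lifetime of the process, so the value can differ from the live figure shown by `top`.

```shell
#!/usr/bin/env bash
# Sketch: restart ambari-agent only when its CPU usage is above a threshold.
# THRESHOLD and all function names are illustrative assumptions.

THRESHOLD="${THRESHOLD:-90}"

# Print the PID of the first process whose command line contains "ambari_agent".
find_agent_pid() {
    pgrep -f ambari_agent | head -n 1
}

# Print the CPU percentage that ps reports for the given PID.
# Note: this is a lifetime average, not the instantaneous value top shows.
agent_cpu() {
    ps -o %cpu= -p "$1" | tr -d ' '
}

# Succeed (exit 0) when CPU percentage $1 is at or above threshold $2.
cpu_is_high() {
    awk -v cpu="$1" -v limit="$2" 'BEGIN { exit !(cpu >= limit) }'
}

# Check the agent and restart it if CPU usage is high. Run as root.
mitigate_high_cpu() {
    local pid cpu
    pid="$(find_agent_pid)"
    if [ -z "$pid" ]; then
        echo "ambari-agent is not running"
        return 1
    fi
    cpu="$(agent_cpu "$pid")"
    if cpu_is_high "$cpu" "$THRESHOLD"; then
        echo "ambari-agent (pid $pid) at ${cpu}% CPU, restarting"
        service ambari-agent restart
    else
        echo "ambari-agent (pid $pid) at ${cpu}% CPU, no action"
    fi
}
```

The script only defines functions. To use it, source it on the affected node and call `mitigate_high_cpu` with root privileges. If the restart doesn't help, fall back to the manual `kill -9` step.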

Scenario: Ambari agent not started


The Ambari agent hasn't started, which causes the Ambari UI to raise alerts that the Ambari agent heartbeat is lost for some nodes.


The alerts are caused by the Ambari agent not running.


  1. Confirm status of ambari-agent:

    service ambari-agent status
  2. Confirm whether the failover controller services are running:

    ps -ef | grep failover

    If the failover controller services aren't running, the likely cause is a problem that prevents hdinsight-agent from starting the failover controller. Check the hdinsight-agent log in the /var/log/hdinsight-agent/hdinsight-agent.out file.
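
The failover check and the log lookup can be sketched as one helper. The function names and the 50-line tail length are assumptions made for illustration. Only the log path and the `failover` search term come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: report failover controller processes, or show the recent
# hdinsight-agent log when none are found. Function names are hypothetical.

LOG_FILE="/var/log/hdinsight-agent/hdinsight-agent.out"

# Read `ps -ef` style output on stdin and print the lines that mention
# "failover", dropping any line that also contains "grep".
failover_lines() {
    awk '/failover/ && !/grep/'
}

# Print matching processes, or the last 50 log lines if there are none.
check_failover() {
    local found
    found="$(ps -ef | failover_lines)"
    if [ -n "$found" ]; then
        echo "failover controller processes found:"
        echo "$found"
    else
        echo "no failover controller process found; recent hdinsight-agent log:"
        tail -n 50 "$LOG_FILE"
    fi
}
```

Source the script on the node and run `check_failover`. Reading the log file may require root privileges.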

Scenario: Heartbeat lost for Ambari


The Ambari agent heartbeat was lost.


OMS logs are causing high CPU utilization.


