Use Apache Ambari Views to debug Apache Tez Jobs on HDInsight
The Apache Ambari Web UI for HDInsight contains an Apache Tez view that you can use to understand and debug jobs that use Tez. The Tez view lets you visualize a job as a graph of connected items, drill into each item, and retrieve statistics and logging information.
The steps in this document require an HDInsight cluster that uses Linux. Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see HDInsight component versioning.
Prerequisites
- A Linux-based HDInsight cluster. For steps on creating a cluster, see Get started using Linux-based HDInsight.
- A modern web browser that supports HTML5.
Understanding Apache Tez
Tez is an extensible framework for data processing in Apache Hadoop that can run jobs faster than traditional MapReduce processing. Part of the reason is that Tez runs a multi-stage job as a single DAG. It does not chain MapReduce jobs that write their intermediate results to HDFS. On Linux-based HDInsight clusters, Tez is the default execution engine for Hive.
Tez creates a Directed Acyclic Graph (DAG) that describes the order of actions a job requires. Each action is called a vertex and executes a piece of the overall job. The actual execution of the work described by a vertex is called a task. The tasks of a single vertex may be distributed across multiple nodes in the cluster.
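To make the vertex and task vocabulary concrete, the following Python sketch models a small DAG. Each vertex maps to the set of vertices whose output it consumes. The standard-library graphlib module (Python 3.9+) then computes an order in which the vertices could run. The vertex names follow the "Map N" / "Reducer N" style that Hive uses for Tez vertices. However, this particular graph and its task counts are hypothetical. This is an illustration of DAG ordering, not a description of how Tez schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG for a query that joins two tables:
# each vertex maps to the set of vertices whose output it consumes.
dag = {
    "Map 1": set(),                   # scans the first table
    "Map 3": set(),                   # scans the second table
    "Reducer 2": {"Map 1", "Map 3"},  # combines rows from both map vertices
}

# Hypothetical parallelism: each vertex runs as one or more tasks,
# and those tasks may be placed on different nodes.
tasks_per_vertex = {"Map 1": 4, "Map 3": 2, "Reducer 2": 1}


def execution_order(graph):
    """Return the vertices in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
for vertex in order:
    print(f"{vertex}: {tasks_per_vertex[vertex]} task(s)")
```

If the graph contained a cycle, static_order would raise graphlib.CycleError. That is why a plan expressed as a DAG must be acyclic.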
Understanding the Tez view
The Tez view provides historical information as well as information on processes that are currently running. This information shows how a job is distributed across the cluster. The view also displays the counters used by tasks and vertices, and error information related to the job. It is useful in the following scenarios:
- Monitoring long-running processes by viewing the progress of map and reduce tasks.
- Analyzing historical data for successful or failed processes to learn how processing could be improved or why it failed.
Generate a DAG
The Tez view only contains data if a job that uses the Tez engine is currently running or has run previously. Simple Hive queries can be resolved without using Tez. More complex queries that filter, group, order, or join data use the Tez engine.
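The sample query in the following steps filters rows on deviceplatform and groups by market, state, and country. It computes no aggregates, so its result is each distinct (market, state, country) combination for Android devices, the same as SELECT DISTINCT. The Python sketch below mimics that logic in two stages, loosely mirroring a map vertex followed by a reducer vertex. The rows are hypothetical and are not taken from hivesampletable. The function names are illustrative only.

```python
# Hypothetical rows; the real hivesampletable has more columns and many more rows.
rows = [
    {"market": "en-GB", "state": "Hessen", "country": "Germany", "deviceplatform": "Android"},
    {"market": "en-GB", "state": "Hessen", "country": "Germany", "deviceplatform": "Android"},
    {"market": "en-US", "state": "Ohio", "country": "United States", "deviceplatform": "iPhone OS"},
    {"market": "en-GB", "state": "Kingston", "country": "Jamaica", "deviceplatform": "Android"},
]


def map_stage(table):
    """Like a map vertex: apply the WHERE filter and emit the grouping key for each row."""
    for row in table:
        if row["deviceplatform"] == "Android":  # WHERE deviceplatform='Android'
            yield (row["market"], row["state"], row["country"])


def reduce_stage(keys):
    """Like a reducer vertex: collapse duplicate keys (GROUP BY with no aggregates)."""
    # Hive does not guarantee output order; sorting only makes this sketch deterministic.
    return sorted(set(keys))


for market, state, country in reduce_stage(map_stage(rows)):
    print(market, state, country)
```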
Use the following steps to run a Hive query that uses Tez:
In a web browser, navigate to https://CLUSTERNAME.azurehdinsight.net, where CLUSTERNAME is the name of your HDInsight cluster.
From the menu at the top of the page, select the Views icon. This icon looks like a series of squares. In the dropdown that appears, select Hive view.
When the Hive view loads, paste the following query into the Query Editor, and then select Execute.
select market, state, country from hivesampletable where deviceplatform='Android' group by market, country, state;
Once the job has completed, you should see the output displayed in the Query Process Results section. The results should be similar to the following text:
market  state     country
en-GB   Hessen    Germany
en-GB   Kingston  Jamaica
Select the Log tab. You see information similar to the following text:
INFO : Session is already open
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1454546500517_0063)
Save the App id value, as this value is used in the next section.
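If you copy the log text instead of reading it, a short regular expression can extract the App id. This Python sketch assumes the usual YARN format, application_<cluster timestamp>_<sequence number>. The function name find_app_id is hypothetical.

```python
import re

# YARN application IDs look like application_<cluster start timestamp>_<sequence number>.
APP_ID_PATTERN = re.compile(r"\bapplication_\d+_\d+\b")


def find_app_id(log_text):
    """Return the first YARN application ID in the log text, or None if there is none."""
    match = APP_ID_PATTERN.search(log_text)
    return match.group(0) if match else None


log = (
    "INFO : Session is already open\n"
    "INFO :\n"
    "INFO : Status: Running (Executing on YARN cluster with App id "
    "application_1454546500517_0063)\n"
)
print(find_app_id(log))
```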
Use the Tez View
From the menu at the top of the page, select the Views icon. In the dropdown that appears, select Tez view.
When the Tez view loads, you see a list of Hive queries that are currently running or have run on the cluster.
If you have only one entry, it is for the query that you ran in the previous section. If you have multiple entries, you can search by using the fields at the top of the page.
Select the Query ID for a Hive query. Information about the query is displayed.
The tabs on this page allow you to view the following information:
- Query Details: Details about the Hive query.
- Timeline: Information about how long each stage of processing took.
- Configurations: The configuration used for this query.
From Query Details, you can use the links to find information about the Application or the DAG for this query:
- The Application link displays information about the YARN application for this query. From here you can access the YARN application logs.
- The DAG link displays information about the directed acyclic graph for this query. From here you can view a graphical representation of the DAG. You can also find information on the vertices within the DAG.
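The same application and DAG information can also be reached from an SSH session on a cluster node. The Python sketch below makes several assumptions, so verify each one against your cluster:

- The yarn client is on the PATH. Its real yarn logs -applicationId command prints aggregated application logs, which are typically available after the application finishes.
- The YARN Timeline Server, which the Tez view reads history from, listens on its default port 8188 at the placeholder host headnodehost.
- Tez records DAGs under the TEZ_DAG_ID entity type with an applicationId primary filter.

The function names are hypothetical. The sketch only builds the command and URL and does not contact the cluster unless you call fetch_yarn_logs.

```python
import shlex
import subprocess
from urllib.parse import urlencode


def yarn_logs_command(app_id):
    """Return the argv list for `yarn logs`, which prints an application's aggregated logs."""
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_yarn_logs(app_id):
    """Run `yarn logs` and return its output (assumes the yarn CLI is on PATH)."""
    result = subprocess.run(
        yarn_logs_command(app_id), capture_output=True, text=True, check=True
    )
    return result.stdout


def tez_dags_url(app_id, timeline_host="headnodehost", port=8188):
    """Build a Timeline Server v1 URL listing Tez DAG entities for one application.

    The host, port, entity type, and filter name are assumptions to verify.
    """
    query = urlencode({"primaryFilter": f"applicationId:{app_id}"})
    return f"http://{timeline_host}:{port}/ws/v1/timeline/TEZ_DAG_ID?{query}"


app_id = "application_1454546500517_0063"
print(shlex.join(yarn_logs_command(app_id)))
print(tez_dags_url(app_id))
```

urlencode percent-encodes the colon in the filter value as %3A. The Timeline Server decodes it like any other query parameter.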
Now that you have learned how to use the Apache Tez view, learn more about Using Apache Hive on HDInsight.
For more detailed technical information on Apache Tez, see the Apache Tez page at Hortonworks.
For more information on using Apache Ambari with HDInsight, see Manage HDInsight clusters using the Apache Ambari Web UI.