Monitoring with Azure Managed Prometheus and Grafana

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

Cluster and service monitoring is an integral part of any organization's operations. Azure HDInsight on AKS comes with an integrated monitoring experience with Azure services. In this article, we use the managed Prometheus service with Azure Managed Grafana dashboards for monitoring.

Azure Managed Prometheus is a service that monitors your cloud environments to help maintain their availability and performance and to track workload metrics. It collects data generated by resources in your Azure instances and from other monitoring tools, and uses that data to provide analysis across multiple sources.

Azure Managed Grafana is a data visualization platform built on top of the Grafana software by Grafana Labs. It's a fully managed Azure service operated and supported by Microsoft. Grafana helps you bring together metrics, logs, and traces into a single user interface. With its extensive support for data sources and graphing capabilities, you can view and analyze your application and infrastructure telemetry data in real time.

This article covers the details of enabling the monitoring feature in HDInsight on AKS.

Prerequisites

For the instructions on how to create an HDInsight on AKS cluster, see Get started with Azure HDInsight on AKS.

Enabling Azure Managed Prometheus and Grafana

Azure Managed Prometheus and Grafana monitoring must be configured at the cluster pool level before it can be enabled at the cluster level. Whether you can enable or disable monitoring depends on the stage, as the following table shows.

# | Scenario | Enable | Disable
1 | Cluster Pool – During Creation | Not Supported | Default
2 | Cluster Pool – Post Creation | Supported | Not Supported
3 | Cluster – During Creation | Supported | Default
4 | Cluster – Post Creation | Supported | Supported
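
The support matrix above can also be restated as a small lookup, which may be handy if you script pre-flight checks around the portal steps. The following Python sketch only encodes the table; the names `SUPPORT_MATRIX`, `lookup`, and `can_enable` are illustrative and are not part of any Azure SDK.

```python
# Illustrative encoding of the support matrix above; this is not an Azure API.
# Keys are (resource, stage); values give the status of each action.
SUPPORT_MATRIX = {
    ("cluster pool", "during creation"): {"enable": "Not Supported", "disable": "Default"},
    ("cluster pool", "post creation"): {"enable": "Supported", "disable": "Not Supported"},
    ("cluster", "during creation"): {"enable": "Supported", "disable": "Default"},
    ("cluster", "post creation"): {"enable": "Supported", "disable": "Supported"},
}


def lookup(resource: str, stage: str, action: str) -> str:
    """Return the table entry for the given resource, stage, and action."""
    return SUPPORT_MATRIX[(resource.lower(), stage.lower())][action.lower()]


def can_enable(resource: str, stage: str) -> bool:
    """Return True only when the table lists enabling as Supported."""
    return lookup(resource, stage, "enable") == "Supported"


print(lookup("Cluster Pool", "During Creation", "enable"))
print(can_enable("Cluster", "Post Creation"))
```

An unknown resource or stage raises `KeyError`, which keeps the sketch from silently answering for scenarios the table does not cover.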

During cluster pool creation

Currently, Managed Prometheus cannot be enabled during cluster pool creation. You can configure it after the cluster pool is created.

Post cluster pool creation

Monitoring can be enabled from the Integrations tab of an existing cluster pool view in the Azure portal. You can use precreated workspaces or create new ones while you're configuring monitoring for the cluster pool.

Use precreated workspace

  1. Click on Configure to enable Azure Prometheus monitoring.

    Screenshot showing integration configure tab.

  2. Click on Advanced Settings to attach your precreated workspaces.

    Screenshot showing advanced settings.

    Screenshot showing configure Prometheus step 1.

Create Azure Prometheus and Grafana Workspace while enabling Monitoring in Cluster Pool

You can create the workspaces from the HDI on AKS cluster pool page.

  1. Click on Configure next to the Azure Prometheus option.

    Screenshot showing configure Prometheus step 2.

  2. Click on Create New workspace for Azure Managed Prometheus.

    Screenshot showing configure Prometheus step 3.

  3. Fill in the name and region, and then click on Create for Prometheus.

    Screenshot showing configure Prometheus step 4.

  4. Click on Create New workspace for Azure Managed Grafana.

  5. Fill in the name and region, and then click on Create for Grafana.

    Screenshot showing configure Prometheus step 5.

    Note

    1. Managed Grafana can be enabled only if Managed Prometheus is enabled.
    2. Once the Azure Managed Prometheus and Azure Managed Grafana workspaces are enabled from the HDInsight on AKS cluster pool, they cannot be disabled at the cluster pool level. Monitoring must be disabled at the cluster level.

During cluster creation

Enable Azure Managed Prometheus during cluster creation

  1. Once the cluster pool is created and Azure Managed Prometheus is enabled, create an HDI on AKS cluster in the same cluster pool.

  2. During the cluster creation process, navigate to the Integration page and enable Azure Prometheus.

    Screenshot showing enable prometheus monitoring.

Post cluster creation

You can also enable Azure Managed Prometheus after the HDI on AKS cluster is created:

  1. Navigate to the Integrations tab in the cluster page.

  2. Enable Azure Prometheus Monitoring with the toggle button and click on Save.

    Screenshot showing how to save configuration.

    Note

    Similarly, to disable Azure Prometheus monitoring, turn off the toggle button and click on Save.

Enabling required permissions

To view Azure Managed Prometheus and Azure Managed Grafana from the HDInsight on AKS portal, you need the following permissions.

User permission: To view Azure Managed Grafana, the user needs the “Grafana Viewer” role, assigned under Access control (IAM) in the Azure Managed Grafana workspace. View how to grant user access, here.

  1. Open the Grafana workspace configured in the cluster pool.

  2. Select the Role as Grafana Viewer.

  3. Select the user who needs access to the Grafana dashboard.

  4. Select the user and click on Review + assign.

    Note

    If you use a precreated Azure Managed Prometheus workspace, the Grafana identity requires the additional Monitoring Reader permission.

  5. On the Grafana workspace page (the one linked to the cluster), grant the Monitoring Reader permission from the Identity tab.

    Screenshot showing how to assign role.

  6. Click on Add role assignment.

  7. Select the following parameters:

    1. Scope as Subscription
    2. The subscription name.
    3. Role as Monitoring Reader

    Screenshot showing how to assign role.

    Note

    For viewing other roles for Grafana users see here.
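
The portal steps above are the path this article documents. If you would rather script the same two role assignments, the following Python sketch builds the corresponding Azure CLI `az role assignment create` commands without running them. It assumes the Azure CLI is installed and signed in; every value in angle brackets is a hypothetical placeholder that you must replace, and the helper names `grafana_scope` and `role_assignment_args` are illustrative.

```python
import shlex

# Hypothetical placeholders; replace them with values from your environment.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
GRAFANA_NAME = "<grafana-workspace-name>"


def grafana_scope(subscription_id: str, resource_group: str, grafana_name: str) -> str:
    """Build the resource ID of an Azure Managed Grafana workspace."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Dashboard/grafana/{grafana_name}"
    )


def role_assignment_args(assignee: str, role: str, scope: str) -> list:
    """Return the argument list for 'az role assignment create'."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee,
        "--role", role,
        "--scope", scope,
    ]


# Grafana Viewer for the user, scoped to the Grafana workspace.
viewer_cmd = role_assignment_args(
    "<user-object-id>",
    "Grafana Viewer",
    grafana_scope(SUBSCRIPTION_ID, RESOURCE_GROUP, GRAFANA_NAME),
)

# Monitoring Reader for the Grafana identity, scoped to the subscription.
reader_cmd = role_assignment_args(
    "<grafana-identity-principal-id>",
    "Monitoring Reader",
    f"/subscriptions/{SUBSCRIPTION_ID}",
)

# Print the commands for review instead of executing them.
print(shlex.join(viewer_cmd))
print(shlex.join(reader_cmd))
```

Printing rather than executing keeps the sketch free of side effects; after replacing the placeholders, you could pass either list to `subprocess.run` to apply the assignment.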

View metrics

This example uses an Apache Spark™ cluster and assumes that a few jobs have already run in the cluster, so that metrics are available.

Review the following steps to use the Grafana sample templates:

  1. Download the sample template from here for the respective workloads (download the Apache Spark template in this case).

  2. Log in to the Grafana Dashboard from your cluster.

    Screenshot showing how to set time frame.

  3. Once the Grafana Dashboard page is opened, click on New > Import.

    Screenshot showing how to metric type.

  4. Click on Upload Dashboard JSON file, upload the Apache Spark Grafana template that you downloaded, and then click on Import.

    Screenshot showing how to run query.

  5. After the upload is complete, you can click on the dashboard to view the metrics.

    Screenshot showing how to view the output.
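
Before importing a downloaded template, it can help to confirm that the file is valid JSON and to see which panels it defines. The following Python sketch is one way to do that, assuming the template follows the common Grafana dashboard model with top-level `title` and `panels` keys. The function names and the in-memory `sample` dashboard are hypothetical and do not reflect the contents of the actual HDInsight on AKS templates.

```python
import json


def load_dashboard(path: str) -> dict:
    """Read a dashboard JSON file; raises json.JSONDecodeError if it is malformed."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_dashboard(dashboard: dict) -> dict:
    """Return the dashboard title and the titles of its top-level panels."""
    panels = dashboard.get("panels", [])
    return {
        "title": dashboard.get("title"),
        "panel_count": len(panels),
        "panel_titles": [panel.get("title", "") for panel in panels],
    }


# Hypothetical stand-in for a downloaded template, used here instead of a real file.
sample = {
    "title": "Spark sample",
    "panels": [{"title": "Executors"}, {"title": "Jobs"}],
}

print(summarize_dashboard(sample))
```

For a real file, call `summarize_dashboard(load_dashboard("<path-to-template>.json"))`. Panels nested inside rows are not counted by this sketch.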

Reference