Best practices for monitoring virtual machines in Azure Monitor

This article provides architectural best practices for monitoring virtual machines and their client workloads using Azure Monitor. The guidance is based on the five pillars of architecture excellence described in the Azure Well-Architected Framework.

Reliability

In the cloud, we acknowledge that failures happen. Instead of trying to prevent failures altogether, the goal is to minimize the effects of a single failing component. Use the following information to monitor your virtual machines and their client workloads for failure.

Design checklist

  • Create availability alert rules for Azure VMs.
  • Create an agent heartbeat alert rule to verify agent health.
  • Configure data collection and alerting for monitoring reliability of client workloads.

Configuration recommendations

Recommendation Description
Create availability alert rules for Azure VMs. Use the availability metric (preview) to track when an Azure VM is running. While you can quickly enable an availability alert rule for an individual machine using recommended alerts, a single alert rule targeting a resource group or subscription enables availability alerting for all VMs in that scope for a particular region. This is easier to manage than creating an alert rule for each VM and ensures that any new VMs created in the scope are automatically monitored. This alert rule doesn't require the Azure Monitor agent to be installed on the VM, but it isn't available for VMs outside of Azure.
Create an agent heartbeat alert rule to verify agent health. The Azure Monitor agent sends a heartbeat to the Log Analytics workspace every minute. Create a log search alert rule based on the agent heartbeat so that you're notified when an agent stops sending heartbeats. A missing heartbeat indicates that either the VM is down or the agent is unhealthy, and in either case your client workloads aren't being monitored. This alert rule requires the Azure Monitor agent to be installed on the VM and applies to both Azure and non-Azure VMs.
Configure data collection and alerting for monitoring reliability of client workloads. Use the guidance in Monitor virtual machines with Azure Monitor: Collect data to configure collection of client events that indicate potential issues with your client workloads. Use the guidance in Monitor virtual machines with Azure Monitor: Alerts to create alert rules that proactively notify you of potential operational issues with your client workloads.
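
The agent heartbeat alert rule comes down to a staleness check: find the machines whose most recent heartbeat is older than some threshold. The following Python sketch reproduces that logic locally on in-memory sample data so that it runs without an Azure subscription. The `find_stale_agents` function, the sample computer names, and the 5-minute threshold are illustrative assumptions rather than Azure Monitor defaults. The embedded KQL string shows an equivalent query you might use in a log search alert rule against the Heartbeat table; verify it against your own workspace before relying on it.

```python
from datetime import datetime, timedelta, timezone

# Illustrative KQL for a log search alert rule. The 5-minute threshold is an
# assumption; tune it to your tolerance for false positives.
HEARTBEAT_QUERY = """
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(5m)
"""


def find_stale_agents(heartbeats, now, threshold=timedelta(minutes=5)):
    """Return the computers whose latest heartbeat is older than `threshold`.

    `heartbeats` is an iterable of (computer, timestamp) pairs, mirroring the
    Computer and TimeGenerated columns of the Heartbeat table.
    """
    latest = {}
    for computer, ts in heartbeats:
        # Keep only the most recent heartbeat per computer (the max() above).
        if computer not in latest or ts > latest[computer]:
            latest[computer] = ts
    # A stale computer is either down or running an unhealthy agent.
    return sorted(c for c, ts in latest.items() if now - ts > threshold)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        ("vm-web-01", now - timedelta(minutes=1)),
        ("vm-web-01", now - timedelta(minutes=2)),
        ("vm-db-01", now - timedelta(minutes=12)),
        ("onprem-app-01", now - timedelta(seconds=30)),
    ]
    print(find_stale_agents(sample, now))
```

Note that a query like this only sees computers that sent at least one heartbeat within the alert rule's query time range, so choose a range comfortably longer than the staleness threshold.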
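
The availability recommendation favors one alert rule scoped to a resource group or subscription over one rule per VM. The Python sketch below is a simplified model of why that scales, not the Azure Monitor implementation: it assumes a rule covers every VM whose resource ID sits under the rule's scope and whose region matches the rule's region. The `VirtualMachine` class, the `covered_by_rule` helper, the all-zero subscription ID, and the resource group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMachine:
    resource_id: str
    region: str


def covered_by_rule(vm, scope, region):
    """Simplified model of a multi-resource alert rule: it covers a VM when the
    VM's resource ID sits under `scope` (a subscription or resource group ID)
    and the VM is in the rule's region. Resource IDs are compared
    case-insensitively, because Azure resource IDs are case-insensitive."""
    prefix = scope.rstrip("/").lower() + "/"
    in_scope = vm.resource_id.lower().startswith(prefix)
    return in_scope and vm.region.lower() == region.lower()


if __name__ == "__main__":
    sub = "/subscriptions/00000000-0000-0000-0000-000000000000"
    scope = f"{sub}/resourceGroups/rg-prod"
    vm_path = "providers/Microsoft.Compute/virtualMachines"
    vms = [
        VirtualMachine(f"{sub}/resourceGroups/rg-prod/{vm_path}/web-01", "eastus"),
        VirtualMachine(f"{sub}/resourceGroups/rg-prod/{vm_path}/web-02", "westus"),
        VirtualMachine(f"{sub}/resourceGroups/rg-prod-old/{vm_path}/app-01", "eastus"),
    ]
    for vm in vms:
        print(vm.resource_id.rsplit("/", 1)[-1], covered_by_rule(vm, scope, "eastus"))
```

Because coverage is derived from the scope rather than from a list of VMs, a VM created later in rg-prod in the same region is covered without editing the rule. A VM in another region needs a separate rule for that region.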

Security

Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to monitor the security of your virtual machines.

Design checklist

  • Use other services for security monitoring of your VMs.
  • Consider using Azure Private Link for VMs to connect to Azure Monitor through a private endpoint.

Configuration recommendations

Recommendation Description
Use other services for security monitoring of your VMs. While Azure Monitor can collect security events from your VMs, it isn't intended to be used for security monitoring. Azure includes multiple services such as Microsoft Defender for Cloud and Microsoft Sentinel that together provide a complete security monitoring solution. See Security monitoring for a comparison of these services.
Consider using Azure Private Link for VMs to connect to Azure Monitor through a private endpoint. Connections to public endpoints are secured with end-to-end encryption. If you require a private endpoint, you can use Azure Private Link to allow your VMs to connect to Azure Monitor through authorized private networks. Private Link can also be used to force workspace data ingestion through ExpressRoute or a VPN. See Design your Azure Private Link setup to determine the best network and DNS topology for your environment.

Cost optimization

Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See Azure Monitor cost and usage to understand the different ways that Azure Monitor charges and how to view your monthly bill.

Note

See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.

Design checklist

  • Migrate from Log Analytics agent to Azure Monitor agent for granular data filtering.
  • Filter data that you don't require from agents.
  • Determine whether you'll use VM insights and what data to collect.
  • Reduce polling frequency of performance counters.
  • Ensure that VMs aren't sending duplicate data.
  • Use Log Analytics workspace insights to analyze billable costs and identify cost saving opportunities.
  • Migrate your SCOM environment to Azure Monitor SCOM Managed Instance.

Configuration recommendations

Recommendation Description
Migrate from Log Analytics agent to Azure Monitor agent for granular data filtering. If you still have VMs with the Log Analytics agent, migrate them to Azure Monitor agent so you can take advantage of better data filtering and use unique configurations with different sets of VMs. Configuration for data collection by the Log Analytics agent is done on the workspace, so all agents receive the same configuration. Data collection rules used by Azure Monitor agent can be tuned to the specific monitoring requirements of different sets of VMs. The Azure Monitor agent also allows you to use transformations to filter data being collected.
Filter data that you don't require from agents. Reduce your data ingestion costs by filtering data that you don't use for alerting or analysis. See Monitor virtual machines with Azure Monitor: Collect data for guidance on data to collect for different monitoring scenarios and Control costs for specific guidance on filtering data to reduce your costs.
Determine whether you'll use VM insights and what data to collect. VM insights is a quick way to get started monitoring your VMs and includes features such as Map and performance trend views. If you don't use the Map feature or the data that it collects, disable collection of process and dependency data in your VM insights configuration to save on data ingestion costs.
Reduce polling frequency of performance counters. If you're using a data collection rule to send performance data to your Log Analytics workspace, you can lower the polling frequency (sample rate) of the counters it collects to reduce the amount of data collected.
Ensure that VMs aren't sending duplicate data. If you multi-home agents or create similar data collection rules, make sure that you're sending unique data to each workspace. See Analyze usage in Log Analytics workspace for guidance on analyzing your collected data to make sure you aren't collecting duplicate data. If you're migrating between agents, continue to use the Log Analytics agent until you migrate to the Azure Monitor agent rather than running both together, unless you can ensure that each agent collects unique data.
Use Log Analytics workspace insights to analyze billable costs and identify cost saving opportunities. Log Analytics workspace insights shows you the billable data collected in each table and from each VM. Use this information to identify your top machines and tables, since they represent your best opportunity to reduce costs by filtering data. Use this insight, together with the log queries in Analyze usage in Log Analytics workspace, to further analyze the effects of configuration changes.
Migrate your SCOM environment to Azure Monitor SCOM Managed Instance. Migrate your existing SCOM environment to Azure Monitor SCOM Managed Instance to support any management packs that can't be replaced by Azure Monitor. SCOM Managed Instance removes the requirement to maintain local management servers and database servers, reducing your overall cost to maintain your SCOM infrastructure.
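
The recommendations to migrate to the Azure Monitor agent for granular filtering and to filter data you don't require both come down to editing data collection rules. The Python sketch below builds a DCR request body as a plain dictionary to show the two places where filtering can happen: an XPath query in the Windows event data source, which filters events on the machine before they're sent, and a `transformKql` transformation in the data flow, which runs in the ingestion pipeline. The property names are intended to follow the DCR JSON structure, but treat the body as a sketch: the `build_filtered_event_dcr` helper, the data source and destination names, the placeholder workspace ID, and the choice to drop the `ParameterXml` column are illustrative assumptions. Validate the result against the DCR structure reference before deploying it.

```python
import json


def build_filtered_event_dcr(workspace_resource_id, location="eastus"):
    """Return a DCR request body (as a dict) that filters Windows events twice:
    once on the VM with XPath, and again at ingestion with a transformation."""
    return {
        "location": location,
        "properties": {
            "dataSources": {
                "windowsEventLogs": [
                    {
                        "name": "criticalErrorWarningEvents",
                        "streams": ["Microsoft-Event"],
                        # XPath runs on the VM: keep only Level 1-3
                        # (critical, error, warning) from System and Application.
                        "xPathQueries": [
                            "System!*[System[(Level=1 or Level=2 or Level=3)]]",
                            "Application!*[System[(Level=1 or Level=2 or Level=3)]]",
                        ],
                    }
                ]
            },
            "destinations": {
                "logAnalytics": [
                    {"name": "centralWorkspace", "workspaceResourceId": workspace_resource_id}
                ]
            },
            "dataFlows": [
                {
                    "streams": ["Microsoft-Event"],
                    "destinations": ["centralWorkspace"],
                    # Transformation runs in the ingestion pipeline; dropping a
                    # column (illustrative choice) shrinks each billed row.
                    "transformKql": "source | project-away ParameterXml",
                    "outputStream": "Microsoft-Event",
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder ID: substitute your own workspace resource ID.
    ws = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
          "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>")
    print(json.dumps(build_filtered_event_dcr(ws), indent=2))
```

Filtering with XPath reduces what leaves the VM, while a transformation can filter rows or drop columns that XPath can't express. Because the rule is scoped to the VMs it's associated with, different sets of VMs can receive different filters, which isn't possible with the workspace-wide configuration of the Log Analytics agent.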
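
For the recommendation to reduce the polling frequency of performance counters, the number of rows collected scales linearly with the sampling rate, so the effect of a longer interval is easy to estimate. The Python sketch below counts performance rows per day, assuming one row per counter, per VM, per sample; counters with multiple instances, such as per-disk counters, produce more. Row count is only a proxy for cost, because billed volume also depends on row size. The counter and VM counts are made-up figures. In a DCR, the interval is set per performance counter data source with `samplingFrequencyInSeconds`.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(counters, vms, interval_seconds):
    """Rows collected per day, assuming one row per counter, per VM,
    per sampling interval (single-instance counters)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counters * vms * SECONDS_PER_DAY // interval_seconds


if __name__ == "__main__":
    # Hypothetical fleet: 20 counters on each of 100 VMs.
    for interval in (10, 60, 300):
        print(f"{interval:>4}s -> {samples_per_day(20, 100, interval):,} rows/day")
```

Moving the hypothetical fleet from a 60-second to a 300-second interval cuts the row count by a factor of five, so reserve short intervals for the counters you actually alert on.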
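
To check for duplicate data, it helps to inventory what each associated data collection rule sends and look for overlaps. The Python sketch below works on a hypothetical inventory of (VM, DCR, stream, workspace) tuples that you would assemble yourself; the `find_duplicate_flows` and `find_multihomed_streams` helpers and all names in the sample data are illustrative. It flags two patterns: the same stream from the same VM reaching the same workspace through more than one rule, and the same stream from one VM going to more than one workspace.

```python
from collections import defaultdict


def find_duplicate_flows(flows):
    """Return (vm, stream, workspace) keys that more than one DCR sends.

    `flows` is an iterable of (vm, dcr, stream, workspace) tuples.
    """
    senders = defaultdict(set)
    for vm, dcr, stream, workspace in flows:
        senders[(vm, stream, workspace)].add(dcr)
    return {key: sorted(dcrs) for key, dcrs in senders.items() if len(dcrs) > 1}


def find_multihomed_streams(flows):
    """Return (vm, stream) keys that are sent to more than one workspace."""
    workspaces = defaultdict(set)
    for vm, _dcr, stream, workspace in flows:
        workspaces[(vm, stream)].add(workspace)
    return {key: sorted(ws) for key, ws in workspaces.items() if len(ws) > 1}


if __name__ == "__main__":
    flows = [
        ("vm-sql-01", "dcr-baseline", "Microsoft-Perf", "ws-ops"),
        ("vm-sql-01", "dcr-sql", "Microsoft-Perf", "ws-ops"),
        ("vm-sql-01", "dcr-baseline", "Microsoft-Event", "ws-ops"),
        ("vm-sql-01", "dcr-audit", "Microsoft-Event", "ws-audit"),
    ]
    print(find_duplicate_flows(flows))
    print(find_multihomed_streams(flows))
```

This check works at the stream level, which is coarser than reality: two rules can send the same stream with different counters or event filters without duplicating rows, and sending a stream to two workspaces can be intentional. Treat matches as candidates to confirm with the queries in Analyze usage in Log Analytics workspace.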
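
Workspace insights ranks billable volume by table and by computer. The Python sketch below shows the same ranking on a small, hypothetical export of volume records so you can see what "top machines and tables" means in practice. The record fields (`table`, `computer`, `billed_mb`, `billable`) and the `top_contributors` helper are assumptions for illustration. In the workspace itself, per-table volume comes from the Usage table, where `Quantity` is reported in MB, and per-computer volume comes from the `_BilledSize` and `_IsBillable` columns on individual records; the embedded KQL shows the per-table version.

```python
from collections import defaultdict

# Per-table billable volume over the last 30 days (Quantity is in MB).
USAGE_BY_TABLE_QUERY = """
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize BillableGB = sum(Quantity) / 1000 by DataType
| sort by BillableGB desc
"""


def top_contributors(records, key, n=3):
    """Rank values of `key` ("table" or "computer") by billable MB."""
    totals = defaultdict(float)
    for record in records:
        if record["billable"]:  # non-billable data doesn't affect the bill
            totals[record[key]] += record["billed_mb"]
    # Largest volume first; ties broken by name so the order is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    records = [
        {"table": "Perf", "computer": "vm-a", "billed_mb": 500.0, "billable": True},
        {"table": "Event", "computer": "vm-a", "billed_mb": 120.0, "billable": True},
        {"table": "Perf", "computer": "vm-b", "billed_mb": 300.0, "billable": True},
        # Hypothetical non-billable record, excluded from the ranking.
        {"table": "Event", "computer": "vm-b", "billed_mb": 40.0, "billable": False},
    ]
    print(top_contributors(records, "table"))
    print(top_contributors(records, "computer"))
```

Rerunning the same ranking before and after a configuration change, such as a new XPath filter or a longer sampling interval, shows whether the change reduced volume where you expected.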

Operational excellence

Operational excellence refers to operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for monitoring of your virtual machines.

Design checklist

  • Migrate from legacy agents to Azure Monitor agent.
  • Use Azure Arc to monitor your VMs outside of Azure.
  • Use Azure Policy to deploy agents and assign data collection rules.
  • Establish a strategy for structure of data collection rules.
  • Consider migrating SCOM client management packs to Azure Monitor.

Configuration recommendations

Recommendation Description
Migrate from legacy agents to Azure Monitor agent. The Azure Monitor agent is simpler to manage than the legacy Log Analytics agent and allows more flexibility in your Log Analytics workspace design. Both the Windows and Linux agents allow multihoming, which means they can connect to multiple workspaces. Data collection rules allow you to manage your data collection settings at scale and define unique, scoped configurations for subsets of machines. See Migrate to Azure Monitor Agent from Log Analytics agent for considerations and migration methods.
Use Azure Arc to monitor your VMs outside of Azure. Azure Arc-enabled servers lets you manage physical servers and virtual machines hosted outside of Azure, whether on your corporate network or with another cloud provider. With the Azure Connected Machine agent in place, you can deploy the Azure Monitor agent to these machines using the same methods that you use for your Azure VMs, and then monitor your entire collection of VMs using the same Azure Monitor tools.
Use Azure Policy to deploy agents and assign data collection rules. Azure Policy allows you to have agents automatically deployed to sets of existing VMs and any new VMs that are created. This ensures that all VMs are monitored with minimal intervention by administrators. If you use VM insights, see Enable VM insights by using Azure Policy. If you want to manage Azure Monitor agent without VM insights, see Enable Azure Monitor Agent by using Azure Policy. See Manually create a DCR for a template to create a data collection rule association.
Establish a strategy for structure of data collection rules. Data collection rules (DCRs) define the data to collect from virtual machines with the Azure Monitor agent and where to send that data. Each DCR can include multiple collection scenarios and be associated with any number of VMs. Establish a strategy for configuring DCRs that collects only the data required by different groups of VMs while minimizing the number of DCRs that you need to manage.
Consider migrating SCOM client management packs to Azure Monitor. If you have an existing SCOM environment for monitoring client workloads, you may be able to migrate enough of the management pack logic to Azure Monitor to allow you to retire your SCOM environment, or at least to retire certain management packs. See Migrate from System Center Operations Manager (SCOM) to Azure Monitor.
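
One way to approach a DCR strategy is to define one DCR per collection scenario and associate it with every VM that needs that scenario, so the number of rules tracks the number of scenarios rather than the number of machines. The Python sketch below plans that grouping for a hypothetical fleet; the scenario names, VM names, and the `plan_dcrs` helper are illustrative assumptions, not a prescribed structure.

```python
from collections import defaultdict


def plan_dcrs(vm_requirements):
    """Group VMs by collection scenario: one DCR per scenario, associated with
    every VM that needs it. A VM needing several scenarios gets several DCR
    associations instead of a bespoke DCR of its own."""
    plan = defaultdict(set)
    for vm, scenarios in vm_requirements.items():
        for scenario in scenarios:
            plan[scenario].add(vm)
    # Sorted output keeps the plan stable and easy to diff between runs.
    return {scenario: sorted(vms) for scenario, vms in sorted(plan.items())}


if __name__ == "__main__":
    requirements = {
        "vm-web-01": {"baseline-perf", "iis-logs"},
        "vm-web-02": {"baseline-perf", "iis-logs"},
        "vm-web-03": {"baseline-perf", "iis-logs"},
        "vm-sql-01": {"baseline-perf", "sql-events"},
        "vm-sql-02": {"baseline-perf", "sql-events"},
    }
    plan = plan_dcrs(requirements)
    for scenario, vms in plan.items():
        print(scenario, vms)
    print(len(plan), "DCRs for", len(requirements), "VMs")
```

The alternative, one DCR per server role that bundles all of that role's scenarios, means fewer associations per VM but repeats shared settings, such as the baseline performance counters, in every role's rule. Which trade-off fits depends on how often those shared settings change.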

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Use the following information to monitor the performance of your virtual machines.

Design checklist

  • Configure data collection and alerting for monitoring performance of client workloads.

Configuration recommendations

Recommendation Description
Configure data collection and alerting for monitoring performance of client workloads. Use the guidance in Monitor virtual machines with Azure Monitor: Collect data to configure collection of data that measures the performance of your client workloads. Use the guidance in Monitor virtual machines with Azure Monitor: Alerts to create alert rules that proactively notify you of potential performance issues with your client workloads.
