Network Connectivity Monitoring with Connection Monitor
Starting 1 July 2021, you will not be able to add new tests in an existing workspace or enable a new workspace in Network Performance Monitor. You will also not be able to add new connection monitors in Connection Monitor (classic). You can continue to use the tests and connection monitors created prior to 1 July 2021. To minimize service disruption to your current workloads, migrate your tests from Network Performance Monitor or migrate from Connection Monitor (classic) to the new Connection Monitor in Azure Network Watcher before 29 February 2024.
Connection Monitor provides unified end-to-end connection monitoring in Azure Network Watcher. The Connection Monitor feature supports hybrid and Azure cloud deployments. Network Watcher provides tools to monitor, diagnose, and view connectivity-related metrics for your Azure deployments.
Here are some use cases for Connection Monitor:
- Your front-end web server VM communicates with a database server VM in a multi-tier application. You want to check network connectivity between the two VMs.
- You want VMs in the East US region to ping VMs in the Central US region, and you want to compare cross-region network latencies.
- You have multiple on-premises office sites in Seattle, Washington, and in Ashburn, Virginia. Your office sites connect to Microsoft 365 URLs. For your users of Microsoft 365 URLs, compare the latencies between Seattle and Ashburn.
- Your hybrid application needs connectivity to an Azure Storage endpoint. Your on-premises site and your Azure application connect to the same Azure Storage endpoint. You want to compare the latencies of the on-premises site to the latencies of the Azure application.
- You want to check the connectivity between your on-premises setups and the Azure VMs that host your cloud application.
Connection Monitor combines the best of two features: the Network Watcher Connection Monitor (Classic) feature and the Network Performance Monitor (NPM) Service Connectivity Monitor, ExpressRoute Monitoring, and Performance Monitoring feature.
Here are some benefits of Connection Monitor:
- Unified, intuitive experience for Azure and hybrid monitoring needs
- Cross-region, cross-workspace connectivity monitoring
- Higher probing frequencies and better visibility into network performance
- Faster alerting for your hybrid deployments
- Support for connectivity checks that are based on HTTP, TCP, and ICMP
- Metrics and Log Analytics support for both Azure and non-Azure test setups
To start using Connection Monitor for monitoring, follow these steps:
- Install monitoring agents.
- Enable Network Watcher on your subscription.
- Create a connection monitor.
- Set up data analysis and alerts.
- Diagnose issues in your network.
The following sections provide details for these steps.
Install monitoring agents
Connection Monitor relies on lightweight executable files to run connectivity checks. It supports connectivity checks from both Azure environments and on-premises environments. The executable file that you use depends on whether your VM is hosted on Azure or on-premises.
Agents for Azure virtual machines
To make Connection Monitor recognize your Azure VMs as monitoring sources, install the Network Watcher Agent virtual machine extension on them. This extension is also known as the Network Watcher extension. Azure virtual machines require the extension to trigger end-to-end monitoring and other advanced functionality.
Rules for a network security group (NSG) or firewall can block communication between the source and destination. Connection Monitor detects this issue and shows it as a diagnostic message in the topology. To enable connection monitoring, ensure that NSG and firewall rules allow packets over TCP or ICMP between the source and destination.
Agents for on-premises machines
To make Connection Monitor recognize your on-premises machines as sources for monitoring, install the Log Analytics agent on the machines. Then enable the Network Performance Monitor solution. These agents are linked to Log Analytics workspaces, so you need to set up the workspace ID and primary key before the agents can start monitoring.
To install the Log Analytics agent for Windows machines, see Azure Monitor virtual machine extension for Windows.
If the path includes firewalls or network virtual appliances (NVAs), then make sure that the destination is reachable.
For Windows machines, to open the port, run the EnableRules.ps1 PowerShell script without any parameters in a PowerShell window with administrative privileges.
For Linux machines, portNumbers to be used needs to be changed manually.
- Navigate to path: /var/opt/microsoft/omsagent/npm_state .
- Open file: npmdregistry
- Change the value for Port Number
“PortNumber:<port of your choice>”
Do note that port numbers being used should be same across all the agents used in a workspace.
The script creates registry keys required by the solution. It also creates Windows Firewall rules to allow agents to create TCP connections with each other. The registry keys created by the script specify whether to log the debug logs and the path for the logs file. The script also defines the agent TCP port used for communication. The values for these keys are automatically set by the script. Don't manually change these keys. The port opened by default is 8084. You can use a custom port by providing the parameter portNumber to the script. Use the same port on all the computers where the script is run. Read more about the network requirements for Log Analytics agents
The script configures only Windows Firewall locally. If you have a network firewall, make sure that it allows traffic destined for the TCP port used by Network Performance Monitor.
Enable Network Watcher on your subscription
All subscriptions that have a virtual network are enabled with Network Watcher. When you create a virtual network in your subscription, Network Watcher is automatically enabled in the virtual network's region and subscription. This automatic enabling doesn't affect your resources or incur a charge. Ensure that Network Watcher isn't explicitly disabled on your subscription.
For more information, see Enable Network Watcher.
Create a connection monitor
Connection Monitor monitors communication at regular intervals. It informs you of changes in reachability and latency. You can also check the current and historical network topology between source agents and destination endpoints.
Sources can be Azure VMs or on-premises machines that have an installed monitoring agent. Destination endpoints can be Microsoft 365 URLs, Dynamics 365 URLs, custom URLs, Azure VM resource IDs, IPv4, IPv6, FQDN, or any domain name.
Access Connection Monitor
- On the Azure portal home page, go to Network Watcher.
- On the left, in the Monitoring section, select Connection Monitor.
- You see all of the connection monitors that were created in Connection Monitor. To see the connection monitors that were created in the classic experience of Connection Monitor, go to the Connection Monitor tab.
Create a connection monitor
In connection monitors that you create in Connection Monitor, you can add both on-premises machines and Azure VMs as sources. These connection monitors can also monitor connectivity to endpoints. The endpoints can be on Azure or any other URL or IP.
Connection Monitor includes the following entities:
- Connection monitor resource – A region-specific Azure resource. All of the following entities are properties of a connection monitor resource.
- Endpoint – A source or destination that participates in connectivity checks. Examples of endpoints include Azure VMs, on-premises agents, URLs, and IPs.
- Test configuration – A protocol-specific configuration for a test. Based on the protocol you chose, you can define the port, thresholds, test frequency, and other parameters.
- Test group – The group that contains source endpoints, destination endpoints, and test configurations. A connection monitor can contain more than one test group.
- Test – The combination of a source endpoint, destination endpoint, and test configuration. A test is the most granular level at which monitoring data is available. The monitoring data includes the percentage of checks that failed and the round-trip time (RTT).
All sources, destinations, and test configurations that you add to a test group get broken down to individual tests. Here's an example of how sources and destinations are broken down:
- Test group: TG1
- Sources: 3 (A, B, C)
- Destinations: 2 (D, E)
- Test configurations: 2 (Config 1, Config 2)
- Total tests created: 12
|Test number||Source||Destination||Test configuration|
Connection monitors have the following scale limits:
- Maximum connection monitors per subscription per region: 100
- Maximum test groups per connection monitor: 20
- Maximum sources and destinations per connection monitor: 100
- Maximum test configurations per connection monitor: 20
Analyze monitoring data and set alerts
After you create a connection monitor, sources check connectivity to destinations based on your test configuration.
Checks in a test
Based on the protocol that you chose in the test configuration, Connection Monitor runs a series of checks for the source-destination pair. The checks run according to the test frequency that you chose.
If you use HTTP, the service calculates the number of HTTP responses that returned a valid response code. Valid response codes can be set using PowerShell and CLI. The result determines the percentage of failed checks. To calculate RTT, the service measures the time between an HTTP call and the response.
If you use TCP or ICMP, the service calculates the packet-loss percentage to determine the percentage of failed checks. To calculate RTT, the service measures the time taken to receive the acknowledgment (ACK) for the packets that were sent. If you enabled traceroute data for your network tests, you can see hop-by-hop loss and latency for your on-premises network.
States of a test
Based on the data that the checks return, tests can have the following states:
- Pass – Actual values for the percentage of failed checks and RTT are within the specified thresholds.
- Fail – Actual values for the percentage of failed checks or RTT exceeded the specified thresholds. If no threshold is specified, then a test reaches the Fail state when the percentage of failed checks is 100.
- Warning –
- If threshold is specified and Connection Monitor observes checks failed percent more than 80% of threshold, the test is marked as warning.
- In the absence of specified thresholds, Connection Monitor automatically assigns a threshold. When that threshold is exceeded, the test status changes to Warning. For round trip time in TCP or ICMP tests, the threshold is 750msec. For checks failed percent , the threshold is 10%.
- Indeterminate – No data in Log Analytics Workspace. Check metrics.
- Not Running – Disabled by disabling test group
Data collection, analysis, and alerts
The data that Connection Monitor collects is stored in the Log Analytics workspace. You set up this workspace when you created the connection monitor.
Monitoring data is also available in Azure Monitor Metrics. You can use Log Analytics to keep your monitoring data for as long as you want. Azure Monitor stores metrics for only 30 days by default.
You can set metric-based alerts on the data.
On the monitoring dashboards, you see a list of the connection monitors that you can access for your subscriptions, regions, time stamps, sources, and destination types.
When you go to Connection Monitor from Network Watcher, you can view data by:
- Connection monitor – List of all connection monitors created for your subscriptions, regions, time stamps, sources, and destination types. This view is the default.
- Test groups – List of all test groups created for your subscriptions, regions, time stamps, sources, and destination types. These test groups aren't filtered by connection monitors.
- Test – List of all tests that run for your subscriptions, regions, time stamps, sources, and destination types. These tests aren't filtered by connection monitors or test groups.
In the following image, the three data views are indicated by arrow 1.
On the dashboard, you can expand each connection monitor to see its test groups. Then you can expand each test group to see the tests that run in it.
You can filter a list based on:
Top-level filters – Search list by text, entity type (Connection Monitor, test group or test) timestamp and scope. Scope includes subscriptions, regions, sources, and destination types. See box 1 in the following image.
State-based filters – Filter by the state of the connection monitor, test group, or test. See box 2 in the following image.
Alert based filter - Filter by alerts fired on the connection monitor resource. See box 3 in the following image.
For example, to look at all tests in Connection Monitor where the source IP is 10.192.64.56:
- Change the view to Test.
- In the search field, type 10.192.64.56
- In Scope in top level filter, select Sources.
To show only failed tests in Connection Monitor where the source IP is 10.192.64.56:
- Change the view to Test.
- For the state-based filter, select Fail.
- In the search field, type 10.192.64.56
- In Scope in top level filter, select Sources.
To show only failed tests in Connection Monitor where the destination is outlook.office365.com:
- Change view to Test.
- For the state-based filter, select Fail.
- In the search field, enter office.live.com
- In Scope in top level filter, select Destinations.
To know reason for failure of a Connection monitor or test group or test, click the column named reason. This tells which threshold ( checks failed % or RTT) was breached and related diagnostic messages
To view the trends in RTT and the percentage of failed checks for a connection monitor:
Select the connection monitor that you want to investigate.
You will see the following sections
- Essentials - Resource specific properties of the selected Connection Monitor
- Summary -
- Aggregated trendlines for RTT and percentage of failed checks for all tests in the connection monitor. You can set a specific time to view the details.
- Top 5 across test groups, sources and destinations based on the RTT or percentage of failed checks.
- Tabs for Test Groups , Sources, Destinations and Test Configurations- Lists test groups, sources or destinations in the Connection Monitor. Check tests failed, aggregate RTT and checks failed % values. You can also go back in time to view data.
- Issues - Hop level issues for each test in the Connection Monitor.
- Click View all tests - to view all tests in the Connection Monitor
- Click View all test groups, test configurations, sources and destinations - to view details specific to each.
- Choose a test group, test configuration, source or destination - to view all tests in the entity.
From the view all tests view, you can:
- Select tests and click compare.
- Use cluster to expand compound resources like VNET, Subnets to its child resources
- View topology for any tests by clicking topology.
To view the trends in RTT and the percentage of failed checks for a test group:
- Select the test group that you want to investigate.
- You will view similar to connection monitor - essentials, summary, table for test groups, sources, destinations and test configurations. Navigate them like you would do for a connection monitor
To view the trends in RTT and the percentage of failed checks for a test:
- Select the test that you want to investigate. You will see the network topology and the end to end trend charts for checks failed % and round trip time. To see the identified issues, in the topology, select any hop in the path. (These hops are Azure resources.) This functionality isn't currently available for on-premises networks
Log queries in Log Analytics
Use Log Analytics to create custom views of your monitoring data. All data that the UI displays is from Log Analytics. You can interactively analyze data in the repository. Correlate the data from Agent Health or other solutions that are based in Log Analytics. Export the data to Excel or Power BI, or create a shareable link.
Metrics in Azure Monitor
In connection monitors that were created before the Connection Monitor experience, all four metrics are available: % Probes Failed, AverageRoundtripMs, ChecksFailedPercent (Preview), and RoundTripTimeMs (Preview). In connection monitors that were created in the Connection Monitor experience, data is available only for the metrics that are tagged with (Preview).
When you use metrics, set the resource type as Microsoft.Network/networkWatchers/connectionMonitors
|Metric||Display name||Unit||Aggregation type||Description||Dimensions|
|ProbesFailedPercent (classic)||% Probes Failed (classic)||Percentage||Average||Percentage of connectivity monitoring probes failed.||No dimensions|
|AverageRoundtripMs (classic)||Avg. Round-trip Time (ms) (classic)||Milliseconds||Average||Average network RTT for connectivity monitoring probes sent between source and destination.||No dimensions|
|ChecksFailedPercent||% Checks Failed||Percentage||Average||Percentage of failed checks for a test.||ConnectionMonitorResourceId
|RoundTripTimeMs||Round-trip Time (ms)||Milliseconds||Average||RTT for checks sent between source and destination. This value isn't averaged.||ConnectionMonitorResourceId
|TestResult||Test Result||Count||Average||Connection monitor test result||SourceAddress
Metric based alerts for Connection Monitor
You can create metric alerts on connection monitors using the methods below
- From Connection Monitor, during creation of Connection Monitor using Azure portal
- From Connection Monitor, using "Configure Alerts" in the dashboard
- From Azure Monitor - To create an alert in Azure Monitor:
- Choose the connection monitor resource that you created in Connection Monitor.
- Ensure that Metric shows up as signal type for the connection monitor.
- In Add Condition, for the Signal Name, select ChecksFailedPercent(Preview) or RoundTripTimeMs(Preview).
- For Signal Type, choose Metrics. For example, select ChecksFailedPercent(Preview).
- All of the dimensions for the metric are listed. Choose the dimension name and dimension value. For example, select Source Address and then enter the IP address of any source in your connection monitor.
- In Alert Logic, fill in the following details:
- Condition Type: Static.
- Condition and Threshold.
- Aggregation Granularity and Frequency of Evaluation: Connection Monitor updates data every minute.
- In Actions, choose your action group.
- Provide alert details.
- Create the alert rule.
Diagnose issues in your network
Connection Monitor helps you diagnose issues in your connection monitor and your network. Issues in your hybrid network are detected by the Log Analytics agents that you installed earlier. Issues in Azure are detected by the Network Watcher extension.
You can view issues in the Azure network in the network topology.
For networks whose sources are on-premises VMs, the following issues can be detected:
- Request timed out.
- Endpoint not resolved by DNS – temporary or persistent. URL invalid.
- No hosts found.
- Source unable to connect to destination. Target not reachable through ICMP.
- Certificate-related issues:
- Client certificate required to authenticate agent.
- Certificate relocation list isn't accessible.
- Host name of the endpoint doesn't match the certificate's subject or subject alternate name.
- Root certificate is missing in source's Local Computer Trusted Certification Authorities store.
- SSL certificate is expired, invalid, revoked, or incompatible.
For networks whose sources are Azure VMs, the following issues can be detected:
- Agent issues:
- Agent stopped.
- Failed DNS resolution.
- No application or listener listening on the destination port.
- Socket could not be opened.
- VM state issues:
- Not allocated
- ARP table entry is missing.
- Traffic was blocked because of local firewall issues or NSG rules.
- Virtual network gateway issues:
- Missing routes.
- The tunnel between two gateways is disconnected or missing.
- The second gateway wasn't found by the tunnel.
- No peering info was found.
If there are 2 connected gateways and one of them is not in the same region as source endpoint, CM identifies it as a 'no route learned' for the topology view. Connectivity is not impacted. This is a known issue and fix is in progress.
- Route was missing in Microsoft Edge.
- Traffic stopped because of system routes or UDR.
- BGP isn't enabled on the gateway connection.
- The DIP probe is down at the load balancer.