Simplify host networking with Network ATC

Applies to: Azure Stack HCI, version 21H2 Preview

This article guides you through the key functions of Network ATC, which simplifies deployment and network configuration management for Azure Stack HCI clusters. Network ATC provides an intent-based approach to host network deployment: by specifying one or more intents (management, compute, or storage) for a network adapter, you can automate the deployment of the intended configuration.

If you have feedback or encounter any issues, review the Requirements and best practices section, check the Network ATC event log, and work with your Microsoft support team.

Overview

Deployment and operation of Azure Stack HCI networking can be a complex and error-prone process. Due to the configuration flexibility provided with the host networking stack, there are many moving parts that can be easily misconfigured or overlooked. Staying up to date with the latest best practices is also a challenge as improvements are continuously made to the underlying technologies. Additionally, configuration consistency across HCI cluster nodes is important as it leads to a more reliable experience.

Network ATC can help:

  • Reduce host networking deployment time, complexity, and errors
  • Deploy the latest Microsoft validated and supported best practices
  • Ensure configuration consistency across the cluster
  • Eliminate configuration drift

Definitions

Here is some new terminology:

Intent: An intent is a definition of how you intend to use the physical adapters in your system. An intent has a friendly name, identifies one or more physical adapters, and includes one or more intent types.

An individual physical adapter can only be included in one intent. By default, an adapter does not have an intent (there is no special status or property given to adapters that don’t have an intent). You can have multiple intents; the number of intents you have will be limited by the number of adapters in your system.

Intent type: Every intent requires one or more intent types. The currently supported intent types are:

  • Management - adapters are used for management access to nodes
  • Compute - adapters are used to connect virtual machine (VM) traffic to the physical network
  • Storage - adapters are used for SMB traffic including Storage Spaces Direct

Any combination of intent types can be specified for a single intent. However, certain intent types can only be specified in one intent:

  • Management: Can be defined in a maximum of one intent
  • Compute: Unlimited
  • Storage: Can be defined in a maximum of one intent

Intent mode: An intent can be specified at a standalone level or at a cluster level. Modes are system-wide; you can't have a network intent that is standalone and another that is clustered on the same host system. Clustered mode is the most common choice as Azure Stack HCI nodes are clustered.

  • Standalone mode: Intents are expressed and managed independently for each host. This mode allows you to test an intent before implementing it across a cluster. Once a host is clustered, any standalone intents are ignored. Standalone intents can be copied to a cluster from a node that is not a member of that cluster, or from one cluster to another cluster.

  • Cluster mode: Intents are applied to all cluster nodes. This is the recommended deployment mode and is required when a server is a member of a failover cluster.

Override: By default, Network ATC deploys the most common configuration, asking for the smallest amount of user input. Overrides allow you to customize your deployment if required. For example, you may choose to modify the VLANs used for storage adapters from the defaults. To review the Network ATC defaults, see the Default values section.

Network ATC allows you to modify any configuration that the OS allows. However, the OS prevents some modifications after deployment, and Network ATC respects these limitations. For example, a virtual switch does not allow SR-IOV to be modified after it has been deployed.

Requirements and best practices

The following are requirements and best practices for using Network ATC in Azure Stack HCI:

  • Supported on Azure Stack HCI, version 21H2 or later.

  • All servers in the cluster must be running Azure Stack HCI, version 21H2.

  • Must use two or more physical host systems that are Azure Stack HCI certified.

  • Adapters in the same Network ATC intent must be symmetric (of the same make, model, speed, and configuration) and available on each cluster node. For more information on adapter symmetry, see Switch Embedded Teaming (SET).

  • Each physical adapter specified in an intent must use the same name on all nodes in the cluster.

  • Ensure each network adapter has an "Up" status, as verified by the PowerShell Get-NetAdapter cmdlet (see the verification sketch after this list).

  • Each cluster node must have the following Azure Stack HCI features installed:

    • Network ATC
    • Data Center Bridging (DCB)
    • Failover Clustering
    • Hyper-V
  • Best practice: Insert each adapter in the same PCI slot(s) in each host. This simplifies automated naming conventions applied by imaging systems.

  • Best practice: Configure the physical network (switches) prior to Network ATC deployment, including VLANs, MTU, and DCB configuration. See Physical Network Requirements for more information.
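
For example, the following minimal sketch verifies adapter naming symmetry and "Up" status across prospective cluster nodes (Node01 and Node02 are hypothetical names; adjust for your environment):

    # Hypothetical node names - adjust to your environment
    $nodes = 'Node01', 'Node02'

    # The same adapter names should appear with status 'Up' on every node
    Get-NetAdapter -CimSession $nodes | Sort-Object Name | Select-Object PSComputerName, Name, Status, LinkSpeed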

You can use the following cmdlet to install the required Windows features:

Install-WindowsFeature -Name NetworkATC, 'Data-Center-Bridging', 'Failover-Clustering', 'Hyper-V' -IncludeManagementTools

Note

Network ATC does not require a system reboot if the other Azure Stack HCI features have already been installed.

Common Network ATC commands

There are several new PowerShell commands included with Network ATC. Run the Get-Command -ModuleName NetworkATC cmdlet to identify them. Ensure PowerShell is run as an administrator.

Typically, only a few of these cmdlets are needed. Here is a brief overview of the cmdlets before you start:

PowerShell command        Description
Add-NetIntent             Creates and submits an intent
Set-NetIntent             Modifies an existing intent
Get-NetIntent             Gets a list of intents
Get-NetIntentStatus       Gets the status of intents
New-NetIntentOverrides    Specifies overrides to the default configuration
Remove-NetIntent          Removes an intent from the local node or cluster. This does not destroy the invoked configuration.
Set-NetIntentRetryState   Instructs Network ATC to try implementing the intent again if it has failed after three attempts (Get-NetIntentStatus reports 'Failed').

Example network intents

Network ATC modifies how you deploy host networking, not what you deploy. Multiple scenarios may be implemented, so long as each scenario is supported by Microsoft. Here are some examples of common deployment options and the PowerShell commands needed. These are not the only combinations available, but they should give you an idea of the possibilities.

For simplicity, we demonstrate only two physical adapters per SET team; however, it is possible to add more. Refer to Plan Host Networking for more information.

Fully converged intent

For this intent, compute, storage, and management networks are deployed and managed across all cluster nodes.

Add-NetIntent -Name ConvergedIntent -Management -Compute -Storage -ClusterName HCI01 -AdapterName pNIC01, pNIC02

Converged compute and storage intent; separate management intent

Two intents are managed across cluster nodes. Management uses pNIC01 and pNIC02; compute and storage are on different adapters.

Add-NetIntent -Name Mgmt -Management -ClusterName HCI01 -AdapterName pNIC01, pNIC02
Add-NetIntent -Name Compute_Storage -Compute -Storage -ClusterName HCI01 -AdapterName pNIC03, pNIC04

Fully disaggregated intent

For this intent, compute, storage, and management networks are all managed on different adapters across all cluster nodes.

Add-NetIntent -Name Mgmt -Management -ClusterName HCI01 -AdapterName pNIC01, pNIC02
Add-NetIntent -Name Compute -Compute -ClusterName HCI01 -AdapterName pNIC03, pNIC04
Add-NetIntent -Name Storage -Storage -ClusterName HCI01 -AdapterName pNIC05, pNIC06

Storage-only intent

For this intent, only storage is managed. Management and compute adapters are not managed by Network ATC.

Add-NetIntent -Name Storage -Storage -ClusterName HCI01 -AdapterName pNIC05, pNIC06

Compute and management intent

For this intent, compute and management networks are managed, but not storage.

Add-NetIntent -Name Management_Compute -Management -Compute -ClusterName HCI01 -AdapterName pNIC01, pNIC02

Multiple compute (switch) intent

For this intent, multiple compute switches are managed.

Add-NetIntent -Name Compute1 -Compute -ClusterName HCI01 -AdapterName pNIC03, pNIC04
Add-NetIntent -Name Compute2 -Compute -ClusterName HCI01 -AdapterName pNIC05, pNIC06

Activity overview

The following activities represent common usage of Network ATC.

You can specify any combination of the following types of intent:

  • Compute – adapters will be used to connect virtual machine (VM) traffic to the physical network
  • Storage – adapters will be used for SMB traffic, including Storage Spaces Direct
  • Management – adapters will be used for management access to nodes. This intent type is not covered in this article, but feel free to explore.

This article covers the following activities:

  • Activity 1: Configure a cluster

  • Activity 2: Configure an override

  • Activity 3: Validate automatic remediation

  • Activity 4: Remove an intent

Activity 1: Configure a cluster

In this activity, we maintain a consistent configuration across all cluster nodes. This is beneficial for several reasons including improved reliability of the cluster. The cluster is considered the configuration boundary. That is, all nodes in the cluster share the same configuration (symmetric intent).

Important

If a node is clustered, you must use a clustered intent. Standalone intents are ignored.

Task 1: Create a cluster

Create a failover cluster of one or more nodes. The cluster can include any number of supported Azure Stack HCI nodes. We also demonstrate adding other nodes to the cluster later.

Note

You can add all nodes at one time using the New-Cluster cmdlet, then add the intent to all nodes. Alternatively, you can incrementally add nodes to the cluster. The new nodes are managed automatically.

  1. Create the cluster on the first node. A simple example is shown as follows:

    New-Cluster -Name HCI01
    
  2. Use the following example cmdlets to verify that the cluster was created and to list the nodes in the cluster.

    Get-Cluster
    Get-ClusterNode
    

Task 2: Create a cluster intent

In this task, an intent is created that specifies the compute and storage intent types with no overrides.

  1. On one of the cluster nodes, run Get-NetAdapter to review the physical adapters. Ensure that each node in the cluster has the same named physical adapters.

    Get-NetAdapter -Name pNIC01, pNIC02 -CimSession (Get-ClusterNode).Name | Select Name, PSComputerName
    
  2. Run the following command to add the storage and compute intent types to pNIC01 and pNIC02. Note that we specify the -ClusterName parameter.

    Add-NetIntent -Name Cluster_ComputeStorage -Compute -Storage -ClusterName HCI01 -AdapterName pNIC01, pNIC02
    

    The command should return immediately after some initial verification. The cmdlet checks that each node in the cluster:

    • has the adapters specified
    • reports an 'Up' status for those adapters
    • has adapters ready to be teamed to create the specified vSwitch
  3. Run the Get-NetIntent cmdlet to see the cluster intent. If you have more than one intent, you can specify the Name parameter to see details of only a specific intent.

    Get-NetIntent -ClusterName HCI01
    
  4. To see the provisioning status of the intent, run the Get-NetIntentStatus command:

    Get-NetIntentStatus -ClusterName HCI01 -Name Cluster_ComputeStorage
    

    Note the Status parameter, which shows Provisioning, Validating, Success, or Failed.

  5. Status should display Success in a few minutes. If this doesn't occur, or you see a Failed status, check the event viewer for issues (a sketch for reading the Network ATC event log follows these steps).

    Get-NetIntentStatus -ClusterName HCI01 -Name Cluster_ComputeStorage
    
  6. Check that the configuration has been applied to all cluster nodes. For this example, check that the VMSwitch was deployed on each node in the cluster and that host virtual NICs were created for storage. For more validation examples, see the Network ATC demo.

    Get-VMSwitch -CimSession (Get-ClusterNode).Name | Select Name, ComputerName
    
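
If you need to dig deeper into a failure, check the Network ATC event log. The following is a minimal discovery sketch; the exact event channel name can vary by release, so it's looked up rather than hard-coded:

    # Find the Network ATC event channel(s), then read recent entries
    Get-WinEvent -ListLog *NetworkATC* | Format-Table LogName
    Get-WinEvent -LogName (Get-WinEvent -ListLog *NetworkATC*).LogName -MaxEvents 20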

Note

At this time, Network ATC does not configure IP addresses for any of its managed adapters. Once Get-NetIntentStatus reports Success, you should add IP addresses to the adapters (see Post-deployment tasks).

Task 3: Add a new node to the cluster

You can freely add nodes to the cluster. Each node in the cluster receives the same intent, improving the reliability of the cluster (the new node must meet the requirements mentioned earlier in this article).

In this task, you will add additional nodes to the cluster and observe how a consistent configuration is enforced across all nodes in the cluster.

  1. Use the Add-ClusterNode cmdlet to add the additional (unconfigured) nodes to the cluster. You only need management access to the cluster at this time. Each node in the cluster should have all pNICs named the same.

    Add-ClusterNode -Cluster HCI01 -Name Node02   # Node02 is an example node name
    Get-ClusterNode
    
  2. Check the status across all cluster nodes using the -ClusterName parameter.

    Get-NetIntentStatus -ClusterName HCI01
    

    Note

    If the specified pNICs do not exist on one of the additional nodes, Get-NetIntentStatus will report the error 'PhysicalAdapterNotFound', which easily identifies the provisioning issue (a rename sketch follows these steps).

  3. Check the provisioning status of all nodes using Get-NetIntentStatus. The cmdlet reports the configuration for both nodes. Note that this may take a similar amount of time to provision as the original node.

    Get-NetIntentStatus -ClusterName HCI01
    
  4. You can experiment by adding several nodes to the cluster at once.
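
If 'PhysicalAdapterNotFound' is caused by a simple naming mismatch, renaming the adapter on the new node so that it matches the intent usually resolves it. A minimal sketch ('Ethernet 3' is a hypothetical current name):

    # Rename a mismatched adapter to the name the intent expects (example names)
    Rename-NetAdapter -Name 'Ethernet 3' -NewName 'pNIC01'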

Activity 2: Configure an override

In this activity, we will modify the default configuration and verify Network ATC makes the necessary changes.

Important

Network ATC implements the Microsoft-tested, best practice configuration. We highly recommend that you modify the default configuration only with guidance from Microsoft Azure Stack HCI support teams.

Task 1: Update an intent with a single override

This task will help you override the default configuration which has already been deployed. This example modifies the default bandwidth reservation for SMB Direct.

Important

The Set-NetIntent cmdlet is used to update an already deployed intent. Use the Add-NetIntent cmdlet to add an override at initial deployment time.

  1. Get a list of possible override cmdlets. We use wildcards to see the options available:

    Get-Command -Noun NetIntent*Over* -Module NetworkATC
    
  2. Create an override object for the DCB Quality of Service (QoS) configuration:

    $QosOverride = New-NetIntentQosPolicyOverrides
    $QosOverride
    
  3. Modify the bandwidth percentage for SMB Direct:

    $QosOverride.BandwidthPercentage_SMB = 25
    $QosOverride
    

    Note

    It is expected that no values appear for any property you don’t override.

  4. Submit the intent request specifying the override:

    Set-NetIntent -Name Cluster_ComputeStorage -ClusterName HCI01 -QosPolicyOverrides $QosOverride
    
  5. Wait for the provisioning status to complete:

    Get-NetIntentStatus -Name Cluster_ComputeStorage | Format-Table IntentName, Host, ProvisioningStatus, ConfigurationStatus
    
  6. Check that the override has been properly set on all cluster nodes. In the example, the SMB_Direct traffic class was overridden with a bandwidth percentage of 25%:

    Get-NetQosTrafficClass -Cimsession (Get-ClusterNode).Name | Select PSComputerName, Name, Priority, Bandwidth
    

Task 2: Update an intent with multiple overrides

This task will help you override the default configuration which has already been deployed. This example modifies the default bandwidth reservation for SMB Direct and the maximum transmission unit (MTU) of the adapters.

  1. Create an override object. In this example, we create two objects - one for QoS properties and one for a physical adapter property.

    $QosOverride = New-NetIntentQosPolicyOverrides
    $AdapterOverride = New-NetIntentAdapterPropertyOverrides
    $QosOverride
    $AdapterOverride
    
  2. Modify the SMB bandwidth percentage:

    $QosOverride.BandwidthPercentage_SMB = 60
    $QosOverride
    
  3. Modify the MTU size (JumboPacket) value:

    $AdapterOverride.JumboPacket = 9014
    
  4. Use the Set-NetIntent command to update the intent and specify the overrides objects previously created.

    Use the appropriate parameter based on the type of override you're specifying. In the example below, the AdapterPropertyOverrides parameter takes the $AdapterOverride object created with the New-NetIntentAdapterPropertyOverrides cmdlet, whereas the QosPolicyOverrides parameter takes the $QosOverride object created with the New-NetIntentQosPolicyOverrides cmdlet.

    Set-NetIntent -ClusterName HCI01 -Name Cluster_ComputeStorage -AdapterPropertyOverrides $AdapterOverride -QosPolicyOverrides $QosOverride
    
  5. First, notice that the status for all nodes in the cluster changes to ProvisioningUpdate and Progress shows 1 of 2. The Progress property is similar to a configuration watermark: it indicates that a new submission must be enacted.

    Get-NetIntentStatus -ClusterName HCI01
    
  6. Wait for the provisioning status to complete:

    Get-NetIntentStatus -ClusterName HCI01
    
  7. Check that the SMB_Direct traffic class was overridden with a bandwidth percentage of 60%.

    Get-NetQosTrafficClass -Cimsession (Get-ClusterNode).Name | Select PSComputerName, Name, Priority, Bandwidth 
    
  8. Check that the adapters' MTU (JumboPacket) value was modified and that the host virtual NICs created for storage were also modified.

    Get-NetAdapterAdvancedProperty -Name pNIC01, pNIC02, vSMB* -RegistryKeyword *JumboPacket -Cimsession (Get-ClusterNode).Name
    

    Important

    Ensure you modify the cmdlet above to include the adapter names specified in the intent.

Activity 3: Validate automatic remediation

Network ATC ensures that the deployed configuration stays the same across all cluster nodes. In this activity, we modify one of the managed configuration properties (without an override), emulating an accidental configuration change, and observe how Network ATC improves the reliability of the system by remediating the misconfigured property.

Note

Network ATC automatically remediates all of the configuration it manages.

  1. Check the adapter's existing MTU (JumboPacket) value:

    Get-NetAdapterAdvancedProperty -Name pNIC01, pNIC02, vSMB* -RegistryKeyword *JumboPacket -Cimsession (Get-ClusterNode).Name
    
  2. Modify one of the physical adapter's MTU without specifying an override. This emulates an accidental change or "configuration drift" which must be remediated.

    Set-NetAdapterAdvancedProperty -Name pNIC01 -RegistryKeyword *JumboPacket -RegistryValue 4088
    
  3. Verify that the adapter's existing MTU (JumboPacket) value has been modified:

    Get-NetAdapterAdvancedProperty -Name pNIC01, pNIC02, vSMB* -RegistryKeyword *JumboPacket -Cimsession (Get-ClusterNode).Name
    
  4. Retry the configuration. This step is only performed to expedite the remediation. Network ATC will automatically remediate this configuration.

    Set-NetIntentRetryState -ClusterName HCI01 -Name Cluster_ComputeStorage
    
  5. Verify that the consistency check has completed:

    Get-NetIntentStatus -ClusterName HCI01 -Name Cluster_ComputeStorage
    
  6. Verify that the adapter's MTU (JumboPacket) value has returned to the expected value:

    Get-NetAdapterAdvancedProperty -Name pNIC01, pNIC02, vSMB* -RegistryKeyword *JumboPacket -Cimsession (Get-ClusterNode).Name
    

Activity 4: Remove an intent

If you want to test various configurations on the same adapters, you may need to remove an intent. If a configuration was previously deployed on your system, you may need to reset the node so that a new configuration can be deployed. To do this, copy and paste the following commands to remove all existing intents and their corresponding vSwitches:

    $intents = Get-NetIntent
    foreach ($intent in $intents)
    {
        Remove-NetIntent -Name $intent.IntentName
        Remove-VMSwitch -Name "*$($intent.IntentName)*" -ErrorAction SilentlyContinue -Force
    }
    
    Get-NetQosTrafficClass | Remove-NetQosTrafficClass
    Get-NetQosPolicy | Remove-NetQosPolicy -Confirm:$false
    Get-NetQosFlowControl | Disable-NetQosFlowControl

Post-deployment tasks

There are several tasks to complete following a Network ATC deployment, including the following:

Add non-APIPA addresses to storage adapters

This can be accomplished using DHCP on the storage VLANs or by using the NetIPAddress cmdlets.
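
For example, here's a minimal static-addressing sketch. The vSMB adapter name, IP address, and prefix length are examples only; they depend on your intent and environment:

    # Discover the storage host vNICs that Network ATC created
    Get-NetAdapter -Name vSMB* | Format-Table Name, Status

    # Assign an example address to one of them (repeat per adapter and storage VLAN)
    New-NetIPAddress -InterfaceAlias 'vSMB(Storage#pNIC01)' -IPAddress 10.71.1.1 -PrefixLength 24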

Set SMB bandwidth limits

If live migration uses SMB Direct (RDMA), configure a bandwidth limit to ensure that live migration does not consume all the bandwidth used by Storage Spaces Direct and Failover Clustering.
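
Here's a hedged sketch using the SMB bandwidth limit feature; the 750MB value is an example only and should be sized for your environment:

    # Install the SMB bandwidth limit feature, then cap live migration traffic
    Install-WindowsFeature -Name FS-SMBBW
    Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB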

Stretched cluster configuration

Stretched clusters require additional configuration that must be manually performed following the successful deployment of an intent. For stretched clusters, all nodes in the cluster must use the same intent.

Default values

This section lists some of the key default values used by Network ATC.

Default VLANs

The following default VLANs are used. These VLANs must be available on the physical network for proper operation.

Adapter intent       Default value
Management           Configured VLAN for management adapters isn't modified
Storage Adapter 1    711
Storage Adapter 2    712
Storage Adapter 3    713
Storage Adapter 4    714
Storage Adapter 5    715
Storage Adapter 6    716
Storage Adapter 7    717
Storage Adapter 8    718
Future use           719

Consider the following command:

Add-NetIntent -Name Cluster_ComputeStorage -Storage -ClusterName HCI01 -AdapterName pNIC01, pNIC02, pNIC03, pNIC04

The physical NICs (or virtual NICs if required) are configured to use VLANs 711, 712, 713, and 714, respectively.
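
If the default VLANs don't match your physical network, you can specify your own at deployment time. A sketch, assuming the -StorageVlans parameter of Add-NetIntent (the VLAN IDs are examples):

    Add-NetIntent -Name Storage -Storage -ClusterName HCI01 -AdapterName pNIC01, pNIC02 -StorageVlans 101, 102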

Default Data Center Bridging (DCB) configuration

Network ATC establishes the following priorities and bandwidth reservations. This configuration should also be configured on the physical network.

Policy       Use                             Default priority   Default bandwidth reservation
Cluster      Cluster heartbeat reservation   7                  2% if the adapter(s) are <= 10 Gbps; 1% if the adapter(s) are > 10 Gbps
SMB_Direct   RDMA storage traffic            3                  50%
Default      All other traffic types         0                  Remainder
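
To verify that these defaults were applied on a node, you can inspect the QoS configuration directly:

    # Traffic classes and bandwidth reservations applied by Network ATC
    Get-NetQosTrafficClass | Format-Table Name, Priority, Bandwidth

    # Per-priority flow control (PFC) state
    Get-NetQosFlowControl | Format-Table Priority, Enabled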

Next steps

Learn more about Stretched clusters.