Troubleshooting Azure Windows VM extension failures

Overview of Azure Resource Manager templates

Azure Resource Manager templates allow you to declaratively specify Azure IaaS infrastructure in JSON by defining the dependencies between resources.

See Authoring extension templates to learn more about authoring templates for using extensions.

This article describes how to troubleshoot some of the common VM extension failures.

Viewing extension status

Azure Resource Manager templates can be executed from Azure PowerShell. Once the template is executed, the extension status can be viewed from Azure Resource Explorer or the command-line tools.

Here is an example:

Azure PowerShell:

Get-AzVM -ResourceGroupName $RGName -Name $vmName -Status

Here is the sample output:

Extensions:  {
  "ExtensionType": "Microsoft.Compute.CustomScriptExtension",
  "Name": "myCustomScriptExtension",
  "SubStatuses": [
    {
      "Code": "ComponentStatus/StdOut/succeeded",
      "DisplayStatus": "Provisioning succeeded",
      "Level": "Info",
      "Message": "    Directory: C:\\temp\\n\\n\\nMode                LastWriteTime     Length Name\\n----                -------------     ------ ----                              \\n-a---          9/1/2015   2:03 AM         11 test.txt                          \\n\\n",
      "Time": null
    },
    {
      "Code": "ComponentStatus/StdErr/succeeded",
      "DisplayStatus": "Provisioning succeeded",
      "Level": "Info",
      "Message": "",
      "Time": null
    }
  ]
}
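
The same status can be checked programmatically. The following is a minimal Python sketch, assuming the extension status has been exported as JSON with the field names shown in the sample output (Name, SubStatuses, Code, Message). The helper name summarize_extension and the rule that any substatus code not ending in "/succeeded" counts as a problem are illustrative assumptions, not part of any Azure SDK.

```python
import json


def summarize_extension(extension: dict) -> dict:
    """Collect the substatuses whose code does not end in '/succeeded'."""
    problems = []
    for sub in extension.get("SubStatuses") or []:
        code = sub.get("Code", "")
        # Assumption: a healthy substatus code ends with "/succeeded",
        # as in the sample output in this article.
        if not code.endswith("/succeeded"):
            problems.append(
                {"code": code, "message": (sub.get("Message") or "").strip()}
            )
    return {"name": extension.get("Name"), "ok": not problems, "problems": problems}


# Trimmed copy of the sample output, used only to exercise the helper.
sample = json.loads("""
{
  "ExtensionType": "Microsoft.Compute.CustomScriptExtension",
  "Name": "myCustomScriptExtension",
  "SubStatuses": [
    {"Code": "ComponentStatus/StdOut/succeeded", "Level": "Info", "Message": "", "Time": null},
    {"Code": "ComponentStatus/StdErr/succeeded", "Level": "Info", "Message": "", "Time": null}
  ]
}
""")

print(summarize_extension(sample))
```

A result with "ok" set to False points to the substatus messages worth reading first.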

Troubleshooting extension failures

Verify that the VM Agent is running and Ready

The VM Agent is required to manage, install and execute extensions. If the VM Agent is not running or is failing to report a Ready status to the Azure platform, then the extensions will not work correctly.

To troubleshoot the VM Agent, refer to the VM Agent troubleshooting documentation for your operating system (Windows or Linux).

Check for your specific extension troubleshooting guide

Some extensions have a dedicated troubleshooting page. You can find the list of these extensions and pages on Troubleshoot extensions.

View the extension's status

As explained above, the extension's status can be found by running the PowerShell cmdlet:

Get-AzVM -ResourceGroupName $RGName -Name $vmName -Status

or the CLI command:

az vm extension show -g <RG Name> --vm-name <VM Name> --name <Extension Name>

or in the Azure portal, by browsing to the VM Blade / Settings / Extensions. You can then click on the extension and check its status and message.
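
If you check extension status often, the CLI call can be wrapped in a small script. The following Python sketch assumes that the Azure CLI is installed and signed in, and that the JSON returned by the command has a top-level provisioningState property. The function names are hypothetical helpers introduced for illustration.

```python
import json
import shutil
import subprocess


def extension_show_command(resource_group, vm_name, extension_name):
    """Build the argument list for the CLI command shown in this article."""
    return [
        "az", "vm", "extension", "show",
        "-g", resource_group,
        "--vm-name", vm_name,
        "--name", extension_name,
        "--output", "json",
    ]


def provisioning_state(cli_json):
    """Return the provisioningState field from the CLI's JSON output, if present."""
    return json.loads(cli_json).get("provisioningState")


def show_extension_state(resource_group, vm_name, extension_name):
    """Run the CLI and return the extension's provisioning state."""
    az = shutil.which("az")  # resolves az or az.cmd, depending on the platform
    if az is None:
        raise RuntimeError("Azure CLI not found on PATH")
    command = extension_show_command(resource_group, vm_name, extension_name)
    command[0] = az
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return provisioning_state(result.stdout)
```

Calling show_extension_state with your resource group, VM, and extension names returns the state as a string, or None if the property is missing from the output.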

Rerun the extension on the VM

If you are running scripts on the VM using Custom Script Extension, you might run into an error where the VM was created successfully but the script failed. In that case, the recommended way to recover is to remove the extension and rerun the template. Note: In the future, this functionality may be enhanced to remove the need for uninstalling the extension.

Remove the extension from Azure PowerShell

Remove-AzVMExtension -ResourceGroupName $RGName -VMName $vmName -Name "myCustomScriptExtension"

Once the extension has been removed, the template can be re-executed to run the scripts on the VM.

Trigger a new GoalState to the VM

You might notice that an extension hasn't been executed, or is failing to execute, because the "Windows Azure CRP Certificate Generator" certificate is missing (that certificate is used to secure the transport of the extension's protected settings). The certificate is automatically regenerated when you restart the Windows Guest Agent from inside the virtual machine:

  • Open the Task Manager
  • Go to the Details tab
  • Locate the WindowsAzureGuestAgent.exe process
  • Right-click, and select "End Task". The process will be automatically restarted

You can also trigger a new GoalState to the VM by executing a "VM Reapply". VM Reapply is an API introduced in 2020 to reapply a VM's state. We recommend doing this at a time when you can tolerate a short VM downtime. Reapply itself does not cause a VM reboot, and in the vast majority of cases calling Reapply will not reboot the VM. However, there is a very small risk that some other pending update to the VM model is applied when Reapply triggers a new goal state, and that other change could require a restart.

Azure portal:

In the portal, select the VM. In the left pane, under Support + troubleshooting, select Redeploy + reapply, then select Reapply.

Azure PowerShell (replace the RG Name and VM Name with your values):

Set-AzVM -ResourceGroupName <RG Name> -Name <VM Name> -Reapply

Azure CLI (replace the RG Name and VM Name with your values):

az vm reapply -g <RG Name> -n <VM Name>

If a "VM Reapply" doesn't work, you can add a new empty data disk to the VM from the Azure portal, and then remove it once the certificate has been added back.

Look at the extension logs inside the VM

If the previous steps didn't work and your extension is still in a failed state, the next step is to look at its logs inside the virtual machine.

On a Windows VM, the extension logs will typically reside in

C:\WindowsAzure\Logs\Plugins

The extension settings and status files are in

C:\Packages\Plugins

On a Linux VM, the extension logs will typically reside in

/var/log/azure/

The extension settings and status files are in

/var/lib/waagent/
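
When several extensions are installed, it can help to list the most recently modified files under a log directory first. The following Python sketch does that. The helper newest_logs is hypothetical, and the roots it is called with are the default locations listed above, which may differ for a given extension.

```python
from pathlib import Path


def newest_logs(root, limit=5):
    """Return up to `limit` files under `root`, most recently modified first."""
    root = Path(root)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    # Default log roots named in this article; a root that does not
    # exist on the current machine is skipped.
    for candidate in (r"C:\WindowsAzure\Logs\Plugins", "/var/log/azure"):
        for path in newest_logs(candidate):
            print(path)
```

Run it inside the VM. The files printed first are usually the ones written by the most recent extension operation.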

Each extension is different, but they usually follow similar principles:

Extension packages and binaries are downloaded to the VM (e.g. "/var/lib/waagent/custom-script/download/1" for Linux or "C:\Packages\Plugins\Microsoft.Compute.CustomScriptExtension\1.10.12\Downloads\0" for Windows).

Their configuration and settings are passed from the Azure platform to the extension handler through the VM Agent (e.g. "/var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.1.3/config" for Linux or "C:\Packages\Plugins\Microsoft.Compute.CustomScriptExtension\1.10.12\RuntimeSettings" for Windows).

Extension handlers inside the VM write to a status file (e.g. "/var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.1.3/status/1.status" for Linux or "C:\Packages\Plugins\Microsoft.Compute.CustomScriptExtension\1.10.12\Status" for Windows), which is then reported to the Azure platform. That status is the one reported through PowerShell, the CLI, or the VM's extension blade in the Azure portal.

They also write detailed logs of their execution (e.g. "/var/log/azure/custom-script/handler.log" for Linux or "C:\WindowsAzure\Logs\Plugins\Microsoft.Compute.CustomScriptExtension\1.10.12\CustomScriptHandler.log" for Windows).
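
The Windows locations in the examples above follow a common pattern, which the following Python sketch makes explicit. It assumes the default roots C:\Packages\Plugins and C:\WindowsAzure\Logs\Plugins and the directory names shown in the Custom Script Extension examples. The helper windows_extension_paths is hypothetical, and other extensions or versions may lay out their files differently.

```python
from pathlib import PureWindowsPath

# Default roots taken from the examples in this article.
PACKAGES_ROOT = PureWindowsPath(r"C:\Packages\Plugins")
LOGS_ROOT = PureWindowsPath(r"C:\WindowsAzure\Logs\Plugins")


def windows_extension_paths(handler: str, version: str) -> dict:
    """Build the folders where an extension is expected to keep its files."""
    base = PACKAGES_ROOT / handler / version
    return {
        "downloads": base / "Downloads",
        "settings": base / "RuntimeSettings",
        "status": base / "Status",
        "logs": LOGS_ROOT / handler / version,
    }


paths = windows_extension_paths("Microsoft.Compute.CustomScriptExtension", "1.10.12")
for kind, path in paths.items():
    print(f"{kind:10} {path}")
```

PureWindowsPath only manipulates path strings, so the sketch can be run on any operating system to work out where to look before you connect to the VM.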

If the VM is recreated from an existing VM

You might create an Azure VM from a specialized disk that came from another Azure VM. In that case, the old VM might have contained extensions, so binaries, logs and status files can be left over on the disk. The new VM model is not aware of the previous VM's extension states, and it might report an incorrect status for these extensions. We strongly recommend that you remove the extensions from the old VM before creating the new one, and then reinstall them once the new VM is created. The same can happen when you create a generalized image from an existing Azure VM. In that case too, remove the extensions first to avoid an inconsistent extension state.