Critical Error 0x1E - Random Boot Failure on Linux Virtual Machines

Lucas Navarezi 1 Reputation point
2021-09-23T13:34:39.94+00:00

Hi,

We are currently facing some problems with Linux Virtual Machines running Ubuntu 18.04 and 20.04, both LTS releases.

The bug first appeared on newly created Ubuntu 20.04 machines.

When a machine was rebooted, whether by command or through the VMM interface, it failed to boot and stopped at the GRUB selection menu.

This happened on all 5 of them; after some retries, the VMM just powered them off.

Upon examining the Event Viewer on one of the host machines, we found a critical error message:

Event ID: 18602
Source: Hyper-V-Worker

<VM_NAME> has encountered a fatal error and a memory dump has been generated. The guest operating system reported that it failed with the following error code: 0x1E. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX)
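
For anyone collecting these events across several hosts, the message above can be picked apart with a small script. This is only a sketch in Python: the function name `parse_event_18602` and the regular expression are illustrative assumptions based on the message text quoted above, not an official format, and the wording may differ between Hyper-V versions.

```python
import re

# Pattern derived from the event text quoted in this post (assumption:
# other Hyper-V versions or locales may word the message differently).
EVENT_18602 = re.compile(
    r"^(?P<vm_name>.+?) has encountered a fatal error.*?"
    r"error code: (?P<code>0x[0-9A-Fa-f]+)\..*?"
    r"\(Virtual machine ID (?P<vm_id>[0-9A-Za-z-]+)\)",
    re.DOTALL,
)


def parse_event_18602(message):
    """Return the VM name, guest error code and VM ID, or None if no match."""
    match = EVENT_18602.search(message)
    if match is None:
        return None
    return match.groupdict()


# The message exactly as it appears above, with placeholders left in place.
sample = (
    "<VM_NAME> has encountered a fatal error and a memory dump has been "
    "generated. The guest operating system reported that it failed with the "
    "following error code: 0x1E. If the problem persists, contact Product "
    "Support for the guest operating system. "
    "(Virtual machine ID XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX)"
)

print(parse_event_18602(sample))
```

Running this over exported event text would make it easier to count how often each VM hits error code 0x1E.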

After some attempts, they could boot with no problems.

We then proceeded to recreate all the VMs with Ubuntu 18.04 and, to our surprise, after the VMs were updated, the bug started to show up again.

These are some fixes we have tried so far:

  • Install linux-azure packages and kernel;
  • Enable Secure boot on Guest Machine;
  • Scan the disk for corruption;
  • Try different kernels.

We couldn't try other fixes, such as updating the Hyper-V Server to 2019 or disabling Secure Boot on the host machines, because we have around 200 VMs across 8 hosts.

Current configuration:

  • 2 Clusters;
  • 4 Hosts per cluster;
  • All hosts are running Hyper-V 2016

The problem appears to be related to the VLAN or the network adapter, since disabling them makes the VMs boot right away.

Unfortunately, we are not the only ones with this issue; the bug also affects Debian and RHEL distros:

https://access.redhat.com/solutions/4796261
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1918265
https://learn.microsoft.com/en-us/answers/questions/530961/windows-server-2016-hyper-v-boot-error-w-virtual-s.html

These ones are the same:
https://learn.microsoft.com/en-us/answers/questions/52937/failure-to-boot-on-red-hat-enterprise-linux-rhel-o.html
https://social.technet.microsoft.com/Forums/en-US/1da1f987-52f0-4304-84f1-2c0ab52f3586/failure-to-boot-on-red-hat-enterprise-linux-rhel-or-centos-8-using-hyperv-2016?forum=linuxintegrationservices
https://social.technet.microsoft.com/Forums/en-US/3c48c962-a28d-44bb-bd80-5b7a902404d8/failure-to-boot-on-red-hat-enterprise-linux-rhel-or-centos-8-using-hyperv-2016?forum=winserverhyperv
https://www.reddit.com/r/HyperV/comments/hx2cps/failure_to_boot_on_red_hat_enterprise_linux_rhel/

System Center Virtual Machine Manager
Hyper-V

1 answer

  1. Limitless Technology 39,341 Reputation points
    2021-09-23T16:14:42.43+00:00

    Hello @Lucas Navarezi

    In this particular case, I would recommend seeking assistance in Linux-dedicated forums, as their expertise may be key to differentiating what Hyper-V does to the machines.

    Otherwise, I understand that you would like to alert Microsoft about a potential bug that affects multiple users with similar scenarios. I would recommend documenting it in the Feedback Hub:

    https://support.microsoft.com/en-us/windows/send-feedback-to-microsoft-with-the-feedback-hub-app-f59187f8-8739-22d6-ba93-f66612949332

    And if you want to investigate further on the Hyper-V side (since it seems vital for your large environment), my recommendation would be to open a Microsoft Support ticket and let their experts take a deep dive into the Hyper-V logs and potentially file a bug using the data collected.
