We are currently facing some problems with Linux Virtual Machines running Ubuntu 18.04 and 20.04, both LTS releases.
The bug first appeared on newly created Ubuntu 20.04 machines.
Whenever a machine was rebooted, either by command or through the VMM interface, it failed to boot and stopped at the GRUB selection menu.
This happened on all 5 of them; after a few retries, VMM simply powered them off.
Examining the Event Viewer on one of the host machines showed a critical error message:
Event ID: 18602
<VM_NAME> has encountered a fatal error and a memory dump has been generated. The guest operating system reported that it failed with the following error code: 0x1E. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX)
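With around 200 VMs it may help to export these events and tally which guests report the error. Below is a minimal Python sketch that pulls the VM name, error code, and VM ID out of the message text. It assumes every message follows the layout of the single sample above. The function name `parse_event_18602` and the VM name `web01` are hypothetical, used only for illustration.

```python
import re

# Pattern modelled on the one sample message above; the field layout is an
# assumption taken from that sample, not from Hyper-V documentation.
EVENT_RE = re.compile(
    r"^(?P<vm>.+?) has encountered a fatal error.*?"
    r"error code: (?P<code>0x[0-9A-Fa-f]+)\..*?"
    r"\(Virtual machine ID (?P<vm_id>[0-9A-Za-z-]+)\)",
    re.DOTALL,
)


def parse_event_18602(message):
    """Return (vm_name, error_code, vm_id), or None if the text does not match."""
    match = EVENT_RE.search(message)
    if match is None:
        return None
    # The guest error code is reported in hex, e.g. 0x1E.
    return match.group("vm"), int(match.group("code"), 16), match.group("vm_id")


# "web01" is a hypothetical VM name standing in for <VM_NAME>.
sample = (
    "web01 has encountered a fatal error and a memory dump has been generated. "
    "The guest operating system reported that it failed with the following "
    "error code: 0x1E. If the problem persists, contact Product Support for "
    "the guest operating system. "
    "(Virtual machine ID XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX)"
)

print(parse_event_18602(sample))
```

Grouping the parsed tuples by VM name or by host would show whether the failures cluster on particular machines.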
After some attempts, they could boot with no problems.
We then recreated all the VMs with Ubuntu 18.04 and, to our surprise, the bug reappeared once the VMs had been updated.
These are some fixes we have tried so far:
Install linux-azure packages and kernel;
Enable Secure boot on Guest Machine;
Scan the disk for corruption;
Try different kernels.
We couldn't try other fixes, such as upgrading the hosts to Hyper-V Server 2019 or disabling Secure Boot on the host machines, because we have around 200 VMs across 8 hosts.
4 hosts per cluster;
All hosts are running Hyper-V 2016.
The problem appears to be related to the VLAN or the network adapter, since disabling them makes the VMs boot right away.
Unfortunately, we are not the only ones with this issue; the bug also affects Debian and RHEL distros:
These reports describe the same problem: