Why proactively ensuring you have access to GRUB and SysRq in your Linux VM could save you a lot of downtime
Quite simply, it speeds up the recovery of your IaaS Linux virtual machine: most recovery options become possible once GRUB is configured.
Recovering Linux Virtual Machines in Azure
You will, at some point, have to perform some type of recovery on your Linux VM. Unless you are very lucky, this is inevitable.
The reasons for performing a VM recovery are numerous and can be anything such as:
• corrupt file systems/kernel/MBR
• failed kernel upgrades
• incorrect fstab configurations
• firewall configurations
• lost password
• mangled sshd configuration files
• networking configurations
• many others
You should take some time to verify that you can access GRUB on your VMs deployed in Azure. See the official Microsoft docs.
Ensure you take backups of files before making changes.
Watch this video to see how you can quickly recover your Linux VM once you have access to GRUB.
There are a number of methods to aid recovery. In a cloud environment, however, recovery has been challenging, and time is of the essence: we understand you need to restore services at lightning speed.
With the introduction of the Azure Serial Console, you have the power to interact with your Linux VM and manipulate many aspects of it, including how the kernel boots. More experienced Linux/Unix sysadmins will appreciate the single user and emergency modes that are accessible via the Azure Serial Console, making Disk Swap and VM deletion redundant for many recovery scenarios.
The method of recovery depends on the problem you are facing. For example, a lost or misplaced password can be reset through the Azure Portal options -> Reset Password, which communicates with the Linux guest. Other tools, such as Custom Script, are also available; however, these options require the Linux waagent to be up and in a healthy state.
If you ensure that you have access to the Azure Serial Console and GRUB, a password reset or the repair of an incorrect configuration change can take a matter of minutes instead of hours. If your primary kernel becomes corrupt and you have multiple kernels on disk, you could even force the VM to boot from an alternative kernel.
Suggested order of recovery methods
• Azure Serial Console
• Disk Swap - can be automated using either:
• Legacy Method
Not all Linux VMs in Azure are configured by default for GRUB access, nor are they all configured to be interrupted with SysRq commands. Some older distros, such as SLES 11, are not configured to display a login prompt in the Azure Serial Console.
In this current draft of the article I review various distros and document my findings on how to configure GRUB and what the impact is. (RH and SUSE to complete)
How to configure a Linux VM to accept SysRq keys
The SysRq key is enabled by default on some newer Linux distros. On others it may be configured to accept only certain SysRq functions, and on others, especially many older versions, it may be disabled completely.
The SysRq feature is useful for rebooting a crashed or hung VM directly from the Azure Serial Console. I also find it extremely helpful for gaining access to the GRUB menu: restarting a VM from another Portal window or an ssh session might drop your current console connection, so the GRUB timeout that keeps the menu on screen expires before you can reconnect.
The VM must be configured with a value of 1 for the kernel parameter, which enables all SysRq functions, or 128, which allows reboot/poweroff.
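The value of kernel.sysrq is a bitmask, so values other than 1 and 128 can also permit a reboot (for instance, a value of 176 includes the 128 bit). As a minimal sketch, the helper below reports whether a given value allows the reboot/poweroff function. The function name sysrq_allows_reboot is made up for this example and is not part of any distro tooling.

```shell
#!/bin/sh
# sysrq_allows_reboot VALUE
# Returns 0 (true) when VALUE permits the SysRq reboot/poweroff function:
# either 1 (all functions enabled) or any bitmask with bit 128 set.
sysrq_allows_reboot() {
    value=$1
    if [ "$value" -eq 1 ]; then
        return 0
    fi
    # Shell arithmetic: test the 128 bit of the bitmask.
    [ $(( $value & 128 )) -ne 0 ]
}

# Example: inspect the running kernel's setting, if the file is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows_reboot "$current"; then
        echo "kernel.sysrq=$current: reboot via SysRq is allowed"
    else
        echo "kernel.sysrq=$current: reboot via SysRq is NOT allowed"
    fi
fi
```

The check only reads /proc/sys/kernel/sysrq, so it can be run as an unprivileged user before you decide whether the setting needs changing.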
[video width="1920" height="1080" mp4="https://msdnshared.blob.core.windows.net/media/2018/10/enable-sysrq-for-Azure-Serial-Console1.mp4"][/video]
To configure the VM to accept a reboot via the SysRq commands in the Azure Portal, you need to set the kernel parameter kernel.sysrq to 1.
To make this configuration persist across reboots, add an entry to the file /etc/sysctl.conf:
echo kernel.sysrq=1 >> /etc/sysctl.conf
To configure the kernel parameter dynamically, without a reboot:
sysctl -w kernel.sysrq=1
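Appending with >> adds a duplicate line every time the command is run. A minimal sketch of an idempotent alternative is shown below. The helper name set_sysctl_conf and the scratch demo file are assumptions for illustration only; on a real VM you would point it at /etc/sysctl.conf as root and then run sysctl -w kernel.sysrq=1.

```shell
#!/bin/sh
# set_sysctl_conf FILE KEY VALUE
# Ensure FILE ends up with exactly one "KEY=VALUE" line for KEY.
set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    tmp=$(mktemp)
    # Drop any existing assignment of KEY (optional spaces around "=").
    # grep exits non-zero when nothing is left to print, hence "|| true".
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp" || true
    echo "${key}=${value}" >> "$tmp"
    # Write back through the original path to keep its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch file, so nothing on the system is changed.
demo=$(mktemp)
echo "kernel.sysrq = 0" > "$demo"
set_sysctl_conf "$demo" kernel.sysrq 1
set_sysctl_conf "$demo" kernel.sysrq 1   # second run adds no duplicate
cat "$demo"
```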
What if you do not have root access, or sudo is broken, so you cannot enable SysRq from a shell prompt?
You can enable SysRq in such a scenario using the Portal. This can be beneficial if the sudoers.d/waagent file has become broken or an admin has deleted it.
Use the Azure Portal Operations -> Run Command -> RunShellScript feature, which requires the waagent process to be healthy, to inject the sysctl -w kernel.sysrq=1 command shown above and enable SysRq.
Select Reboot and Send SysRq Command
The system should log a reset message such as this
Ubuntu GRUB configuration
By default you should be able to access GRUB by holding down the Esc key during VM boot; however, in testing, the GRUB menu was not always presented.
To force the GRUB menu to stay on screen in the Azure Serial Console, you can test and use one of these 3 options.
Update the file /etc/default/grub.d/50-cloudimg-settings.cfg to keep the GRUB menu on screen for however long you specify.
We can also achieve similar behavior by making changes to the file /etc/default/grub, which gives you 3 seconds to hit Esc.
Comment out these 2 lines:
And replace them with:
This configuration gives you 5 seconds to hit Esc to access GRUB and performs a countdown to select the kernel.
Update the file /etc/default/grub
For all these options you need to recreate GRUB configurations by running:
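The command itself depends on the distro. As a hedged sketch: on Ubuntu the usual wrapper is update-grub, while Red Hat style GRUB 2 systems use grub2-mkconfig -o /boot/grub2/grub.cfg, as shown later in this article. The helper name grub_regen_command is hypothetical, and it only prints the command so that you can review it before running it as root.

```shell
#!/bin/sh
# grub_regen_command
# Print the command that regenerates the GRUB 2 configuration on this
# system, or return 1 if neither known tool is installed.
grub_regen_command() {
    if command -v update-grub >/dev/null 2>&1; then
        echo "update-grub"
    elif command -v grub2-mkconfig >/dev/null 2>&1; then
        echo "grub2-mkconfig -o /boot/grub2/grub.cfg"
    else
        return 1
    fi
}

if cmd=$(grub_regen_command); then
    echo "Run as root: $cmd"
else
    echo "No GRUB 2 configuration tool found on this system"
fi
```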
Ubuntu 12.04 lets you see the serial console but does not let you interact with it: you will not see a login: prompt.
The settings required by each Ubuntu version for configuring the serial console can be found here
In testing on 12.04, it was sufficient to:
1) Create a file called /etc/init/ttyS0.conf containing the following:
# ttyS0 - getty
#
# This service maintains a getty on ttyS0 from the point the system is
# started until it is shut down again.
start on stopped rc RUNLEVEL=
stop on runlevel [!12345]
respawn
exec /sbin/getty -L 115200 ttyS0 vt102
2) Ask upstart to start the getty
sudo start ttyS0
Red Hat GRUB configuration
The default /etc/default/grub configuration on these versions is adequately configured
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300"
GRUB_DISABLE_RECOVERY="true"
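To check quickly whether a VM already has these settings, you can look for a serial GRUB_TERMINAL entry. The following is a minimal sketch; the helper name grub_has_serial_terminal is an assumption for this example, and it only reads the file.

```shell
#!/bin/sh
# grub_has_serial_terminal FILE
# Returns 0 when FILE sets GRUB_TERMINAL to a value that includes "serial".
# A GRUB_TERMINAL_OUTPUT="console" line does not match.
grub_has_serial_terminal() {
    grep -Eq '^[[:space:]]*GRUB_TERMINAL=.*serial' "$1"
}

grub_file=/etc/default/grub
if [ -r "$grub_file" ]; then
    if grub_has_serial_terminal "$grub_file"; then
        echo "$grub_file: GRUB is set to use the serial terminal"
    else
        echo "$grub_file: no serial GRUB_TERMINAL found"
    fi
fi
```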
Simply enable the SysRq key
sysctl -w kernel.sysrq=1;echo kernel.sysrq=1 >> /etc/sysctl.conf;sysctl -a | grep -i sysrq
The file to modify is /etc/default/grub - a default config looks like this:
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300"
GRUB_DISABLE_RECOVERY="true"
Change the following lines in /etc/default/grub: set GRUB_TIMEOUT=1 to GRUB_TIMEOUT=5, and replace GRUB_TERMINAL_OUTPUT="console" with GRUB_TERMINAL="serial console".
Also add this line:
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
/etc/default/grub should now look similar to this:
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300"
GRUB_DISABLE_RECOVERY="true"
Complete the change by regenerating the GRUB configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
Set the SysRq kernel parameter:
sysctl -w kernel.sysrq=1;echo kernel.sysrq=1 >> /etc/sysctl.conf;sysctl -a | grep -i sysrq
For the more adventurous: I also configured GRUB and SysRq with a single line via the Run Command feature. It worked well, but please back up your files before running it.
cp /etc/default/grub /etc/default/grub.bak; sed -i 's/GRUB_TIMEOUT=1/GRUB_TIMEOUT=5/g' /etc/default/grub; sed -i 's/GRUB_TERMINAL_OUTPUT="console"/GRUB_TERMINAL="serial console"/g' /etc/default/grub; echo "GRUB_SERIAL_COMMAND=\"serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1\"" >> /etc/default/grub;grub2-mkconfig -o /boot/grub2/grub.cfg;sysctl -w kernel.sysrq=1;echo kernel.sysrq=1 >> /etc/sysctl.conf;sysctl -a | grep -i sysrq
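The GRUB part of the one-liner can also be written as a small script that is safe to run more than once. This is only a sketch: the function name configure_grub_serial is made up for this example, it assumes GNU sed (for the -i option), and it is demonstrated on a scratch file rather than the real /etc/default/grub. On a VM you would pass /etc/default/grub as root and then run grub2-mkconfig -o /boot/grub2/grub.cfg.

```shell
#!/bin/sh
# configure_grub_serial FILE
# Apply the same edits as the one-liner to FILE, keeping a .bak copy.
configure_grub_serial() {
    file=$1
    # Keep the first backup; do not overwrite it on later runs.
    [ -e "$file.bak" ] || cp "$file" "$file.bak"
    sed -i \
        -e 's/^GRUB_TIMEOUT=1$/GRUB_TIMEOUT=5/' \
        -e 's/^GRUB_TERMINAL_OUTPUT="console"$/GRUB_TERMINAL="serial console"/' \
        "$file"
    # Append the serial command only if it is not already present.
    if ! grep -q '^GRUB_SERIAL_COMMAND=' "$file"; then
        echo 'GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"' >> "$file"
    fi
}

# Demo on a scratch file containing two of the default lines.
demo=$(mktemp)
printf '%s\n' 'GRUB_TIMEOUT=1' 'GRUB_TERMINAL_OUTPUT="console"' > "$demo"
configure_grub_serial "$demo"
configure_grub_serial "$demo"   # second run must not add a duplicate line
cat "$demo"
```

Unlike the plain echo >> in the one-liner, the grep guard keeps GRUB_SERIAL_COMMAND from being appended again on a repeat run.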
The file to modify is /boot/grub/grub.conf. Make the necessary modifications and you should have a configuration like this (timeout=15 is used for demo purposes; a reasonable value is 5):
#boot=/dev/vda
default=0
timeout=15
splashimage=(hd0,0)/grub/splash.xpm.gz
#hiddenmenu
serial --unit=0 --speed=9600
terminal serial
terminal --timeout=5 serial console
The last line, terminal --timeout=5 serial console, further extends the GRUB timeout by displaying the prompt “Press any key to continue.” in the console for 5 seconds.
The GRUB menu should appear on screen for the configured timeout=15 without the need to press Esc. Make sure to click in the console in the browser to make the menu active, then select the required kernel.
Force the kernel to a bash prompt
No root password for single user mode?
If you do not have the root password and single user mode requires one, you can boot the kernel with the init program replaced by a bash prompt. This is achieved by appending init=/bin/bash to the kernel boot line.
Remount your / (root) file system RW using the command
mount -o remount,rw /
Now you can change the root password or make many other Linux configuration changes.
Restart the VM with /sbin/reboot -f
Single User mode
Enter the desired mode by appending the keyword "single" or "1" to the line displaying "Ubuntu". On Red Hat systems you can also append rd.break.
Ubuntu Recovery Mode
Additional recovery and clean-up options are available for Ubuntu via GRUB; however, these are only accessible if you configure the kernel parameters accordingly.
Failure to configure this kernel boot parameter would force the Recovery menu to be sent to Azure Diagnostics and not to the Azure Serial Console.
You can obtain access to the Ubuntu Recovery Menu by following these steps:
Interrupt the boot process and access the GRUB menu
Select the line displaying (recovery mode); do not press Enter, press "e" instead
Locate the line that loads the kernel and replace the last parameter, "nomodeset", with console=ttyS0 as the destination
linux /boot/vmlinuz-4.15.0-1023-azure root=UUID=21b294f1-25bd-4265-9c4e-d6e4aeb57e97 ro recovery nomodeset
linux /boot/vmlinuz-4.15.0-1023-azure root=UUID=21b294f1-25bd-4265-9c4e-d6e4aeb57e97 ro recovery console=ttyS0
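The edit amounts to a one-word substitution at the end of the linux line. As a small illustration only (in practice you make this change by hand in the GRUB editor), the sketch below applies the same substitution with sed. The function name recovery_to_serial is hypothetical.

```shell
#!/bin/sh
# recovery_to_serial LINE
# Replace a trailing "nomodeset" on a kernel line with "console=ttyS0".
recovery_to_serial() {
    printf '%s\n' "$1" | sed 's/ nomodeset$/ console=ttyS0/'
}

before='linux /boot/vmlinuz-4.15.0-1023-azure root=UUID=21b294f1-25bd-4265-9c4e-d6e4aeb57e97 ro recovery nomodeset'
recovery_to_serial "$before"
```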
Press Ctrl-x to start and load the kernel
If all goes well you will see these additional options, which can help with other recovery tasks