Why proactively ensuring you have access to GRUB and SysRq in your Linux VM could save you lots of downtime

Quite simply, it speeds up the recovery of your IaaS Linux Virtual Machine: most recovery options are possible if GRUB is configured.

Recovering Linux Virtual Machines in Azure

Unless you are very lucky, at some point you will have to perform some type of recovery on your Linux VM; this is inevitable.

The reasons for performing a VM recovery are numerous, for example:

• corrupt file systems/kernel/MBR
• failed kernel upgrades
• incorrect fstab configurations
• firewall configurations
• lost password
• mangled sshd configuration files
• networking configurations
• many others

You should take some time to verify that you can access GRUB on your VMs deployed in Azure. See the official Microsoft docs.

Ensure you take backups of files before making changes.

Watch this video to see how you can quickly recover your Linux VM once you have access to GRUB

There are a number of methods to aid recovery; however, in a cloud environment this has been challenging. Time is of the essence, and we understand you need to restore services at lightning speed.

With the introduction of the Azure Serial Console you have the power to interact with your Linux VM and manipulate many aspects of it, including how the kernel will boot. The more experienced Linux/Unix sysadmins will appreciate the single user and emergency modes that are accessible via the Azure Serial Console, which make Disk Swap and VM deletion redundant for many recovery scenarios.

The method of recovery depends on the problem you are facing. For example, a lost or misplaced password can be reset through the Azure Portal option Reset Password, which communicates with the Linux guest. Other tools such as Custom Script are also available; however, all of these options require that the Linux waagent be up and in a healthy state.

By ensuring that you have access to the Azure Serial Console and GRUB, resetting a password or reverting an incorrect configuration change can take a matter of minutes instead of hours. You could even force the VM to boot from an alternative kernel, should you have multiple kernels on disk, in the scenario where your primary kernel becomes corrupt.

Suggested order of recovery methods
• Azure Serial Console
• Disk Swap - can be automated using either:
    • PowerShell Recovery Scripts
    • bash Recovery Scripts
• Legacy Method

Challenge:
Not all Linux Azure VMs are configured by default for GRUB access, nor are they all configured to be interrupted with SysRq commands. Some older distros, such as SLES 11, are not configured to display a login prompt in the Azure Serial Console.

In this current draft article I will review various distros and document my findings on how to configure GRUB and what the impact is. (RH and SUSE to complete)

How to configure Linux VM to accept SysRq keys

Ubuntu GRUB configuration

Red Hat GRUB configuration

Force the kernel to a bash prompt

Single User mode

Ubuntu Recovery Mode

 

 

How to configure Linux VM to accept SysRq keys

The SysRq key is enabled by default on some newer Linux distros, whilst on others it may be configured to accept only certain SysRq functions, and on others, especially many older versions, it may be disabled completely.

The SysRq feature is useful for rebooting a crashed or hung VM directly from the Azure Serial Console. I also find it extremely helpful for gaining access to the GRUB menu: restarting a VM from another Portal window or ssh session might drop your current console connection, so the GRUB timeout that displays the menu expires before you can reconnect.

The VM must be configured with a value of 1 for the kernel parameter kernel.sysrq, which enables all SysRq functions, or 128, which allows reboot/poweroff.
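As a quick way to see where a VM stands, here is a minimal sketch that interprets the current kernel.sysrq value. The function name sysrq_allows_reboot is my own; the logic assumes the usual kernel convention that 1 enables everything and any larger value is a bitmask in which 128 covers reboot/poweroff.

```shell
# Return success (0) if the given kernel.sysrq value permits a SysRq reboot.
# 1 enables every SysRq function; any other value is treated as a bitmask
# in which bit 128 covers reboot/poweroff.
sysrq_allows_reboot() {
    value="$1"
    [ "$value" -eq 1 ] && return 0
    [ $(( value & 128 )) -ne 0 ] && return 0
    return 1
}

# Report the state of the running system, if the proc entry is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows_reboot "$current"; then
        echo "kernel.sysrq=$current: reboot via SysRq is allowed"
    else
        echo "kernel.sysrq=$current: reboot via SysRq is NOT allowed"
    fi
fi
```

A distro default such as 176 (128 + 32 + 16) therefore already permits a reboot from the console even though not every SysRq function is enabled.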

[video width="1920" height="1080" mp4="https://msdnshared.blob.core.windows.net/media/2018/10/enable-sysrq-for-Azure-Serial-Console1.mp4"][/video]

 

To configure the VM to accept a reboot via SysRq commands from the Azure Portal, set the kernel parameter kernel.sysrq to 1.

For this configuration to persist across reboots, add an entry to the file /etc/sysctl.conf:

echo kernel.sysrq=1 >> /etc/sysctl.conf

To configure the kernel parameter dynamically

sysctl -w kernel.sysrq=1
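Appending with echo works, but running it more than once leaves duplicate entries in the file. The following is a small sketch of an idempotent alternative; persist_sysrq is a hypothetical helper name, and it assumes GNU sed (for the -i and -E options), which is what the distros discussed here ship.

```shell
# persist_sysrq FILE VALUE
# Set kernel.sysrq in FILE, rewriting an existing entry in place instead of
# appending a duplicate. FILE would normally be /etc/sysctl.conf (needs root).
persist_sysrq() {
    conf="$1"
    value="$2"
    if grep -Eq '^[[:space:]]*kernel\.sysrq[[:space:]]*=' "$conf" 2>/dev/null; then
        sed -i -E "s/^[[:space:]]*kernel\.sysrq[[:space:]]*=.*/kernel.sysrq=${value}/" "$conf"
    else
        printf 'kernel.sysrq=%s\n' "$value" >> "$conf"
    fi
}
```

As root you could then run persist_sysrq /etc/sysctl.conf 1 followed by sysctl -p to load the file into the running kernel.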

What if you do not have root access, or sudo is broken, and you can't enable SysRq from a shell prompt?

You can enable SysRq in such a scenario using the Portal. This can be beneficial if the sudoers.d/waagent file has become corrupted or an admin has deleted it.

Use the Azure Portal Operations -> Run Command -> RunShellScript feature, which requires the waagent process to be healthy, to inject this command and enable SysRq:

 

 

 

sysctl -w kernel.sysrq=1 ; echo kernel.sysrq=1 >> /etc/sysctl.conf

Once completed, you can try accessing SysRq and should see that a reboot is possible.

Select Reboot and Send SysRq Command

The system should log a reset message such as this

Ubuntu GRUB configuration

By default you should be able to access GRUB by holding down the Esc key during the VM boot; however, in testing, the GRUB menu was not always presented.
To force the GRUB menu to stay on screen in the Azure Serial Console you can test and use one of these 3 options.

Option 1

Update the file /etc/default/grub.d/50-cloudimg-settings.cfg and GRUB will keep the menu on screen for however many seconds you specify. Change

GRUB_TIMEOUT=0

to

GRUB_TIMEOUT=5

Option 2
We can achieve similar behavior by making changes to the file /etc/default/grub, which gives a 3 second window to hit Esc.

Comment out these 2 lines:
#GRUB_HIDDEN_TIMEOUT=0
#GRUB_HIDDEN_TIMEOUT_QUIET=true

And replace them with

GRUB_TIMEOUT_STYLE=countdown

Option 3
This configuration gives you 5 seconds to hit Esc to access GRUB and performs a countdown to select the kernel.

Update the file /etc/default/grub

GRUB_HIDDEN_TIMEOUT=5
#GRUB_HIDDEN_TIMEOUT_QUIET=true
#GRUB_TIMEOUT_STYLE=countdown
GRUB_TIMEOUT=10

For all these options you need to regenerate the GRUB configuration by running:

update-grub
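If you prefer to script these edits rather than make them by hand, the sketch below shows one way to do it. set_grub_var is a helper name I made up for illustration; it assumes GNU sed and simple values (no "|" or "&" characters), and it keeps a one-time backup next to the file.

```shell
# set_grub_var FILE KEY VALUE
# Keep a one-time backup of FILE (FILE.bak), then set KEY=VALUE: an existing
# uncommented assignment is rewritten in place, otherwise one is appended.
# Intended for simple values that contain no "|" or "&" characters.
set_grub_var() {
    file="$1"
    key="$2"
    val="$3"
    [ -e "$file.bak" ] || cp -p "$file" "$file.bak" || return 1
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${val}|" "$file"
    else
        printf '%s=%s\n' "$key" "$val" >> "$file"
    fi
}
```

Option 1 would then be set_grub_var /etc/default/grub.d/50-cloudimg-settings.cfg GRUB_TIMEOUT 5 run as root, followed by update-grub.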

 

Ubuntu 12.04

Ubuntu 12.04 lets you see the serial console but does not let you interact with it. You will not see a login: prompt.

The settings required by each Ubuntu version for configuring the serial console can be found here

In testing on 12.04 it was sufficient to:

1) Create a file called /etc/init/ttyS0.conf containing the following:

# ttyS0 - getty
#
# This service maintains a getty on ttyS0 from the point the system is
# started until it is shut down again.
start on stopped rc RUNLEVEL=[12345]
stop on runlevel [!12345]

respawn
exec /sbin/getty -L 115200 ttyS0 vt102

2) Ask upstart to start the getty

sudo start ttyS0

Red Hat GRUB configuration

RHEL 7.4+

The default /etc/default/grub configuration on these versions is adequately configured:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300"
GRUB_DISABLE_RECOVERY="true"

Simply enable the SysRq key
sysctl -w kernel.sysrq=1;echo kernel.sysrq=1 >> /etc/sysctl.conf;sysctl -a | grep -i sysrq

RHEL 7.2+

The file to modify is /etc/default/grub. A default config looks like this:

GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300"
GRUB_DISABLE_RECOVERY="true"

Change the following lines in /etc/default/grub:

GRUB_TIMEOUT=1

to

GRUB_TIMEOUT=5

GRUB_TERMINAL_OUTPUT="console"

to

GRUB_TERMINAL="serial console"

also add this line:

GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"

/etc/default/grub should now look similar to this:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300"
GRUB_DISABLE_RECOVERY="true"
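Before regenerating the configuration it can be worth checking that the serial settings described in the steps above are really in the file. This is only a sketch; check_grub_serial is a hypothetical helper, and the three patterns are my reading of which settings matter for the Azure Serial Console.

```shell
# check_grub_serial FILE
# Print each serial-console setting that is missing from FILE and return
# non-zero if at least one is absent.
check_grub_serial() {
    file="$1"
    missing=0
    for pattern in \
        '^GRUB_TERMINAL="serial console"' \
        '^GRUB_SERIAL_COMMAND="serial' \
        '^GRUB_CMDLINE_LINUX=.*console=ttyS0'
    do
        if ! grep -Eq "$pattern" "$file"; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_grub_serial /etc/default/grub prints nothing and returns 0 when all three settings are present.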

Complete the change by regenerating the GRUB configuration:

grub2-mkconfig -o /boot/grub2/grub.cfg

Set the SysRq kernel parameter:

sysctl -w kernel.sysrq=1;echo kernel.sysrq=1 >> /etc/sysctl.conf;sysctl -a | grep -i sysrq

For the more adventurous: I also configured GRUB and SysRq using a single line via the Run Command feature. It worked well, but please back up your files before running this.

cp /etc/default/grub /etc/default/grub.bak; sed -i 's/GRUB_TIMEOUT=1/GRUB_TIMEOUT=5/g' /etc/default/grub; sed -i 's/GRUB_TERMINAL_OUTPUT="console"/GRUB_TERMINAL="serial console"/g' /etc/default/grub; echo "GRUB_SERIAL_COMMAND=\"serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1\"" >> /etc/default/grub;grub2-mkconfig -o /boot/grub2/grub.cfg;sysctl -w kernel.sysrq=1;echo kernel.sysrq=1 >> /etc/sysctl.conf;sysctl -a | grep -i sysrq
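One caveat with the one-liner is that it appends another GRUB_SERIAL_COMMAND line every time it runs. Below is a rough sketch of the GRUB part written so that it can be run repeatedly; configure_grub_serial is a name I chose for illustration, and GNU sed is assumed.

```shell
# configure_grub_serial FILE
# Apply the RHEL 7.2-style edits to FILE (normally /etc/default/grub):
# raise the timeout, switch GRUB to the serial terminal and add the serial
# command only if it is not already present. A one-time backup is kept.
configure_grub_serial() {
    file="$1"
    [ -e "$file.bak" ] || cp -p "$file" "$file.bak" || return 1
    sed -i \
        -e 's/^GRUB_TIMEOUT=1$/GRUB_TIMEOUT=5/' \
        -e 's/^GRUB_TERMINAL_OUTPUT="console"/GRUB_TERMINAL="serial console"/' \
        "$file"
    if ! grep -q '^GRUB_SERIAL_COMMAND=' "$file"; then
        echo 'GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"' >> "$file"
    fi
}
```

After configure_grub_serial /etc/default/grub you would still run grub2-mkconfig -o /boot/grub2/grub.cfg and set kernel.sysrq as shown earlier.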

RHEL 6.x

The file to modify is /boot/grub/grub.conf. Make the necessary modifications and you should have a configuration like this (timeout=15 is used for demo purposes; a reasonable value is 5):

#boot=/dev/vda
default=0
timeout=15
splashimage=(hd0,0)/grub/splash.xpm.gz
#hiddenmenu
serial --unit=0 --speed=9600
terminal serial
terminal --timeout=5 serial console

The last line, terminal --timeout=5 serial console, further increases the GRUB timeout by displaying the prompt “Press any key to continue.” in the console for 5 seconds.

The GRUB menu should appear on screen for the configured timeout=15 without the need to press Esc. Make sure to click in the console in the browser to make the menu active, then select the required kernel.

Force the kernel to a bash prompt

No root password for single user mode?
If you do not have the root password and single user mode requires it, you can boot the kernel with the init program replaced by a bash prompt. This is achieved by appending init=/bin/bash to the kernel boot line in GRUB.

Remount your / (root) file system read-write using the command:
mount -o remount,rw /

Now you can change the root password or make many other Linux configuration changes.

Restart the VM with /sbin/reboot -f

Single User mode

Alternatively, you might need to access the VM in single user or emergency mode. Use the arrow keys to select the kernel you wish to boot or interrupt.

Enter the desired mode by appending the keyword "single" or "1" to the kernel boot line of the selected entry. On Red Hat systems you can also append rd.break.

 

Ubuntu Recovery Mode

Additional recovery and clean-up options are available for Ubuntu via GRUB; however, these are only accessible if you configure the kernel parameters accordingly.

Without this kernel boot parameter, the Recovery menu is sent to Azure Diagnostics and not to the Azure Serial Console.

You can obtain access to the Ubuntu Recovery Menu by following these steps:

Interrupt the boot process and access the GRUB menu

Select Advanced Options for Ubuntu

Select the line displaying (recovery mode); do not press Enter, press "e" instead

 

Locate the line that loads the kernel and replace the last parameter, "nomodeset", with console=ttyS0 so that the output is directed to the serial console. Change

linux       /boot/vmlinuz-4.15.0-1023-azure root=UUID=21b294f1-25bd-4265-9c4e-d6e4aeb57e97 ro recovery nomodeset

to

linux       /boot/vmlinuz-4.15.0-1023-azure root=UUID=21b294f1-25bd-4265-9c4e-d6e4aeb57e97 ro recovery console=ttyS0

 

Press Ctrl-x to boot the kernel with these parameters

If all goes well you will see these additional options, which can help you perform other recovery tasks
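If the menu does not show up in the Azure Serial Console, it helps to confirm which console the running kernel was actually booted with. Here is a small sketch that inspects the kernel command line; has_serial_console is a made-up helper name and it only looks for ttyS0, the serial port used throughout this article.

```shell
# has_serial_console CMDLINE
# Succeed if the kernel command line routes a console to ttyS0, either as a
# bare console=ttyS0 or with options such as console=ttyS0,115200n8.
has_serial_console() {
    case " $1 " in
        *" console=ttyS0 "*|*" console=ttyS0,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel, if the proc entry is readable.
if [ -r /proc/cmdline ]; then
    if has_serial_console "$(cat /proc/cmdline)"; then
        echo "serial console is configured on ttyS0"
    else
        echo "no console=ttyS0 on the kernel command line"
    fi
fi
```

Note that earlyprintk=ttyS0 on its own does not count, since it only covers early boot messages.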