Linux Recovery: Using CHROOT steps to recover VMs that are not accessible
You may run into a situation where you need to delete a virtual machine (VM), keep its OSDisk and attach it to another Linux VM, where you can then repair whatever is wrong on the old VM. In some instances you will also have network access, which lets you update packages, add a user account or perform any other task you could do on a normally running VM.
There is a process called chroot (for reference, see https://en.wikipedia.org/wiki/Chroot ) that makes all of this possible, and the steps are mostly the same across Linux distributions.
In this post we will describe the steps, which are very useful on Azure since they allow you, to name a few tasks, to:
- Add/Remove users
- Change user accounts
- Reinstall system packages such as Kernel images, SSH and other components that might be causing the VM to not boot up successfully
- Edit files and test to see if related services are working properly after the changes
All of these tasks are sometimes required to restore a VM that is not booting up properly: you repair its original VHD and then re-deploy from it.
The steps below assume that you have deleted the old VM and kept its OSDisk and you also have deployed a new recovery VM using the same Linux distribution and version as the inaccessible VM.
a) We highly recommend making a backup of the VHD from the inaccessible VM before going through the recovery steps. You can back up the VHD using Microsoft Storage Explorer, available at http://storageexplorer.com
b) For classic VMs you will be prompted to keep the OSDisk when deleting the VM; for Resource Manager VMs the OSDisk is kept by default.
Steps for the CHROOT process
Attach the old OSDisk (VHD) from the previous VM to the new VM through the portal, as you normally would for a data disk. Then, on the recovery VM, mount it and execute the commands described below.
NOTE: A few notes and exceptions covering minor differences between Linux distributions are documented at the end of this article. Please make sure you have a look at them as well, since they might impact the steps provided below.
To make sure we don't need to use sudo for all commands and to make it simpler, let's switch to root access:
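One common way to get a root shell for the rest of the session:

```shell
# Switch to a root shell so the following mount/chroot
# commands do not each need a sudo prefix.
sudo su -
```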
To list disks and partitions available:
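For example, lsblk gives a quick overview of the attached disks and their partitions (fdisk -l shows the partition tables in more detail, but requires root):

```shell
# List all block devices with their size, type and mount point;
# the attached recovery disk typically appears as sdc.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

# For more detail on the partition tables (requires root):
# fdisk -l
```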
In most scenarios you will see the old VHD attached as /dev/sdc, and the steps below assume a standard deployment where the OSDisk has only one main partition, so it will be /dev/sdc1. We have seen distributions with a two-partition setup (one being /boot and the other /). If that is the case, make sure you mount them accordingly: for example, if /dev/sdc1 is /boot and /dev/sdc2 is /, you would first mount the / partition under /rescue and then the boot partition under /rescue/boot:
mount /dev/sdc2 /rescue
mount /dev/sdc1 /rescue/boot
NOTE: The other details still apply; for example, if it's an XFS partition you might need to use the -o nouuid option, and so on. Later in this article we have an example related to Red Hat 7.2, which usually has a two-partition setup.
Create a rescue folder and mount the old VHD in it (in this case we are assuming the disk/partition to be /dev/sdc1, as explained above). First create the mount point:
mkdir /rescue
For Red Hat 7.2+
mount -o nouuid /dev/sdc2 /rescue
For CentOS 7.2+
mount -o nouuid /dev/sdc1 /rescue
For Debian 8.2+, Ubuntu 16.04+, SUSE 12 SP4+
mount /dev/sdc1 /rescue
Next, prepare the special file systems the chroot environment will need. These commands use relative paths, so run them from inside /rescue:
cd /rescue
mount -t proc proc proc/
mount -t sysfs sys sys/
mount -o bind /dev dev/
mount -o bind /dev/pts dev/pts/
For Debian and Ubuntu distributions you will also need to mount (run):
mount -o bind /run run/
After executing the chroot command below, we will effectively be running from the previous OSDisk. We can then run commands, update software and do everything else needed to fix the previous errors with the VM.
Change to root environment
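A minimal sketch of this step, assuming the old OSDisk was mounted under /rescue as above:

```shell
# Make /rescue the apparent root of the file system; from here on,
# every command runs against the old OSDisk.
chroot /rescue
```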
At this point you can repair anything you need on the disk, just as you would on a working VM, and the changes will be written to the disk. You can run package-management commands to remove or install software, edit files and so on.
After you are done repairing the issues, you can proceed to unmount the disk so you can rebuild the VM from it.
The steps to exit the chroot environment and umount everything are:
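As a sketch (assuming the mount points used earlier in this article), the teardown is the mounts in reverse order:

```shell
exit                     # leave the chroot environment
cd /                     # make sure nothing holds /rescue busy
umount /rescue/dev/pts
umount /rescue/dev
umount /rescue/sys
umount /rescue/proc
umount /rescue
```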
NOTE: If you were troubleshooting a disk with 2 partitions as stated above, make sure you also umount /rescue/boot before /rescue.
For Debian and Ubuntu distributions you will also need to unmount /run (run):
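Specifically, before unmounting /rescue itself (assuming the bind mount of /run described earlier):

```shell
umount /rescue/run
```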
To rebuild the VM, we recommend the articles we have on using either PowerShell or Azure CLI if this is a Resource Manager VM:
- Azure PowerShell: How to delete and re-deploy a VM from VHD
- Azure CLI: How to delete and re-deploy a VM from VHD
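As an illustration only, the Azure CLI flow looks roughly like the following. The resource group, disk/VM names and VHD URL here are placeholders, so treat this as a sketch and refer to the articles above for the authoritative steps:

```shell
# Create a managed disk from the repaired VHD
# (resource group, names and URL are example values).
az disk create --resource-group myResourceGroup --name repairedOSDisk \
    --source https://mystorageaccount.blob.core.windows.net/vhds/osdisk.vhd

# Create a new VM that boots from that existing OS disk.
az vm create --resource-group myResourceGroup --name recoveredVM \
    --attach-os-disk repairedOSDisk --os-type linux
```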
Notes about Linux distributions:
Red Hat Enterprise Linux Server release 7.2
1) Has two partitions: /boot and the root file system
2) Uses the XFS file system, so we have to use:
mount -o nouuid /dev/sdc2 /rescue
3) Mount the /boot partition if the issues involve kernel repair using yum update, or if you need to inspect the /boot partition content in the VHD:
mount -o nouuid /dev/sdc1 /rescue/boot/