Replace a scale unit node on an Azure Stack integrated system
Applies to: Azure Stack integrated systems
This article describes the general process to replace a physical computer (also referred to as a scale unit node) on an Azure Stack integrated system. Actual scale unit node replacement steps will vary based on your original equipment manufacturer (OEM) hardware vendor. See your vendor's field replaceable unit (FRU) documentation for detailed steps that are specific to your system.
Firmware leveling is critical for the success of the operation described in this article. Skipping this step can lead to system instability, decreased performance, security threats, or failure of the Azure Stack automation to deploy the operating system. Always consult your hardware partner's documentation when replacing hardware to ensure that the applied firmware matches the OEM Version displayed in the Azure Stack administrator portal.
The following flow diagram shows the general FRU process to replace an entire scale unit node.
*This action may not be required based on the physical condition of the hardware.
If the shutdown operation fails, use the drain operation followed by the stop operation. For more details, see available node operations.
Review alert information
If a scale unit node is down, you'll receive the following critical alerts:
- Node not connected to network controller
- Node inaccessible for virtual machine placement
- Scale unit node is offline
If you open the Scale unit node is offline alert, the alert description contains the scale unit node that's inaccessible. You may also receive additional alerts in the OEM-specific monitoring solution that's running on the hardware lifecycle host.
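As a rough illustration of the triage described above, the following Python sketch checks whether a list of active alert titles contains all three critical alerts that accompany a node outage. The alert titles are taken from this article. The function names and the plain list-of-strings input are assumptions made for illustration and do not correspond to any Azure Stack API.

```python
# Alert titles come from this article; everything else here is hypothetical.
NODE_DOWN_ALERTS = frozenset({
    "Node not connected to network controller",
    "Node inaccessible for virtual machine placement",
    "Scale unit node is offline",
})


def missing_node_down_alerts(active_alert_titles):
    """Return the expected node-down alerts that are not currently active."""
    return NODE_DOWN_ALERTS - set(active_alert_titles)


def node_down_alerts_present(active_alert_titles):
    """Return True when all three critical node-down alerts are active."""
    return not missing_node_down_alerts(active_alert_titles)
```

If only some of the three alerts are active, `missing_node_down_alerts` shows which ones are absent, which may suggest a different fault than a node that is fully down.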
Scale unit node replacement process
The following steps are provided as a high-level overview of the scale unit node replacement process. See your OEM hardware vendor's FRU documentation for detailed steps that are specific to your system. Do not follow these steps without referring to your OEM-provided documentation.
Use the Shutdown action to gracefully shut down the scale unit node. This action may not be required based on the physical condition of the hardware.
In the unlikely case that the Shutdown action fails, use the Drain action to put the scale unit node into maintenance mode. This action may not be required based on the physical condition of the hardware.
In any case, only one node can be disabled and powered off at a time without breaking Storage Spaces Direct (S2D).
After the scale unit node is in maintenance mode, use the Stop action. This action may not be required based on the physical condition of the hardware.
In the unlikely case that the Stop action doesn't power off the node, use the baseboard management controller (BMC) web interface instead.
Replace the physical computer. Typically, this is done by your OEM hardware vendor.
Use the Repair action to add the new physical computer to the scale unit.
Use the privileged endpoint to check the status of virtual disk repair. With new data drives, a full storage repair job can take multiple hours depending on system load and consumed space.
After the repair action has finished, validate that all active alerts have been automatically closed.
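The order of fallbacks for taking the node offline (Shutdown, then Drain followed by Stop, then the BMC web interface) can be sketched as a small decision chain. This is a minimal sketch under stated assumptions, not an Azure Stack API: the `NodeActions` callables, the `take_node_offline` function, and the `nodes_already_offline` counter are all hypothetical stand-ins for whatever tooling you use. It also assumes that every action is actually required, whereas the steps above note that some may be skipped depending on the physical condition of the hardware.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NodeActions:
    """Hypothetical callables for each action; each returns True on success."""
    shutdown: Callable[[], bool]
    drain: Callable[[], bool]
    stop: Callable[[], bool]
    bmc_power_off: Callable[[], bool]


def take_node_offline(actions: NodeActions, nodes_already_offline: int) -> List[str]:
    """Run the fallback chain and return the names of the actions attempted."""
    # Only one node may be disabled and powered off at a time (S2D constraint).
    if nodes_already_offline > 0:
        raise RuntimeError("Another node is already offline; do not proceed.")

    attempted = ["shutdown"]
    if actions.shutdown():
        return attempted

    # Shutdown failed: drain the node into maintenance mode, then stop it.
    attempted.append("drain")
    if not actions.drain():
        # The article does not cover this case; stopping here is an assumption.
        raise RuntimeError("Drain failed; consult your OEM documentation.")

    attempted.append("stop")
    if actions.stop():
        return attempted

    # Last resort: power off through the BMC web interface.
    attempted.append("bmc_power_off")
    if not actions.bmc_power_off():
        raise RuntimeError("BMC power off failed; consult your OEM documentation.")
    return attempted
```

Returning the list of attempted actions keeps a simple record of which path was taken. The physical replacement, the Repair action, the virtual disk repair check, and the alert validation still follow as described in the steps above.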