Taking an Azure Stack HCI server offline for maintenance

Applies to: Azure Stack HCI, version 20H2; Windows Server 2019

With Azure Stack HCI, taking a server offline for maintenance also takes offline a portion of the storage that is shared across all servers in the cluster. Doing this safely requires pausing the server that you want to take offline, moving roles and virtual machines (VMs) to other servers in the cluster, and verifying that all data is available on the other servers in the cluster. This process ensures that the data remains safe and accessible throughout the maintenance period.

You can use either Windows Admin Center or PowerShell to take a server offline for maintenance. This topic covers both methods.

Important

This topic assumes that you need to power down a physical server to perform maintenance, or restart it for some other reason. To install updates on an Azure Stack HCI cluster, see Update Azure Stack HCI clusters, which explains how to use Cluster-Aware Updating (CAU) to automatically perform all the steps in this topic while also updating servers and restarting them if necessary.

Take a server offline using Windows Admin Center

The simplest way to prepare to take a server in an Azure Stack HCI cluster offline is by using Windows Admin Center.

Verify it's safe to take the server offline

  1. Using Windows Admin Center, connect to the server you want to take offline. Select Storage > Disks from the Tools menu, and verify that the Status column for every virtual disk shows Online.

  2. Then, select Storage > Volumes and verify that the Health column for every volume shows Healthy and that the Status column for every volume shows OK.

Pause and drain the server

Before shutting down or restarting a server, pause the server and drain (move off) any clustered roles, such as VMs, running on it so that the shutdown does not affect application state. Always pause and drain clustered servers before taking them offline for maintenance.

  1. Using Windows Admin Center, connect to the cluster and then select Compute > Nodes from the Tools menu in Cluster Manager.

  2. Click the name of the server you wish to pause and drain, and select Pause. You should see the following prompt:

    If you pause this node, all clustered roles move to other nodes and no roles can be added to this node until it's resumed. Are you sure you want to pause cluster node?

  3. Select Yes to pause the server and initiate the drain process. The cluster node status will show as Draining, and roles such as Hyper-V and VMs will immediately begin live migrating to other server(s) in the cluster. This can take a few minutes.

    Note

    When you pause and drain the server properly, Azure Stack HCI performs an automatic safety check to ensure it is safe to proceed. If there are unhealthy volumes, it will stop and alert you that it's not safe to proceed.

Shut down the server

Once the server has completed draining, its status will show as Paused in Windows Admin Center. You can now safely shut the server down for maintenance or reboot it.

Resume the server

When you are ready for the server to begin hosting clustered roles and VMs again, turn the server on, wait for it to boot up, and resume the server using the following steps.

  1. In Cluster Manager, select Compute > Nodes from the Tools menu at the left.

  2. Select the name of the server you wish to resume, and then click Resume. You should see the following prompt:

    Are you sure you want to resume cluster node?

  3. In most situations, you should select the checkbox that says Transfer clustered roles back to this node. Select Yes to resume the server.

If you checked the box in step 3 above, clustered roles and VMs will immediately begin live migrating back to the server. This can take a few minutes.

Wait for storage to resync

When the server resumes, any new writes that happened while it was unavailable need to resync. This happens automatically, using intelligent change tracking. Not all data needs to be scanned or synchronized; only the changes do. This process is throttled to mitigate impact to production workloads. Depending on how long the server was paused and how much new data was written, it may take many minutes to complete.

Important

You must wait for resyncing to complete before taking any other servers in the cluster offline.

To check whether resyncing has completed, connect to the server using Windows Admin Center and select Storage > Volumes from the Tools menu at the left, then select Volumes near the top of the page. If the Health column for every volume shows Healthy and the Status column for every volume shows OK, then resyncing has completed, and it's now safe to take other servers in the cluster offline.

Take a server offline using PowerShell

Use the following procedures to properly pause, drain, and resume a server in an Azure Stack HCI cluster using PowerShell.

Verify it's safe to take the server offline

To verify that all your volumes are healthy, run the following cmdlet as an administrator:

Get-VirtualDisk

Here's an example of what the output might look like:

FriendlyName              ResiliencySettingName FaultDomainRedundancy OperationalStatus HealthStatus    Size FootprintOnPool StorageEfficiency
------------              --------------------- --------------------- ----------------- ------------    ---- --------------- -----------------
Mirror II                 Mirror                1                     OK                Healthy         4 TB         8.01 TB            49.99%
Mirror-accelerated parity                                             OK                Healthy      1002 GB         1.96 TB            49.98%
Mirror                    Mirror                1                     OK                Healthy         1 TB            2 TB            49.98%
ClusterPerformanceHistory Mirror                1                     OK                Healthy        24 GB           49 GB            48.98%

Verify that the HealthStatus property for every volume is Healthy and the OperationalStatus shows OK.
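The check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name Test-VirtualDiskHealthy is hypothetical, and it assumes that each piped object exposes FriendlyName, HealthStatus, and OperationalStatus properties that render as the strings shown in the example output.

```powershell
# Hypothetical helper: returns $true only if every piped virtual disk
# reports HealthStatus 'Healthy' and OperationalStatus 'OK'.
function Test-VirtualDiskHealthy {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $VirtualDisk
    )
    begin { $allHealthy = $true }
    process {
        # Convert to string so enum, array, and string values compare alike.
        $health = "$($VirtualDisk.HealthStatus)"
        $status = "$($VirtualDisk.OperationalStatus)"
        if (($health -ne 'Healthy') -or ($status -ne 'OK')) {
            Write-Warning "$($VirtualDisk.FriendlyName): $status / $health"
            $allHealthy = $false
        }
    }
    end { $allHealthy }
}
```

A possible use is `Get-VirtualDisk | Test-VirtualDiskHealthy`, continuing with the pause and drain step only when the result is True.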

Pause and drain the server

Run the following cmdlet as an administrator to pause and drain the server:

Suspend-ClusterNode -Drain

Shut down the server

Once the server has completed draining, it will show as Paused in PowerShell (for example, in the State column of Get-ClusterNode output).
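To confirm the paused state from a script before shutting down, a sketch like the following may help. It is an illustration only: the function name Test-NodeDrained and the server name Server1 are hypothetical, and the sketch assumes the node object returned by Get-ClusterNode exposes State and DrainStatus properties that render as Paused and Completed after a successful drain.

```powershell
# Hypothetical helper: $true when a cluster node object reports that it is
# paused and that its drain has completed.
function Test-NodeDrained {
    param(
        [Parameter(Mandatory)]
        [object] $Node
    )
    # String conversion lets enum values and plain strings compare the same way.
    ("$($Node.State)" -eq 'Paused') -and ("$($Node.DrainStatus)" -eq 'Completed')
}

# Example use on a cluster server (Server1 is a placeholder name):
# Test-NodeDrained -Node (Get-ClusterNode -Name 'Server1')
```

Checking DrainStatus as well as State is a deliberately cautious choice here; if a drain did not finish, investigate the roles that remain on the server before shutting it down.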

You can now safely shut the server down or restart it by using the Stop-Computer or Restart-Computer PowerShell cmdlets.

Note

When running a Get-VirtualDisk command on servers that are shutting down or starting/stopping the cluster service, the Operational Status may be reported as Incomplete or Degraded, and the Health Status column may list Warning. This is normal and should not cause concern. All your volumes remain online and accessible.

Resume the server

Run the following cmdlet as an administrator to resume the server into the cluster. To return the clustered roles and VMs that were previously running on the server, use the optional -Failback flag:

Resume-ClusterNode -Failback Immediate

Once the server has resumed, it will show as Up in PowerShell.
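If you prefer to wait for the Up state in a script rather than checking by hand, a generic polling helper is one option. This is a sketch under stated assumptions: Wait-Until is a hypothetical function, not a built-in cmdlet, and the timeout and interval defaults are arbitrary.

```powershell
# Hypothetical helper: re-evaluates a condition script block until it returns
# a true value or the timeout elapses. Returns $true on success, $false on timeout.
function Wait-Until {
    param(
        [Parameter(Mandatory)]
        [scriptblock] $Condition,
        [int] $TimeoutSeconds = 600,
        [int] $IntervalSeconds = 10
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $true
}

# Example use (Server1 is a placeholder name):
# Wait-Until -Condition { "$((Get-ClusterNode -Name 'Server1').State)" -eq 'Up' }
```

The helper only observes the state; it does not resume the node itself, so run it after Resume-ClusterNode.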

Wait for storage to resync

When the server resumes, you must wait for resyncing to complete before taking any other servers in the cluster offline.

Run the following cmdlet as an administrator to monitor progress:

Get-StorageJob

If resyncing has already completed, you won't get any output.

Here's some example output showing resync (repair) jobs still running:

Name   IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- --------  --------------- -------------- ----------
Repair True             00:06:23    Running   65              11477975040    17448304640
Repair True             00:06:40    Running   66              15987900416    23890755584
Repair True             00:06:52    Running   68              20104802841    22104819713

The BytesTotal column shows how much storage needs to resync. The PercentComplete column displays progress.
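With several repair jobs running at once, an overall figure can be easier to read than per-job rows. The sketch below is an illustration, not part of the original procedure: Get-ResyncProgress is a hypothetical function, and it assumes every job object carries numeric BytesProcessed and BytesTotal properties as in the example output. It sums all jobs passed to it, so filter the input first if other kinds of storage jobs are present.

```powershell
# Hypothetical helper: aggregates storage job objects into one progress summary.
function Get-ResyncProgress {
    param(
        [object[]] $StorageJob
    )
    $processed = [double]0
    $total     = [double]0
    $count     = 0
    foreach ($job in $StorageJob) {
        if ($null -eq $job) { continue }
        $processed += [double]$job.BytesProcessed
        $total     += [double]$job.BytesTotal
        $count++
    }
    # With no jobs (or nothing left to copy), report the resync as complete.
    $percent = if ($total -gt 0) { [math]::Round(100 * $processed / $total, 1) } else { 100 }
    [pscustomobject]@{
        JobCount        = $count
        RemainingGB     = [math]::Round(($total - $processed) / 1GB, 2)
        PercentComplete = $percent
    }
}

# Example use:
# Get-ResyncProgress -StorageJob (Get-StorageJob)
```

A JobCount of 0 corresponds to the case where Get-StorageJob returns no output.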

Warning

It's not safe to take another server offline until these repair jobs finish.

During this time, under HealthStatus, your volumes will continue to show as Warning, which is normal.

For example, if you use the Get-VirtualDisk cmdlet while storage is resyncing, you might see the following output:

FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach Size
------------ --------------------- ----------------- ------------ -------------- ----
MyVolume1    Mirror                InService         Warning      True           1 TB
MyVolume2    Mirror                InService         Warning      True           1 TB
MyVolume3    Mirror                InService         Warning      True           1 TB

Once the jobs complete, verify that volumes show Healthy again by using the Get-VirtualDisk cmdlet. Here's some example output:

FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach Size
------------ --------------------- ----------------- ------------ -------------- ----
MyVolume1    Mirror                OK                Healthy      True           1 TB
MyVolume2    Mirror                OK                Healthy      True           1 TB
MyVolume3    Mirror                OK                Healthy      True           1 TB

It's now safe to pause and restart other servers in the cluster.

Next steps

For related information, see also: