Monitor health and alerts in Azure Stack Hub

Azure Stack Hub includes infrastructure monitoring capabilities that help you view health and alerts for an Azure Stack Hub region. The Region management tile lists all the deployed regions of Azure Stack Hub. It's pinned by default in the administrator portal for the Default Provider Subscription. The tile shows the number of active critical and warning alerts for each region, and it's your entry point into the health and alert functionality of Azure Stack Hub.

The Region management tile in the Azure Stack Hub administrator portal

Understand health in Azure Stack Hub

The health resource provider manages health and alerts. Azure Stack Hub infrastructure components register with the health resource provider during Azure Stack Hub deployment and configuration. This registration enables the display of health and alerts for each component. Health in Azure Stack Hub is a simple concept: if alerts exist for a registered instance of a component, the health state of that component reflects the worst active alert severity, either warning or critical.
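The "worst active alert severity" rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the health resource provider's implementation. The names `Severity` and `health_state` are hypothetical, and returning `None` for a component with no active alerts is an assumption, because the text doesn't name that state.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The two alert severities, ordered so that CRITICAL is worse than WARNING."""
    WARNING = 1
    CRITICAL = 2


def health_state(active_severities):
    """Return the worst active severity, or None when no alerts are active.

    Assumption: None stands in for "no active alerts"; the source text
    does not say what that state is called.
    """
    return max(active_severities, default=None)


if __name__ == "__main__":
    print(health_state([Severity.WARNING, Severity.CRITICAL, Severity.WARNING]))
```

Ordering the severities as integers keeps the rule to a single `max` call and would extend naturally if more levels were ever modeled.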

Alert severity definition

Azure Stack Hub raises alerts with only two severities: warning and critical.

  • Warning
    An operator can address a warning alert in a scheduled manner. The alert typically doesn't impact user workloads.

  • Critical
    An operator should address a critical alert with urgency. These alerts indicate issues that currently impact or will soon impact Azure Stack Hub users.

View and manage component health state

You can view the health state of components in the administrator portal and through the REST API and PowerShell.

To view the health state in the portal, click the region that you want to view in the Region management tile. You can view the health state of infrastructure roles and of resource providers.

The list of infrastructure roles

You can click a resource provider or infrastructure role to view more detailed information.

Warning

If you click an infrastructure role, and then click the role instance, there are options to Start, Restart, or Shutdown. Don't use these actions when you apply updates to an integrated system. Also, don't use these options in an Azure Stack Development Kit (ASDK) environment. These options are designed only for an integrated systems environment, where there's more than one role instance per infrastructure role. Restarting a role instance (especially AzS-Xrp01) in the ASDK causes system instability. For troubleshooting assistance, post your issue to the Azure Stack Hub forum.

View alerts

The list of active alerts for each Azure Stack Hub region is available directly from the Region management blade. The first tile in the default configuration is the Alerts tile, which displays a summary of the critical and warning alerts for the region. You can pin the Alerts tile, like any other tile on this blade, to the dashboard for quick access.

The Alerts tile showing a warning in the Azure Stack Hub administrator portal

To view a list of all active alerts for the region, select the top part of the Alerts tile. To view a filtered list of alerts (Critical or Warning), select either the Critical or Warning line item within the tile.

The Alerts blade supports filtering on both status (Active or Closed) and severity (Critical or Warning). The default view displays all active alerts. All closed alerts are removed from the system after seven days.
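The filtering and retention behavior described above can be modeled with a short Python sketch. It isn't the portal's implementation: the `Alert` class, its field names, and the helper functions are hypothetical, and treating the seven-day removal as an exact cutoff measured from the close time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Closed alerts are removed after seven days (exact cutoff assumed).
RETENTION = timedelta(days=7)


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    title: str
    severity: str                      # "Critical" or "Warning"
    state: str                         # "Active" or "Closed"
    closed_at: Optional[datetime] = None


def filter_alerts(alerts, state="Active", severity=None):
    """Filter by state and, optionally, severity. The default mirrors the
    blade's default view: all active alerts."""
    return [
        a for a in alerts
        if a.state == state and (severity is None or a.severity == severity)
    ]


def still_retained(alert, now):
    """Active alerts are always kept; closed alerts are kept for seven days."""
    if alert.state != "Closed" or alert.closed_at is None:
        return True
    return now - alert.closed_at < RETENTION
```

Calling `filter_alerts(alerts)` with no arguments returns the equivalent of the default view, and `filter_alerts(alerts, severity="Critical")` corresponds to selecting the Critical line item.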

Note

If an alert remains active but hasn't been updated in over a day, you can run Test-AzureStack and close the alert if no problems are reported.
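As a rough illustration of the note above, the following Python sketch picks out active alerts that haven't been updated in over a day, which are the candidates for a Test-AzureStack check. The dictionary keys (`state`, `last_updated`, `title`) are hypothetical stand-ins for whatever fields your alert export contains; the sketch doesn't run Test-AzureStack itself.

```python
from datetime import datetime, timedelta

# "Hasn't been updated in over a day", per the note above.
STALE_AFTER = timedelta(days=1)


def stale_active_alerts(alerts, now):
    """Return active alerts whose last update is more than one day old.

    Each alert is a dict with hypothetical keys "state" and "last_updated".
    """
    return [
        a for a in alerts
        if a["state"] == "Active" and now - a["last_updated"] > STALE_AFTER
    ]
```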

The Filter pane for filtering by critical or warning status in the Azure Stack Hub administrator portal

The View API action displays the REST API that was used to generate the list view. This action provides a quick way to become familiar with the REST API syntax that you can use to query alerts. You can use this API in automation or to integrate with your existing datacenter monitoring, reporting, and ticketing solutions.
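For a ticketing integration, the step after querying the API is usually to map each alert to a ticket. The Python sketch below shows one way to do that under stated assumptions: the JSON shape (a top-level `value` list with `title`, `severity`, and `state` fields) is hypothetical, so use the View API action to confirm the real request and response format. The priority mapping follows the severity definitions earlier in this article.

```python
import json

# Critical alerts need urgent attention; warning alerts can be scheduled.
PRIORITY = {"Critical": "urgent", "Warning": "scheduled"}


def alerts_to_tickets(response_body):
    """Turn a JSON alert list into ticket payloads for active alerts only.

    Assumption: the response has a top-level "value" list whose items carry
    "title", "severity", and "state". These names are illustrative.
    """
    payload = json.loads(response_body)
    tickets = []
    for alert in payload.get("value", []):
        if alert.get("state") != "Active":
            continue  # closed alerts need no ticket
        tickets.append({
            "summary": alert["title"],
            "priority": PRIORITY[alert["severity"]],
        })
    return tickets
```

Keeping the parsing separate from the HTTP call means the mapping can be tested against saved responses without access to a live Azure Stack Hub region.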

You can click a specific alert to view the alert details. The alert details show all fields that are associated with the alert and enable quick navigation to the affected component and the source of the alert. For example, the following alert occurs if one of the infrastructure role instances goes offline or isn't accessible.

The Alert details blade in the Azure Stack Hub administrator portal

Alert remediation

Automated remediation

Some alerts support a Repair option, as shown in the previous image. When selected, the Repair action performs steps specific to the alert to attempt to resolve the issue. After you select it, the status of the Repair action is available as a portal notification.

Repair alert action in progress

The Repair action reports successful completion or failure in the same portal notification blade. If a Repair action fails for an alert, you can rerun it from the alert details. If the Repair action completes successfully, don't rerun it. After the infrastructure role instance is back online, this alert closes automatically.

Repair action completed successfully

Manual remediation

If the Repair option isn't supported, follow the complete set of remediation instructions provided in the alert. For example, the remediation steps for internal certificate expiration guide you through the process of secret rotation:

Certificate expiration remediation

Alert closure

Many alerts, but not all, close automatically when the underlying issue is resolved. Alerts that provide a Repair action button close automatically if Azure Stack Hub resolves the issue. For all other alerts, select Close Alert after you complete the remediation steps. If the issue persists, Azure Stack Hub generates a new alert. If you resolved the issue, the alert remains closed and requires no further steps.
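The closure rule above can be summarized as a small decision function. This Python sketch is a simplification under stated assumptions: it treats every alert without a Repair action as one that you close manually, and the function and parameter names are hypothetical.

```python
def needs_manual_close(has_repair_action, remediation_done):
    """Return True when the operator should select Close Alert.

    Simplified model of the rule above: alerts with a Repair action close
    on their own once Azure Stack Hub resolves the issue; any other alert
    is closed by the operator after the remediation steps are complete.
    """
    return (not has_repair_action) and remediation_done
```

If the function returns `True` and the issue was not actually fixed, the text above says that Azure Stack Hub raises a new alert, so closing early doesn't hide a persistent problem.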

Next steps

Manage updates in Azure Stack Hub

Region management in Azure Stack Hub