Health Service in Windows Server

Applies to: Windows Server 2019, Windows Server 2016

The Health Service is a feature introduced in Windows Server 2016 that improves the day-to-day monitoring and operational experience for clusters running Storage Spaces Direct.

Prerequisites

The Health Service is enabled by default with Storage Spaces Direct. No additional action is required to set it up or start it. To learn more about Storage Spaces Direct, see Storage Spaces Direct in Windows Server 2016.

Reports

See Health Service reports.

Faults

See Health Service faults.

Actions

See Health Service actions.

Automation

This section describes the workflows in the disk lifecycle that the Health Service automates.

Disk Lifecycle

The Health Service automates most stages of the physical disk lifecycle. Assume that your deployment starts in perfect health, that is, all physical disks are working properly.

Retirement

Physical disks are automatically retired when they can no longer be used, and a corresponding Fault is raised. There are several cases:

  • Media Failure: the physical disk has definitively failed or is broken, and must be replaced.

  • Lost Communication: the physical disk has lost connectivity for over 15 consecutive minutes.

  • Unresponsive: the physical disk has exhibited latency of over 5.0 seconds three or more times within an hour.
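
The two time-based criteria above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the list, not the Health Service's actual implementation: the function names, the shape of the input data, and the choice to treat the one-hour window as inclusive are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Thresholds taken from the list above.
LATENCY_THRESHOLD_S = 5.0                 # latency of over 5.0 seconds
UNRESPONSIVE_EVENT_COUNT = 3              # three or more times
UNRESPONSIVE_WINDOW = timedelta(hours=1)  # within an hour
LOST_COMM_LIMIT = timedelta(minutes=15)   # over 15 consecutive minutes


def is_unresponsive(samples):
    """Hypothetical helper. samples: iterable of (timestamp, latency_seconds)."""
    slow = sorted(t for t, latency in samples if latency > LATENCY_THRESHOLD_S)
    # Check every run of three consecutive slow events against the window.
    for i in range(len(slow) - UNRESPONSIVE_EVENT_COUNT + 1):
        span = slow[i + UNRESPONSIVE_EVENT_COUNT - 1] - slow[i]
        if span <= UNRESPONSIVE_WINDOW:  # inclusive boundary is an assumption
            return True
    return False


def has_lost_communication(disconnected_since, now):
    """Hypothetical helper. disconnected_since is None while the disk is connected."""
    if disconnected_since is None:
        return False
    return now - disconnected_since > LOST_COMM_LIMIT
```

For example, three latency spikes at 12:00, 12:20, and 12:50 satisfy the Unresponsive rule, while spikes at 12:00, 12:40, and 13:20 do not, because no three of them fall within one hour.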

Note

If connectivity is lost to many physical disks at once, or to an entire node or storage enclosure, the Health Service will not retire these disks, since they are unlikely to be the root problem.

If the retired disk was serving as the cache for other physical disks, those disks are automatically reassigned to another cache disk if one is available. No special user action is required.

Restoring resiliency

Once a physical disk has been retired, the Health Service immediately begins copying its data onto the remaining physical disks to restore full resiliency. Once this has completed, the data is completely safe and fault tolerant again.

Note

This immediate restoration requires sufficient available capacity among the remaining physical disks.
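
As a rough back-of-the-envelope check of the capacity requirement in the note, the sketch below compares the data held on the retired disk with the free space left on the remaining disks. It is a deliberate simplification: the function name and inputs are hypothetical, and real placement also depends on the resiliency type and fault domains, which this example ignores.

```python
def restore_shortfall_bytes(retired_used_bytes, remaining_free_bytes):
    """Hypothetical helper: return 0 if the remaining disks have enough free
    space to absorb the retired disk's data, else the number of bytes missing.

    Simplifying assumption: free space is fungible across the remaining disks.
    """
    total_free = sum(remaining_free_bytes)
    return max(0, retired_used_bytes - total_free)


TB = 10**12
# A retired disk holding 3 TB, and three remaining disks with 1.5 TB free each.
shortfall = restore_shortfall_bytes(3 * TB, [1.5 * TB, 1.5 * TB, 1.5 * TB])
```

A shortfall of zero suggests that the restoration can start immediately under this simplified model; a positive value is the amount of capacity that would have to be added first.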

Blinking the indicator light

If possible, the Health Service will begin blinking the indicator light on the retired physical disk or its slot. This continues indefinitely, until the retired disk is replaced.

Note

In some cases, the disk may have failed in a way that prevents even its indicator light from functioning, for example, a total loss of power.

Physical replacement

You should replace the retired physical disk when possible. In most cases this is a hot-swap, so the node or storage enclosure does not need to be powered off. See the Fault for helpful location and part information.

Verification

When the replacement disk is inserted, it is verified against the Supported Components Document (see the next section).

Pooling

If allowed, the replacement disk is automatically substituted into its predecessor's pool and enters use. At this point, the system has returned to its initial state of perfect health, and the Fault disappears.

Supported Components Document

The Health Service provides an enforcement mechanism to restrict the components used by Storage Spaces Direct to those on a Supported Components Document provided by the administrator or solution vendor. This can be used to prevent mistaken use of unsupported hardware by you or others, which may help with warranty or support contract compliance. This functionality is currently limited to physical disk devices, including SSDs, HDDs, and NVMe drives. The Supported Components Document can restrict on model, manufacturer (optional), and firmware version (optional).

Usage

The Supported Components Document uses an XML-inspired syntax. We recommend using your favorite text editor, such as the free Visual Studio Code or Notepad, to create an XML document which you can save and reuse.

Sections

The document has two independent sections: Disks and Cache.

If the Disks section is provided, only the drives listed (as Disk) are allowed to join pools. Any unlisted drives are prevented from joining pools, which effectively precludes their use in production. If this section is left empty, any drive is allowed to join pools.

If the Cache section is provided, only the drives listed (as CacheDisk) are used for caching. If this section is left empty, Storage Spaces Direct attempts to guess based on media type and bus type. Drives listed here should also be listed in Disks.

Important

The Supported Components Document does not apply retroactively to drives already pooled and in use.

Example

<Components>

  <Disks>
    <Disk>
      <Manufacturer>Contoso</Manufacturer>
      <Model>XYZ9000</Model>
      <AllowedFirmware>
        <Version>2.0</Version>
        <Version>2.1</Version>
        <Version>2.2</Version>
      </AllowedFirmware>
      <TargetFirmware>
        <Version>2.1</Version>
        <BinaryPath>C:\ClusterStorage\path\to\image.bin</BinaryPath>
      </TargetFirmware>
    </Disk>
    <Disk>
      <Manufacturer>Fabrikam</Manufacturer>
      <Model>QRSTUV</Model>
    </Disk>
  </Disks>

  <Cache>
    <CacheDisk>
      <Manufacturer>Fabrikam</Manufacturer>
      <Model>QRSTUV</Model>
    </CacheDisk>
  </Cache>

</Components>

To list multiple drives, simply add additional <Disk> or <CacheDisk> tags.
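
To make the Disks rules concrete, the following Python sketch parses a trimmed copy of the example document and decides whether a given drive would be allowed to join a pool. It only illustrates the rules as described in this article (model required, manufacturer and firmware optional, empty section allows everything, exact string matching); it is not how the Health Service itself evaluates the document, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above (TargetFirmware and Cache omitted).
EXAMPLE_XML = """
<Components>
  <Disks>
    <Disk>
      <Manufacturer>Contoso</Manufacturer>
      <Model>XYZ9000</Model>
      <AllowedFirmware>
        <Version>2.0</Version>
        <Version>2.1</Version>
        <Version>2.2</Version>
      </AllowedFirmware>
    </Disk>
    <Disk>
      <Manufacturer>Fabrikam</Manufacturer>
      <Model>QRSTUV</Model>
    </Disk>
  </Disks>
</Components>
"""


def _text(element, tag):
    """Return the stripped text of a child element, or None if absent/empty."""
    child = element.find(tag)
    if child is None or not child.text:
        return None
    return child.text.strip()


def load_allowed_disks(xml_text):
    """Parse the Disks section into a list of rule dictionaries."""
    root = ET.fromstring(xml_text)
    section = root.find("Disks")
    if section is None:
        return []
    rules = []
    for disk in section.findall("Disk"):
        versions = disk.findall("AllowedFirmware/Version")
        rules.append({
            "model": _text(disk, "Model"),
            "manufacturer": _text(disk, "Manufacturer"),  # optional
            "firmware": [v.text.strip() for v in versions if v.text],  # optional
        })
    return rules


def may_join_pool(rules, model, manufacturer, firmware):
    """Apply the rules; values are compared exactly, as the article requires."""
    if not rules:
        return True  # empty Disks section: any drive is allowed
    for rule in rules:
        if rule["model"] != model:
            continue
        if rule["manufacturer"] is not None and rule["manufacturer"] != manufacturer:
            continue
        if rule["firmware"] and firmware not in rule["firmware"]:
            continue
        return True
    return False


rules = load_allowed_disks(EXAMPLE_XML)
```

With these rules, a Contoso XYZ9000 on firmware 2.1 is accepted, the same drive on firmware 3.0 is rejected, and a drive that reports its manufacturer as "CONTOSO-LTD" is rejected because the strings do not match exactly.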

To inject this XML when deploying Storage Spaces Direct, use the -XML parameter:

# Read the Supported Components Document into a single string
$MyXML = Get-Content <Filepath> | Out-String
Enable-ClusterS2D -XML $MyXML

To set or modify the Supported Components Document once Storage Spaces Direct has been deployed:

# Read the document, then store it as a health setting on the cluster subsystem
$MyXML = Get-Content <Filepath> | Out-String
Get-StorageSubSystem Cluster* | Set-StorageHealthSetting -Name "System.Storage.SupportedComponents.Document" -Value $MyXML

Note

The model, manufacturer, and firmware version properties must exactly match the values returned by the Get-PhysicalDisk cmdlet. These may differ from your "common sense" expectation, depending on your vendor's implementation. For example, rather than "Contoso", the manufacturer may be "CONTOSO-LTD", or it may be blank while the model is "Contoso-XYZ9000".

You can verify using the following PowerShell cmdlet:

Get-PhysicalDisk | Select-Object Model, Manufacturer, FirmwareVersion

Settings

See Health Service settings.

See also