Storage-class Memory (NVDIMM-N) Health Management in Windows

Applies To: Windows Server 2016, Windows 10 (version 1607)

This article provides system administrators and IT pros with information about error handling and health management specific to storage-class memory (NVDIMM-N) devices in Windows, highlighting the differences between storage-class memory and traditional storage devices.

If you aren't familiar with Windows support for storage-class memory devices, these short videos provide an overview:

JEDEC-compliant NVDIMM-N storage-class memory devices are supported in Windows with native drivers, starting in Windows Server 2016 and Windows 10 (version 1607). While these devices behave similarly to other disks (HDDs and SSDs), there are some differences.

All conditions listed here are expected to be very rare, but their likelihood depends on the conditions in which the hardware is used.

The various cases below may refer to Storage Spaces configurations. The particular configuration of interest is one where two NVDIMM-N devices are used as a mirrored write-back cache in a storage space. To set up such a configuration, see Configuring Storage Spaces with a NVDIMM-N write-back cache.

In Windows Server 2016, the Storage Spaces GUI shows the NVDIMM-N bus type as UNKNOWN. This is a display issue only: it does not cause any loss of functionality or prevent the creation of storage pools or Storage Spaces virtual disks. You can verify the bus type by running the following command:

PS C:\>Get-PhysicalDisk | fl

The BusType property in the cmdlet output correctly shows the bus type as "SCM".

Checking the health of storage-class memory

To query the health of storage-class memory, use the following command in a Windows PowerShell session.

PS C:\> Get-PhysicalDisk | where BusType -eq "SCM" | select SerialNumber, HealthStatus, OperationalStatus, OperationalDetails

Doing so yields this example output:

SerialNumber          HealthStatus OperationalStatus  OperationalDetails
802c-01-1602-117cb5fc Healthy      OK
802c-01-1602-117cb64f Warning      Predictive Failure {Threshold Exceeded,NVDIMM_N Error}
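The query above lists every SCM device. When only the devices that need attention are of interest, a small pipeline filter can help. The following is a minimal sketch, not part of the Storage module: the function name Select-ScmDiskNeedingAttention is hypothetical, and it assumes that BusType and HealthStatus compare as the strings shown in the example output.

```powershell
# Hypothetical helper: passes through only SCM disks whose HealthStatus is not "Healthy".
# Assumes BusType and HealthStatus compare as the strings shown in the example output.
function Select-ScmDiskNeedingAttention {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Disk
    )
    process {
        if ($Disk.BusType -eq 'SCM' -and $Disk.HealthStatus -ne 'Healthy') {
            # Keep the same columns as the query used in this article.
            $Disk | Select-Object SerialNumber, HealthStatus, OperationalStatus, OperationalDetails
        }
    }
}
```

On a system with NVDIMM-N devices, it could be used as `Get-PhysicalDisk | Select-ScmDiskNeedingAttention`. Note that a device that has lost communication may no longer report the SCM bus type, so this filter would not catch that case.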

Note

To find the physical location of an NVDIMM-N device specified in an event, on the Details tab of the event in Event Viewer, go to EventData > Location. Note that Windows Server 2016 lists the incorrect location of NVDIMM-N devices, but this is fixed in Windows Server, version 1709.
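The same Location value can also be read from PowerShell instead of Event Viewer. The sketch below is an illustration under stated assumptions: the function name Get-ScmEventLocation is hypothetical, and it assumes the event XML carries the location as a Data element named Location under EventData, which is how the Event Viewer path above suggests it is laid out.

```powershell
# Hypothetical helper: extracts the "Location" value from an event's XML.
# Assumption: the event stores it as <Data Name="Location"> under <EventData>.
function Get-ScmEventLocation {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$EventXml
    )
    $doc = [xml]$EventXml
    $node = $doc.Event.EventData.Data |
        Where-Object { $_.Name -eq 'Location' } |
        Select-Object -First 1
    if ($null -eq $node) {
        return $null   # No Location field in this event.
    }
    return $node.'#text'
}
```

On Windows, the XML for recent events could be obtained with `Get-WinEvent -LogName 'Microsoft-Windows-ScmDisk0101/Operational' -MaxEvents 10` and the `ToXml()` method of each returned record, then passed to the function through `-EventXml`.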

For help understanding the various health conditions, see the following sections.

"Warning" Health Status

This condition is when you check the health of a storage-class memory device and see that its Health Status is listed as Warning, as shown in this example output:

SerialNumber          HealthStatus OperationalStatus  OperationalDetails
802c-01-1602-117cb5fc Healthy      OK
802c-01-1602-117cb64f Warning      Predictive Failure {Threshold Exceeded,NVDIMM_N Error}

The following table lists some info about this condition.

Likely condition: NVDIMM-N Warning Threshold breached
Root cause: NVDIMM-N devices track various thresholds, such as temperature, NVM lifetime, and/or energy source lifetime. When one of those thresholds is exceeded, the operating system is notified.
General behavior: Device remains fully operational. This is a warning, not an error.
Storage Spaces behavior: Device remains fully operational. This is a warning, not an error.
More info: OperationalStatus field of the PhysicalDisk object. EventLog: Microsoft-Windows-ScmDisk0101/Operational
What to do: Depending on the warning threshold breached, it may be prudent to consider replacing the entire NVDIMM-N or certain parts of it. For example, if the NVM lifetime threshold is breached, replacing the NVDIMM-N may make sense.

Writes to an NVDIMM-N fail

This condition is when you check the health of a storage-class memory device and see the Health Status listed as Unhealthy, and the Operational Status mentions an IO Error, as shown in this example output:

SerialNumber          HealthStatus OperationalStatus                           OperationalDetails
802c-01-1602-117cb5fc Healthy      OK
802c-01-1602-117cb64f Unhealthy    {Stale Metadata, IO Error, Transient Error} {Lost Data Persistence, Lost Data, NV...}

The following table lists some info about this condition.

Likely condition: Loss of Persistence / Backup Power
Root cause: NVDIMM-N devices rely on a back-up power source for their persistence, usually a battery or super-cap. If this back-up power source is unavailable or the device cannot perform a backup for any reason (Controller/Flash Error), data is at risk and Windows will prevent any further writes to the affected devices. Reads are still possible so that data can be evacuated.
General behavior: The NTFS volume will be dismounted. The PhysicalDisk Health Status field will show "Unhealthy" for all affected NVDIMM-N devices.
Storage Spaces behavior: The storage space will remain operational as long as only one NVDIMM-N is affected. If multiple devices are affected, writes to the storage space will fail. The PhysicalDisk Health Status field will show "Unhealthy" for all affected NVDIMM-N devices.
More info: OperationalStatus field of the PhysicalDisk object. EventLog: Microsoft-Windows-ScmDisk0101/Operational
What to do: We recommend backing up the affected NVDIMM-N's data. To gain read access, you can manually bring the disk online (it will surface as a read-only NTFS volume).

To fully clear this condition, the root cause must be resolved (that is, service the power supply or replace the NVDIMM-N, depending on the issue) and the volume on the NVDIMM-N must either be taken offline and brought online again, or the system must be restarted.

To make the NVDIMM-N usable in Storage Spaces again, use the Reset-PhysicalDisk cmdlet, which re-integrates the device and starts the repair process.
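As an illustration of the last step, the sketch below looks up a device by serial number and passes it to Reset-PhysicalDisk. The wrapper name Reset-ScmDiskBySerial is hypothetical. It assumes the root cause has already been resolved and the data has been backed up as recommended above, since resetting a disk should be treated as potentially destructive for the data on it. The wrapper supports -WhatIf so the target can be checked before anything is changed.

```powershell
# Hypothetical wrapper around Reset-PhysicalDisk, selecting the device by serial number.
# Assumes the root cause is fixed and the data on the device has been backed up.
function Reset-ScmDiskBySerial {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)]
        [string]$SerialNumber
    )

    $disk = Get-PhysicalDisk | Where-Object { $_.SerialNumber -eq $SerialNumber }
    if (-not $disk) {
        throw "No physical disk with serial number '$SerialNumber' was found."
    }

    if ($PSCmdlet.ShouldProcess($SerialNumber, 'Reset-PhysicalDisk')) {
        $disk | Reset-PhysicalDisk
    }

    # Report the status after the reset so the result can be verified.
    Get-PhysicalDisk |
        Where-Object { $_.SerialNumber -eq $SerialNumber } |
        Select-Object SerialNumber, HealthStatus, OperationalStatus
}
```

For example, `Reset-ScmDiskBySerial -SerialNumber '802c-01-1602-117cb64f' -WhatIf` would only report what would be reset, using the serial number from the example output above.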

NVDIMM-N is shown with a capacity of '0' Bytes or as a "Generic Physical Disk"

This condition is when a storage-class memory device is shown with a capacity of 0 bytes and cannot be initialized, or is exposed as a "Generic Physical Disk" object with an Operational Status of Lost Communication, as shown in this example output:

SerialNumber          HealthStatus OperationalStatus  OperationalDetails
802c-01-1602-117cb5fc Healthy      OK
                      Warning      Lost Communication

The following table lists some info about this condition.

Likely condition: BIOS Did Not Expose NVDIMM-N to OS
Root cause: NVDIMM-N devices are DRAM based. When a corrupt DRAM address is referenced, most CPUs will initiate a machine check and restart the server. Some server platforms then un-map the NVDIMM so that the OS cannot access it and trigger another machine check. This may also occur if the BIOS detects that the NVDIMM-N has failed and needs to be replaced.
General behavior: NVDIMM-N is shown as uninitialized, with a capacity of 0 bytes, and cannot be read or written.
Storage Spaces behavior: The storage space remains operational (provided only one NVDIMM-N is affected). The NVDIMM-N PhysicalDisk object is shown with a Health Status of Warning and as a "Generic Physical Disk".
More info: OperationalStatus field of the PhysicalDisk object. EventLog: Microsoft-Windows-ScmDisk0101/Operational
What to do: The NVDIMM-N device must be replaced or sanitized so that the server platform exposes it to the host OS again. Replacement of the device is recommended, as additional uncorrectable errors could occur. A replacement device can be added to a Storage Spaces configuration with the Add-PhysicalDisk cmdlet.
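Adding the replacement device might look like the following sketch. The wrapper name Add-ReplacementScmDisk and the pool name in the usage example are hypothetical. The sketch assumes the replacement NVDIMM-N reports CanPool as true and a BusType of SCM once the platform exposes it.

```powershell
# Hypothetical wrapper: adds all poolable SCM devices to the named storage pool.
# Assumes the replacement NVDIMM-N shows CanPool = True and BusType = SCM.
function Add-ReplacementScmDisk {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)]
        [string]$StoragePoolFriendlyName
    )

    $candidates = @(Get-PhysicalDisk |
        Where-Object { $_.CanPool -and $_.BusType -eq 'SCM' })

    if ($candidates.Count -eq 0) {
        Write-Warning 'No poolable SCM device was found.'
        return
    }

    if ($PSCmdlet.ShouldProcess($StoragePoolFriendlyName, "Add $($candidates.Count) SCM device(s)")) {
        Add-PhysicalDisk -StoragePoolFriendlyName $StoragePoolFriendlyName -PhysicalDisks $candidates
    }
}
```

A dry run such as `Add-ReplacementScmDisk -StoragePoolFriendlyName 'Pool1' -WhatIf` shows how many devices would be added without changing the pool.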

NVDIMM-N is shown as a RAW or empty disk after a reboot

This condition is when you check the health of a storage-class memory device and see a Health Status of Unhealthy and an Operational Status of Unrecognized Metadata, as shown in this example output:

SerialNumber          HealthStatus OperationalStatus                       OperationalDetails
802c-01-1602-117cb5fc Healthy      OK                                      {Unknown}
802c-01-1602-117cb64f Unhealthy    {Unrecognized Metadata, Stale Metadata} {Unknown}

The following table lists some info about this condition.

Likely condition: Backup/Restore Failure
Root cause: A failure in the backup or restore procedure will likely result in all data on the NVDIMM-N being lost. When the operating system loads, the device will appear as a brand new NVDIMM-N without a partition or file system and surface as RAW, meaning it doesn't have a file system.
General behavior: NVDIMM-N will be in read-only mode. Explicit user action is needed to begin using it again.
Storage Spaces behavior: Storage Spaces remains operational if only one NVDIMM is affected. The NVDIMM-N physical disk object will be shown with the Health Status "Unhealthy" and is not used by Storage Spaces.
More info: OperationalStatus field of the PhysicalDisk object. EventLog: Microsoft-Windows-ScmDisk0101/Operational
What to do: If the user doesn't want to replace the affected device, they can use the Reset-PhysicalDisk cmdlet to clear the read-only condition on the affected NVDIMM-N. In Storage Spaces environments, this will also attempt to re-integrate the NVDIMM-N into the storage space and start the repair process.
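The four conditions described above can be told apart by the combination of Health Status and Operational Status. The following sketch maps those values to the "Likely condition" named in each table. The function name Get-ScmDiskCondition is hypothetical, and it assumes the status values arrive as the English strings shown in the example outputs.

```powershell
# Hypothetical helper: maps HealthStatus/OperationalStatus to the likely
# condition named in this article. Assumes English status strings.
function Get-ScmDiskCondition {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$HealthStatus,

        [Parameter(Mandatory)]
        [string[]]$OperationalStatus
    )

    if ($HealthStatus -eq 'Healthy') {
        return 'OK'
    }
    if ($HealthStatus -eq 'Warning' -and $OperationalStatus -contains 'Predictive Failure') {
        return 'NVDIMM-N Warning Threshold breached'
    }
    if ($HealthStatus -eq 'Warning' -and $OperationalStatus -contains 'Lost Communication') {
        return 'BIOS Did Not Expose NVDIMM-N to OS'
    }
    if ($HealthStatus -eq 'Unhealthy' -and $OperationalStatus -contains 'IO Error') {
        return 'Loss of Persistence / Backup Power'
    }
    if ($HealthStatus -eq 'Unhealthy' -and $OperationalStatus -contains 'Unrecognized Metadata') {
        return 'Backup/Restore Failure'
    }
    return 'Unknown'   # Combination not covered by this article.
}
```

For instance, `Get-ScmDiskCondition -HealthStatus 'Unhealthy' -OperationalStatus 'Unrecognized Metadata', 'Stale Metadata'` returns the label of the backup/restore failure case.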

Interleaved Sets

Interleaved sets can often be created in a platform's BIOS to make multiple NVDIMM-N devices appear as a single device to the host operating system.

Windows Server 2016 and Windows 10 Anniversary Edition do not support interleaved sets of NVDIMM-Ns.

At the time of this writing, there is no mechanism for the host operating system to correctly identify individual NVDIMM-Ns in such a set and clearly communicate to the user which particular device may have caused an error or needs to be serviced.