Server 2019 two-node S2D cluster failing on 'Dedup_WeeklyScrubbing' scheduled task

Caleb Opgenorth 96 Reputation points
2020-08-27T17:41:11.013+00:00

Hello,

I have set up a two-node, single-tier, all-flash Storage Spaces Direct failover cluster running on Server 2019 with all the latest updates; however, I'm having trouble getting Data Deduplication to work properly with it. I have created two nested mirror-accelerated parity CSVs from the resultant storage pool and formatted them both as ReFS. One CSV has DeDup enabled on it with the usage type set to 'HyperV'; the other doesn't.

The problem occurs whenever the 'Dedup_WeeklyScrubbing' scheduled task in Microsoft>Windows>Failover Clustering runs. All the VMs on whichever node owns the DeDup'ed CSV enter either a 'Saved-Critical' or 'Paused-Critical' state. VMs running on the second node (also stored on the same CSV) stay running and do not encounter any issues.

In the Failover Cluster Manager, the following warning and error messages appear:

Cluster Shared Volume 'NMAP15-DeDupe' ('Cluster Virtual Disk (NMAP15-DeDupe)') has entered a paused state because of '(c0e7000b)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Cluster Shared Volume 'NMAP15-DeDupe' ('Cluster Virtual Disk (NMAP15-DeDupe)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

The CSV's owner node stays the same, the 1460 error is only logged once, and after ~10-15 seconds the CSV is available again and I/O is restored. Some of the VMs on the owner node recover by themselves; others need to be turned off and back on. When the VMs are configured as highly available roles in the failover cluster, they all fail over to whichever node is not the owner of the DeDup'ed CSV.
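For anyone trying to interpret the two codes in those messages: 1460 is the Win32 error ERROR_TIMEOUT, and `c0e7000b` has the shape of a 32-bit NTSTATUS value. The minimal sketch below splits such a value into its standard bit fields (severity, customer bit, facility, code). It assumes the value really is an NTSTATUS, which the cluster message does not state; `decode_ntstatus` is a hypothetical helper name, not part of any Windows tooling.

```python
def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields.

    Assumption: the hex code from the cluster event is an NTSTATUS.
    """
    value &= 0xFFFFFFFF
    return {
        "severity": (value >> 30) & 0x3,    # 0=success, 1=informational, 2=warning, 3=error
        "customer": (value >> 29) & 0x1,    # 0 = Microsoft-defined, 1 = customer-defined
        "facility": (value >> 16) & 0xFFF,  # subsystem that raised the status
        "code": value & 0xFFFF,             # facility-specific status code
    }


fields = decode_ntstatus(0xC0E7000B)
print({name: hex(field) for name, field in fields.items()})
```

For `c0e7000b` this gives an error-severity, Microsoft-defined status with facility 0xE7 and code 0xB. As far as I know, facility 0xE7 is the Storage Spaces facility in ntstatus.h, but that mapping should be checked against the header rather than taken from this sketch.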

My question is this: is the combination of S2D + Nested Mirror Accelerated Parity + ReFS + Data DeDuplication currently supported? When reading the 'What's New' page for Data Dedup, it says support for ReFS was added in 2019. But then when reading the 'Interoperability' section, it states:

Data Deduplication is fully supported on Storage Spaces Direct NTFS-formatted volumes (mirror or parity). Deduplication is not supported on volumes with multiple tiers. See Data Deduplication on ReFS for more information.

and then the link for 'Data Deduplication on ReFS' leads nowhere. The volume is single-tier, but is Data DeDup on ReFS not supported on S2D, or is Nested Mirror Accelerated Parity not supported either?

Accepted answer
  1. Caleb Opgenorth 96 Reputation points
    2020-09-01T18:47:24.877+00:00

    After doing some testing, it seems that ReFS was causing the issues with Data Deduplication. Once I re-created the CSVs as NTFS (CSVFS_NTFS) volumes and then enabled DeDup on them, running the Scrubbing job from either PowerShell or the scheduled tasks caused no issues. This was also using Nested Mirror Accelerated Parity.

    I'm not sure if this occurs with other storage backends (iSCSI, FCoE, etc), but it seems like Data DeDuplication does not work with ReFS on Storage Spaces Direct.


2 additional answers

  1. Toussaint OTTAVI 1 Reputation point
    2020-09-14T09:27:34.313+00:00

    Hi,

    I have exactly the same problem, in a very similar setup:

    • Two-node 2019 S2D cluster
    • All hardware is HPE, from the certified HCI list
    • Mixed storage (NVMe for cache, HDD for capacity)
    • Nested mirror-accelerated parity volume
    • Formatted with CSVFS_ReFS
    • Latest Windows Update and HPE drivers

    If I understand correctly, "Nested mirror-accelerated parity" is a multi-tier setup (a mix of a NestedMirror tier and a NestedParity tier). If so, it seems we are in an unsupported configuration.

    In my situation, the cluster hosts Hyper-V virtualization, and the CSVFS contains the VHDX virtual disks. When the scrubbing task starts, it puts the CSV in "suspended mode" (c0e7000b), which causes disk failures and hardware crashes inside some VMs. The "suspended mode" is temporary, so some VMs are not affected, but others crash severely!

    In this setup, NTFS is not an option. I must use ReFS, because it has many advantages for managing VHDX files.

    So it seems I have two options:

    • Keep the CSVFS_ReFS volume with nested mirror-accelerated parity, but disable deduplication
    • Keep deduplication, but give up the mirror-accelerated parity volume and use a single-tier (mirror) volume

    Both options will consume far more storage anyway: S2D mirroring alone is terribly hungry (it eats 4x storage on a two-node cluster), while on the other hand deduplication is very efficient when hosting similar VMs (the deduplication gain is approx. 65%).
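A rough way to compare the two options is to estimate how much logical data each one stores per raw TB. The sketch below uses the two figures quoted above (4x raw consumption for the nested mirror, about 65% deduplication savings). The 40% efficiency for nested mirror-accelerated parity is an assumed placeholder, not a figure from this thread, and `logical_tb_per_raw_tb` is a hypothetical helper name. Substitute the numbers measured on your own pool.

```python
def logical_tb_per_raw_tb(resiliency_efficiency: float, dedup_savings: float = 0.0) -> float:
    """Estimate logical data stored per raw TB of pool capacity.

    resiliency_efficiency: usable / raw capacity of the volume (0 < x <= 1).
    dedup_savings: fraction of logical data removed by dedup (0 <= x < 1).
    """
    if not 0 < resiliency_efficiency <= 1:
        raise ValueError("resiliency_efficiency must be in (0, 1]")
    if not 0 <= dedup_savings < 1:
        raise ValueError("dedup_savings must be in [0, 1)")
    return resiliency_efficiency / (1.0 - dedup_savings)


NESTED_MIRROR_EFFICIENCY = 1 / 4  # from the post: mirroring "eats 4x storage" on two nodes
DEDUP_SAVINGS = 0.65              # from the post: approx. 65% deduplication gain
NESTED_MAP_EFFICIENCY = 0.40      # ASSUMPTION: placeholder, not taken from this thread

# Option 1: nested mirror-accelerated parity on ReFS, deduplication disabled
option_1 = logical_tb_per_raw_tb(NESTED_MAP_EFFICIENCY)
# Option 2: single-tier nested mirror, deduplication enabled
option_2 = logical_tb_per_raw_tb(NESTED_MIRROR_EFFICIENCY, DEDUP_SAVINGS)

print(f"Option 1 (nested MAP, no dedup): {option_1:.2f} logical TB per raw TB")
print(f"Option 2 (nested mirror + dedup): {option_2:.2f} logical TB per raw TB")
```

With these inputs the mirror volume with deduplication comes out ahead, but the result depends entirely on the assumed parity efficiency and on the deduplication rate actually achieved for a given workload.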

    I'll have to check which option is best for me.

    I don't know if it's possible to create a single-tier, parity-only volume on a two-node cluster (i.e., "parity" across all the HDDs within a server, and "mirroring" between the two nodes).

    In any case, it would be interesting to know whether there are plans to support deduplication on ReFS CSVs with nested resiliency (multi-tier), because this is the most efficient combination in terms of storage capacity.

    Kind regards,


  2. Lambe Perovski 1 Reputation point
    2020-12-10T08:33:01.747+00:00

    I had a similar problem and opened a case with MS support. After months of testing and private fixes, they solved the problem with the latest update, KB4586839, released on November 19, 2020.
