Windows Server 2016 Storage Spaces Direct Cluster - Regular Disk Corruptions on Linux Servers

Zafer Akin
2020-10-05T13:37:37.187+00:00

We have a 4-node Windows Server 2016 Datacenter (Core) Hyper-V cluster with Storage Spaces Direct (S2D), with the following details:

  • 3 Tier (NVMe as Cache, SSD for Performance Tier, HDD for Capacity Tier)
  • 8x 4TB CSVs with 10% Reserved capacity for performance reasons
  • 2x 25Gb Network Ports (on 2 separate Mellanox NICs) on 2x 25Gb Switches
  • 1 SMB VLAN per NIC (2 total)
  • 1 Shared Management NIC over these 2 physical NICs
  • All official Windows Updates installed
  • Latest Firmware and Drivers
  • Built with S2D Certified Hardware and Best Practices Configuration
  • All VMs with current VM configuration, updates and integration services (Windows & Linux)
  • Backup is done of all VMs with SC DPM 2019
  • All VMs are replicated with Hyper-V replication

The cluster has been running for over a year now without major issues and with good performance.

Our Linux administrators tell us that they see regular disk corruptions.
We had only one corruption on a Windows server, and we are not sure whether it was caused by the infrastructure or by DFS. As it occurred only once, we want to focus on the Linux servers.
The Linux versions in use are up to date, supported, and built according to Microsoft's best practices for Linux on Hyper-V.
There are Linux servers under heavy load that have not had a single disk corruption. There are others with little to no load that have had one or more disk corruptions.
Microsoft checked the cluster and did not find any misconfiguration, alert, or anything similar. The Cluster Report is also perfectly green.
The hardware vendor and a third party (a Hyper-V MVP) also checked the S2D configuration and did not find any misconfiguration or anything else that could lead to these disk corruptions.

The disk corruptions can occur at any time, without any known process running at that moment.
They also occur when we stop and start VMs for maintenance work, but even then only on 2-4 different VMs (out of 150 VMs).

We are looking for any input that could lead us to further testing, or for someone who has similar issues in their environment to connect with.

Let me know if any other information is needed.
