Hyper-V (2022) iSCSI MPIO CSV settings for resiliency

Andy Summers 0 Reputation points
2024-04-24T09:37:18.0633333+00:00

Working/production 2-node Server 2022 Hyper-V cluster with QNAP iSCSI for CSV storage.

I only have a limited number of 10Gb ports, so I configured a second iSCSI connection using a 1Gb link to a different switch.

I've configured MPIO with Weighted Paths because I didn't want it failing over to the 1Gb link and not moving back to the 10Gb link. I've given the 1Gb path a weight of 10,000.

I'm not sure the resiliency is working, though, as the cluster sees brief periods where the CSV is reported as not available.

Q1. In iSCSI I only have 1 Portal Group. It has 2 network portals, both with an index of 0. Is this correct? Some screenshots I've seen show "2 Portal Groups". (The QNAP and the servers each have 2 links/IP addresses on different subnets via different network switches.)

Q2. Should I tick "Path Verify Enable" on the disk devices? (Currently not enabled.) What are the implications?

Q3. Should I change any of the settings, e.g. increase the disk timeout or PDORemovePeriod? I am not sure whether increasing these will mean it waits longer before using the 1Gb link.

Ultimately I want the iSCSI to be as resilient as possible, because I had a brief 10Gb network blip and Hyper-V corrupted the disks of the VMs. So I guess increasing the time iSCSI traffic is queued/retried is the goal, but I don't understand the impact of the iSCSI/MPIO settings.

Tags: Windows, Windows Server, Hyper-V

2 answers

  1. Alex Bykovskyi 1,831 Reputation points
    2024-04-24T19:36:29.5533333+00:00

    Hey,

    When using mixed networks with iSCSI, we usually recommend the Failover Only multipathing policy. In this case, the 1Gb interface will only be used if the 10Gb link fails.

    As for specific multipathing settings, you should check them with the storage vendor. Different vendors can have different recommendations. I haven't found anything recent from QNAP, so it might be better to contact their support.
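
As a rough illustration of the Failover Only behaviour described above, here is a toy Python model. It is not MPIO code and nothing in it is a Windows API: the `Path` class and `select_path` function are invented for this sketch. It assumes the 10Gb path is the single active path, the 1Gb path is standby, and I/O returns to the 10Gb path once it is healthy again, which is how Microsoft's description of the policy reads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    """Hypothetical stand-in for one iSCSI path (not an MPIO object)."""
    name: str
    healthy: bool = True


def select_path(active: Path, standbys: List[Path]) -> Optional[Path]:
    """Toy Failover Only selection.

    All I/O uses the active path while it is healthy. If it is down,
    the first healthy standby is used. Returns None when every path
    is down.
    """
    if active.healthy:
        return active
    for path in standbys:
        if path.healthy:
            return path
    return None


if __name__ == "__main__":
    ten_gb = Path("10Gb")
    one_gb = Path("1Gb")

    print(select_path(ten_gb, [one_gb]).name)  # normal operation

    ten_gb.healthy = False                     # simulate a 10Gb link failure
    print(select_path(ten_gb, [one_gb]).name)  # standby takes over

    ten_gb.healthy = True                      # link restored
    print(select_path(ten_gb, [one_gb]).name)  # I/O returns to the active path
```

The point of the model is only that the 1Gb path carries no I/O unless the 10Gb path is unavailable. How quickly a real failure is detected depends on the MPIO and iSCSI timers, which this sketch does not model.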

    https://files.qnap.com/news/pressresource/product/How_to_connect_to_your_QNAP_Turbo_NAS_from_Windows_Server_2012_using_MPIO.pdf

    You can add storage redundancy to your configuration by using StarWind VSAN as shared storage. VSAN will create a replicated shared storage pool and share it via iSCSI with the Hyper-V hosts. In this scenario, your setup will be able to tolerate a storage failure. Might be helpful: https://www.starwindsoftware.com/best-practices/starwind-virtual-san-best-practices/

    Cheers,

    Alex Bykovskyi

    StarWind Software

    Note: Posts are provided “AS IS” without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.


  2. Ian Xue (Shanghai Wicresoft Co., Ltd.) 29,971 Reputation points Microsoft Vendor
    2024-04-25T04:19:25.0133333+00:00

    Hi Andy,

    Good day!

    Q1. Having multiple portal groups isn't strictly necessary unless you have specific requirements for separating traffic or managing redundancy at the iSCSI initiator level. Since you're using multiple network portals on the same portal group with different IP addresses and subnets, it should be sufficient for redundancy.

    Q2. Enabling path verification can help detect and remove failed paths more quickly, improving overall resilience. It's generally a good idea to enable this feature, but it's essential to test it in your environment to ensure it behaves as expected without causing unnecessary disruptions.

    Q3. Increasing these values can indeed help prevent premature failovers to the 1Gb link during temporary network blips. However, you'll need to strike a balance between resilience and responsiveness. Longer timeouts can increase the time it takes to detect and react to actual failures, potentially impacting performance or causing delays in failover scenarios. It's recommended to adjust these settings incrementally while monitoring system performance and failover behavior to find the optimal balance.
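
If you do adjust the timers incrementally, it may help to record the values before and after each change. The following is a minimal Python sketch, not an official tool. The registry locations are an assumption based on where the MPIO and disk timers are commonly documented (`Services\mpio\Parameters` and `Services\Disk` under `HKLM\SYSTEM\CurrentControlSet`); please verify them on your own hosts. The function names `read_registry_value`, `snapshot` and `diff` are made up for this example, and the script only reads values, it never writes them.

```python
import sys

# Assumed registry locations under HKEY_LOCAL_MACHINE; verify on your hosts.
MPIO_KEY = r"SYSTEM\CurrentControlSet\Services\mpio\Parameters"
DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"

# (key path, value name) pairs to record.
SETTINGS = [
    (MPIO_KEY, "PathVerifyEnabled"),
    (MPIO_KEY, "PathVerificationPeriod"),
    (MPIO_KEY, "PDORemovePeriod"),
    (MPIO_KEY, "RetryCount"),
    (MPIO_KEY, "RetryInterval"),
    (DISK_KEY, "TimeoutValue"),
]


def read_registry_value(key_path, name):
    """Read one value under HKLM. Returns None if the key or value is missing.

    Windows only: winreg is imported here so the rest of the module
    can still be imported on other platforms.
    """
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except FileNotFoundError:
        return None


def snapshot(reader=read_registry_value):
    """Return {value name: current value} for every entry in SETTINGS."""
    return {name: reader(key_path, name) for key_path, name in SETTINGS}


def diff(before, after):
    """Return {value name: (old, new)} for settings that changed."""
    return {
        name: (before.get(name), after[name])
        for name in after
        if before.get(name) != after[name]
    }


if __name__ == "__main__" and sys.platform == "win32":
    for name, value in snapshot().items():
        print(f"{name} = {value}")
```

Run it on each node before a change, keep the output, and run it again afterwards; `diff(before, after)` then shows exactly which timers differ between the two runs, which makes it easier to relate a change in failover behaviour to one specific setting.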

    Best Regards,

    Ian Xue

