

I believe you that the symptom you're most concerned about is not DNS. You never came right out and said that you're pinging/testing by IP, but that seems like a reasonable assumption.

But before you walk away from DNS, take a moment to spare yourself future drama. If your systems have an internal DNS and an external DNS to choose from, that will eventually bite you. Not all the time, just intermittent aggravating problems. It seems like it should always use DNS 1 when it's available and DNS 2 otherwise, but DNS clients don't understand the concept of "available" the way that you do.
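If you want to see what your clients are actually configured to use, here's a quick sketch with the built-in DNS client cmdlets (the interface alias and server address below are placeholders for your environment):

```powershell
# Show which DNS servers each adapter is actually configured to use
Get-DnsClientServerAddress -AddressFamily IPv4

# Point a client at the internal DNS server only and let that server
# forward external queries. 'Ethernet' and the address are placeholders.
Set-DnsClientServerAddress -InterfaceAlias 'Ethernet' -ServerAddresses '192.168.1.10'
```

Giving every client a single authoritative list, with the internal server doing the forwarding, avoids the "which server answered this time?" lottery entirely.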

Comments don't allow enough characters for me to address your bigger problem here; I'll start another.

Something about this doesn't sound right. VLAN tags for prioritization? That makes no sense. Especially since it includes a default tag of 0 when the lowest valid value for a VLAN tag is 1. Are you sure that it's not talking about an 802.1p tag? Numbered 0-7?

Ah, I've never had to get into the real networking side of it. I didn't know that 802.1p depended on 802.1q like that, but it is in fact 802.1p. I won't guarantee that it will work because you specifically said that you're using it with a virtual switch and, while Windows Server has support for 802.1p, it was always intended to function with physical hardware.

Enable the Data Center Bridging feature first:
[screenshot: 157956-image.png, enabling the Data Center Bridging feature]

Then work through the QoS cmdlets until everything is ready, particularly Enable-NetQosFlowControl. All the related cmdlets are here: https://docs.microsoft.com/en-us/powershell/module/dcbqos/?view=windowsserver2022-ps.
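As a rough sketch of the order of operations (the policy name, port, priority, and bandwidth values below are examples only; match them to what your switch expects):

```powershell
# The feature install comes before any of the dcbqos cmdlets will do anything useful
Install-WindowsFeature -Name Data-Center-Bridging

# Example: classify SMB Direct (RDMA, port 445) traffic into 802.1p priority 3
New-NetQosPolicy -Name 'SMBDirect' -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable priority flow control for that priority
Enable-NetQosFlowControl -Priority 3

# Optionally reserve bandwidth for the traffic class
New-NetQosTrafficClass -Name 'SMBDirect' -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
```

The priority numbers have to agree end to end; a priority tag that the physical switch doesn't recognize buys you nothing.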

To reiterate, I've only ever worked on this with physical adapters, so I don't know what will happen for virtual adapters.


I always say to triple-check the IPs and netmasks on every node. I also kill IPv6 because, despite all sorts of public yelling to the contrary, it never seems to do anything positive for clustering.
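A quick way to eyeball both on each node ('Ethernet' below is a placeholder for your adapter name):

```powershell
# Review every IPv4 address and prefix length on this node
Get-NetIPAddress -AddressFamily IPv4 | Format-Table InterfaceAlias, IPAddress, PrefixLength

# Unbind IPv6 from an adapter cleanly, no registry hacks needed
Disable-NetAdapterBinding -Name 'Ethernet' -ComponentID ms_tcpip6
```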
I don't believe that it's just a UI thing because the UI sits directly atop Failover Clustering's WMI interface with nothing in between. I suppose a bug in the lower layer is possible, but that seems improbable. If it's reaching the WMI tools as an unreachable status, then the cluster is treating it as unreachable.

My next plan was to use Wireshark scoped to the connection between a couple of the troubled adapters to see if that yields any clues. I never got that far myself.

When the network shows as partitioned, it's already excluded from cluster and client traffic.
A failover cluster of Hyper-V hosts doesn't expose any client connection points on its cluster networks, so the only thing you can possibly lose is inter-node communication. In the state shown in your screenshot, you have only one network that can carry inter-node traffic, so it has no backup path for heartbeat information, node state updates, or CSV traffic, if any.
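You can see how the cluster currently classifies each network straight from PowerShell:

```powershell
# State shows Up/Partitioned/Down; Role shows whether the network carries
# cluster traffic only (1), cluster and client traffic (3), or nothing (0)
Get-ClusterNetwork | Format-Table Name, State, Role
```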

It will ask if you want to drain roles when you pause a node. Once the pause has completed, it will no longer accept inbound migrations from other nodes in the cluster. It will not interfere with any VM deployments.
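The same operation from PowerShell, if you prefer it ('Node1' is a placeholder):

```powershell
# Pause the node and drain its roles to the other nodes
Suspend-ClusterNode -Name 'Node1' -Drain

# When maintenance is done, resume it; failing roles back is optional
Resume-ClusterNode -Name 'Node1' -Failback Immediate
```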

I have not seen this exact behavior, although I've never had good luck long term with Debian-based distros in Hyper-V.
Since you seem to specifically have problems with the virtual NIC (you keep saying virtual 'switch' but I'm guessing you mean 'adapter', right?), Linux distros usually get upset when the vNIC's MAC changes, which can happen any time that the hypervisor creates the VM's partition. The worst symptom I've ever seen is that the VM won't connect to the network, not this reboot cycle that you're witnessing. The fix is to tell the VM to use a static MAC address, which shouldn't hurt anything even if it doesn't address your problem.
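If you want to set the static MAC without digging through the GUI, something like this should work (the VM name and MAC below are placeholders, and the VM must be powered off first):

```powershell
# Pin the vNIC to a fixed MAC so the guest never sees it change between boots
Set-VMNetworkAdapter -VMName 'DebianVM' -StaticMacAddress '00155D010203'
```

Using the MAC the VM currently has as the static value keeps any address-based config inside the guest valid.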
I would dive into the guest system's logs, though. I can't imagine that a maintenance or management process would reboot the system without recording why.

Early documentation was not clear and led many people to believe that CSV traffic was distinct from other cluster traffic. Not everyone has gotten around to updating their knowledge.

I create one logical network per general-use 10Gb+ physical adapter. If I have 2x 10GbE, then I make 1 management + 1 Live Migration network, both enabled for cluster traffic with the Live Migration network set as preferred for Live Migration. If I have 4x 10GbE, then I make 1x management network and 3x cluster networks. If I have > 4 general-purpose 10GbE adapters then I start asking why I have > 4 10GbE general-purpose adapters because that money could have bought more memory. I try to keep storage traffic on dedicated physical adapters, but when I can't, I just add them as logical networks with no change to my other procedure.
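For reference, the cluster network roles behind that layout can be set directly (the network names here are placeholders; Live Migration preference itself is ordered separately, e.g. in Failover Cluster Manager):

```powershell
# Role: 3 = cluster and client, 1 = cluster only, 0 = none
(Get-ClusterNetwork -Name 'Management').Role = 3
(Get-ClusterNetwork -Name 'LiveMigration').Role = 1
```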

Show your workflow for the migration attempt. That looks like it's expecting the VHDX to live in a shared location, meaning that it's trying to move just the VM and not its storage.
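If you intend to move the storage along with the VM, the PowerShell form makes the distinction explicit (names and path below are placeholders):

```powershell
# Move the VM and its files together to local storage on the destination host
Move-VM -Name 'MyVM' -DestinationHost 'Host2' -IncludeStorage `
    -DestinationStoragePath 'D:\Hyper-V\MyVM'
```

Without -IncludeStorage, a live migration assumes the VHDX is already reachable from the destination, which matches the error you're seeing.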

You need a distinct, independent copy of the data. If the "backup hard drive" overwrites its copy with changes from the data, then its copy is neither distinct nor independent. In that case, the "backup hard drive" is only considered a backup of the physical drive, not a backup of the data contained on the physical drive.

Well, again, the ability to read a file and the ability to transform a file do not necessarily go hand-in-hand. Without inside knowledge, the most likely explanation is that 9.1 and 9.2 versions exist somewhere but don't have sufficiently compelling changes for Microsoft to produce a public transform routine that they would then need to support. It's also possible that these versions embed information that only makes sense in Azure, which would mean there's no point in adding an on-premises transform routine. Don't lose sight of the fact that the only thing that we know for certain about these versions is that they have higher version numbers. Version 9.0 implements all known Hyper-V functionality and has no published defects.

You should have one or two brokers across your deployment. Do not install the broker role on a virtualization node.

2012 R2 didn't require SLAT. 2019 does. This is a hardware feature and the check is hardcoded. Any hack would need to fool the operating system into thinking that the hardware can do something that it cannot. If there is a way to disable the check, it's not published outside of Microsoft.
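You can at least verify what the hardware reports before blaming the check (note that systeminfo only prints the Hyper-V Requirements section when the Hyper-V role is not yet installed):

```powershell
# Look for 'Second Level Address Translation: Yes' in the Hyper-V Requirements section
systeminfo | Select-String 'Second Level Address Translation'
```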

I don't see any reason to rebuild the data disks. Make a different VM and see if it will accept them.
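A sketch of that test (the VM name and path below are placeholders):

```powershell
# Attach the existing data disk to the new VM without touching its contents
Add-VMHardDiskDrive -VMName 'TestVM' -ControllerType SCSI -Path 'D:\VHDs\data1.vhdx'
```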

If it's running off of the RCT, then I suppose I'm not sure why you care if the VM migrates. That's all volume-based work. Everything that your API calls read will come from an unmoving, inert VHDX. The hypervisor has to answer for the API calls. It's the host's responsibility to be sure that you can get what you're asking for, not yours.
I don't know how to call these particular APIs, but I do know that if I start a backup with software that is aware of the CBT API and then try to migrate a VM, I get a 0x8007138d error. I assume that you could try the same thing and see what happens.