I am in the process of testing and documenting a customer's DR procedure for the case where their primary Hyper-V host crashes, and I stumbled upon an issue that I can't seem to get my head around. Initially the replication was in really bad shape, not having replicated a single MB in 6 months, but I managed to get it working again. A new setup or a cluster setup isn't in their plans or budget right now.
I tried to perform an Unplanned and a Planned Failover with a VM that is not in production and it failed over to the replica host without any issues.
The problem occurs when I try to perform a Reverse Replication. It simply fails, apparently because it believes that the receiving host is not in a state to accept replication. These are the things that I have verified so far:
I have made sure that the Hyper-V Replica HTTPS Listener (TCP-In) - Firewall rule is enabled on both ends since the replication is done through HTTPS 443 with a certificate trust. I have made sure that no other external firewall is blocking the traffic.
Both Hyper-V hosts are configured to accept replication, and there is no difference in the configuration as far as I can see.
I tried to break the replication of the VM, delete the replica completely on the replication host and recreate the replication, but still no difference.
I tried creating a new temporary VM and enabling replication for it. That works fine, but reverse replication still fails.
I also tried creating a new temporary VM on the replication host and replicating it to the primary host, which worked fine, but when I try to reverse it from that side it still fails.
The hosts can ping each other and resolve each other's names as far as I can see, so it should not be a question of DNS or the hosts file.
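To double-check the name resolution and firewall points above from each side, something like the following could be run on both hosts. It is only a minimal sketch: the function names (`resolve`, `port_open`, `check_peer`) are made up for illustration, and it only tests that the peer's name resolves and that TCP 443 accepts a connection. It does not validate the certificate or the Hyper-V Replica configuration itself.

```python
import socket


def resolve(name):
    """Return the sorted list of IP addresses that `name` resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_open(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peer(name, port=443):
    """Summarize resolution and TCP reachability for one peer host."""
    addresses = resolve(name)
    return {
        "name": name,
        "addresses": addresses,
        # Only attempt the connection if the name resolved at all.
        "port_open": bool(addresses) and port_open(name, port),
    }
```

Calling `check_peer()` with the replica's FQDN on the primary, and with the primary's FQDN on the replica, shows whether both directions look the same at the network level. The name passed in should be exactly the one used in the replication settings.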
I will attach some screenshots of the configuration below just in case. I would also like to add that the customer has set this up so that the hosts are not domain joined. Instead they are in a WORKGROUP, but the full computer name contains the FQDN. When I asked why this was done, they answered that it was for security reasons and that they had had problems with really slow reboots when the hosts were domain joined. I don't know if the architecture could be the reason this fails, since everything else works fine, but I will try to recreate the environment in my own homelab.




