PompyGeorge asked jiayaozhu-MSFT commented

Hyper-V Replica Fails

Windows Server 2016 error:

Hyper-V could not replicate changes for virtual machine ABC: The system cannot find the file specified (0x80070003). (Virtual machine ID: Guid)

The replicas work fine for weeks and then fail.
The only fix is to delete and recreate the replicas.

windows-server-hyper-v

jiayaozhu-MSFT answered

Hi,

Thanks for posting on our forum!

First, could you confirm whether deleting and recreating the replicas resolved the issue?

Second, based on your description, I suspect that the default path to the VM's change files (.avhd/.avhdx files) is no longer available. Please check that each .avhdx file is properly linked to its parent disk: in Hyper-V Manager, select Inspect Disk. Please also check the VM's .vhd file location and confirm in File Explorer that the file exists at the expected path:
[Screenshot: 98937-case-hyper-v.jpg]
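
The file-location check above can also be scripted. The following is a minimal Python sketch, not an official tool: it only reports which of the paths you give it do not exist as files. The function name `missing_files` and any paths you pass in are illustrative assumptions; take the real paths from the VM's settings or from Inspect Disk.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths (in input order) that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]
```

Run it on the Hyper-V host and pass the disk paths of the affected VM, for example `missing_files([r"D:\VMs\ABC\disk.vhdx"])` (a hypothetical path). An empty list means every file was found. It does not verify the parent/child link between differencing disks; Inspect Disk is still needed for that.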


Also, the error message "The system cannot find the file specified (0x80070003)" can occur when your .avhd files are corrupted. However, if recreating the replicas resolved the issue, this cause can be ruled out.
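
A side note on the error code itself: an HRESULT of the form 0x8007xxxx wraps a Win32 error code in its low 16 bits. The Python sketch below splits the value into its standard fields; the helper name `decode_hresult` and the small lookup table are illustrative, not part of any Hyper-V tooling.

```python
# Only the two Win32 codes relevant to this thread are listed.
WIN32_NAMES = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
}


def decode_hresult(hr):
    """Split a 32-bit HRESULT into failure flag, facility, and code."""
    code = hr & 0xFFFF
    return {
        "failure": bool((hr >> 31) & 1),   # bit 31: severity
        "facility": (hr >> 16) & 0x7FF,    # bits 16-26: 7 means Win32
        "code": code,                      # bits 0-15: the wrapped error code
        "name": WIN32_NAMES.get(code, "unknown"),
    }
```

For 0x80070003 this gives facility 7 (Win32) and code 3, which Windows defines as ERROR_PATH_NOT_FOUND; code 2 is ERROR_FILE_NOT_FOUND. That may suggest a missing folder or volume in the path rather than a single missing file, although the message text quoted here is as the poster reported it.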

Thanks for your support!

BR,
Joan


If the Answer is helpful, please click "Accept Answer" and upvote it.

Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.



PompyGeorge answered PompyGeorge commented

Hi jiayaozhu,

It's none of the above. Replication works fine until there is a disruption, and then it never recovers.
Once replication goes critical, all you can do is delete the replica and start again, which is a pain.


Hi,

Thanks for sharing! Has this issue occurred again? If not, I think you can ignore it for now, and I will keep an eye on this kind of issue. If I find anything new or any public documentation discussing it, I will post it in this thread as soon as possible. Also, your solution may help people with a similar issue, so you can accept your own answer to make it easier for them to find.

Thanks for your support and understanding! Have a nice day! : )

BR,
Joan


I am afraid it can't be ignored.
If the replica goes critical, there is, as far as I know, no way to continue replicating without deleting the replica and starting again.

If you don't do that, you cannot fail over, which could be terrible news.

I would love to work out why the replicas fail.

jiayaozhu-MSFT answered PompyGeorge edited

Hi,

Thanks for your reply!

I see. So every time you run replication, it eventually reports Critical, which forces you to delete the replica and start again, right? After you start again and run replication, does the status remain Critical or return to healthy? In this case, I think we need more comprehensive troubleshooting. First, here is an article explaining why the health report shows "Critical"; please read it to better understand your situation:

https://www.altaro.com/hyper-v/how-to-check-hyper-v-replica-health-part-2/

(Please note: Information posted in the given link is hosted by a third party. Microsoft does not guarantee the accuracy and effectiveness of information.)

From this article, you can see that there are three reasons why you got a "Critical":

1) Primary Server is not able to send the replication packets to the replica server due to network connectivity issues or issues with components of Primary or Replica Servers.
2) Primary Server is not able to keep track of the changes.
3) An administrator has paused the replication at the Replica Server.

Based on your description, I suspect your situation is most closely related to 2), but we still need to check whether it falls under 1) or 3).

Checking 3) is simple: find out whether the disruption was caused by an administrator pausing replication.

To check 1), I would like you to run some basic tests. Have you tried manually failing over to other nodes in your cluster? Can you ping successfully between the VMs you are replicating?
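
Ping only exercises ICMP, while replication traffic goes to a TCP listener on the Replica server, so a port-level check can complement the test above. The Python sketch below is a generic TCP reachability probe, not a Hyper-V API. The port is an assumption to verify against your own Replica server settings; Hyper-V Replica commonly listens on 80 for Kerberos (HTTP) and 443 for certificate-based (HTTPS) authentication.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run it from the primary host against the Replica server or Replica Broker name, for example `can_connect("replica01.example.local", 443)` (a hypothetical host name). A `False` result points toward cause 1); a `True` result only shows that the port is reachable, not that replication itself is healthy.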

To solve 2), in most cases the virtual machine requires resynchronization, or you must fix the errors before the Primary Server loses track of the changes. Here is an article that walks through the resynchronization process:

https://www.serverwatch.com/guides/hyper-v-replica-resynchronization-process/

(Please note: Information posted in the given link is hosted by a third party. Microsoft does not guarantee the accuracy and effectiveness of information.)

In addition, because resynchronization can take quite a while, there is some risk of losing backup data in the meantime, but you can run replication again after restarting the task, so I do not think it will be a big issue.

Finally, if the cause turns out to be outside these three reasons, or you need a faster and more in-depth investigation (for example, analysis of certain logs from %SystemRoot%\System32\winevt), I suggest opening a case with our engineers. You can find the phone number for your region at the link below.

Global Customer Service phone numbers:
https://support.microsoft.com/en-us/help/13948/global-customer-service-phone-numbers

In any case, let's go through the basic troubleshooting I have given you and see whether we can narrow down the issue.

Thanks for your patience and understanding!

BR,
Joan




Joan,

Good reply, thank you.

I have been through the articles and double-checked the settings. All seems OK, with no obvious errors.
All comms OK.

What happens is that the VMs will happily replicate until they switch from Warning to Critical status.
Sometimes you can tell the VM to continue replication, sometimes not.
Once it is at Critical, all you can do is delete the replica and start the whole process again.

I would like to find out why the replication is failing in the first place.

jiayaozhu-MSFT answered

Hi,

Thanks for your reply!

First, let us summarize the current situation. Your replication worked fine until there was a disruption from which it did not recover; the report then went from "Warning" to "Critical", and as a result you cannot replicate unless you remove the current replica and start replication again, right?

You said you didn't see any obvious error other than "The system cannot find the file specified (0x80070003)"? If so, here are the next steps:

1) Go to Event Viewer > Applications and Services Logs > Microsoft > Windows > Hyper-V-VMMS and look for: a) events logged when your replication was disrupted and could not recover (the error code or message shown when you failed to recover replication after the disruption); b) events with error codes or messages logged when the report turned to "Critical"; c) events related to the replication failure other than "The system cannot find the file specified (0x80070003)".

2) Take screenshots of the replication health report for both the primary and replica VMs, especially the lines showing "Critical". I still need to clarify which errors are causing the "Critical" status.

3) Run a validation test for replication to see what it detects.
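
To make step 1) easier, the log can be saved from Event Viewer as text or CSV and scanned for error codes. The Python sketch below assumes such an export exists; it makes no assumption about column layout and simply counts every 8-digit hexadecimal code it finds in each line. The name `count_error_codes` is illustrative.

```python
import re
from collections import Counter

# Matches codes such as 0x80070003 (exactly eight hex digits).
HRESULT_RE = re.compile(r"\b0x[0-9A-Fa-f]{8}\b")


def count_error_codes(lines):
    """Count occurrences of each 0xXXXXXXXX code across the given lines."""
    counts = Counter()
    for line in lines:
        for code in HRESULT_RE.findall(line):
            counts["0x" + code[2:].upper()] += 1
    return counts
```

Pass it the lines of the exported file, for example `count_error_codes(open("vmms-export.txt", encoding="utf-8"))` (a hypothetical file name). Any code other than 0x80070003 that appears around the time replication went Critical is worth reporting back.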

Besides, I still think some .hrl files for your replication were lost when it was disrupted, which left the primary VM and the replica VM out of sync. Also, you said you cannot recover the lost files; can you tell me the exact error codes or messages for that failure?

Thanks for your support!

BR,
Joan




PompyGeorge answered jiayaozhu-MSFT commented

Morning,

Nearly correct.
After some time, replication may stop for no apparent reason. If there is a network outage (Hyper-V Replica uses the regular LAN to communicate between Hyper-V Replica Brokers) or one cluster is offline for, I believe, longer than the replication window, then the status goes from Warning to Critical.
Once the status is at Critical, I have found no way to resume replication except to delete the replica and start replication again.


Hi,

I am glad to hear that you have made some progress. And yes, your issue seems to be related to a network problem, which is one of the three possible causes of a "Critical" status discussed in the article I sent you before.

Waiting for your good news soon! : )

BR,
Joan

PompyGeorge answered

1) Go to Event Viewer > Applications and Services Logs > Microsoft > Windows > Hyper-V-VMMS and look for: a) events logged when your replication was disrupted and could not recover (the error code or message shown when you failed to recover replication after the disruption); b) events with error codes or messages logged when the report turned to "Critical"; c) events related to the replication failure other than "The system cannot find the file specified (0x80070003)".

I will have a look and report back.
