
5 Votes
CannonChris-5226 asked BrendonHolt-7726 commented

Problems with Hyper-V on Server 2019 (1809) after August 2020 patches

We've been having problems after installing the August 2020 patches on our Server 2019 Hyper-V hosts. We have multiple Hyper-V clusters across Dell VRTX and UCS blades with iSCSI back-end SANs. Both environments have seen backup times double. Additionally, loading a VM's settings in Hyper-V Manager or Failover Cluster Manager is taking a very long time. We are not using a third-party AV, only Defender managed by SCCM. Usually when I see issues like this it feels like a storage performance issue, but I'm seeing it across the board, with iSCSI as well as direct-attached storage.

Patches applied
KB4566424
KB4565349
KB4569776

Our change log indicates no other changes. Our Hyper-V hosts have no other roles.

I'm going to try rolling back the August patches and do a quick A/B test to see if that remediates the issue, but thought I'd post to see if anyone else has seen this.

Chris

windows-server-2019 windows-server-hyper-v

We are also seeing delays on our 2019 Cluster. This manifests in slow "Get-VM" commands on the nodes and slowness opening settings for any of the VMs. It also seems to be causing delayed stats coming into Veeam One - sometimes these can be 15 minutes behind.

I don't like the idea of uninstalling updates to fix the issue - essentially if Microsoft is aware of the issue, they need to fix it! It could also affect anyone that has ISO compliance requiring fixes to be applied within a particular time window.

Does anyone have a link or reference to the Hotfix (Hotfix ID?) which solves the issue? I don't mind opening a case with Microsoft, if I need to go down that route to receive the fix, but it will be quicker if I can point them in the right direction.

2 Votes 2 ·

I don't think there is a public hotfix yet, only a private hotfix created ad hoc. You'll have to wait for a general release.
Alex

1 Vote 1 ·

I'm not sure I feel comfortable sharing the hotfix MS provided me, as I'm not sure it was anywhere close to a finished product. I should add that I am not comfortable putting it on any of our production 2019 Hyper-V servers; I'm waiting on the general release.

1 Vote 1 ·

Add us to the mix, this weekend (Thanksgiving) we decided to update the Hardware Platform. Installation of Server 2019 went fine and updates were applied, up until November.

We experienced VERY SLOW Hyper-V Manager when creating VMs or managing settings on the two new, very powerful machines we had just installed. Testing showed that working in Hyper-V Manager could slow guests down so much that logins to RDS servers were slow or failed. It was just taking forever to work in settings. On Sunday we reinstalled Server 2019 on one machine and it was fine until updates were applied.

We decided to apply the updates and deal with this issue because we provide this platform to end users, and the legal/financial liability from Zerologon (the Netlogon vulnerability) outweighs the speed of IT management. NOTE: WE CANNOT WORK IN HYPER-V MANAGER IN PRODUCTION HOURS, 11PM to 2AM.

This BUG NEEDS TO BE FIXED and it is a SEVERE LIMITATION. I understand we all work in a complex and dangerous world, but hopefully we get a Fix SOON for this issue.

2 Votes 2 ·

Hi,

Yes, you can roll back to the previous version to see if the issue still exists, and post your results here.

If anyone else has the same issue, you are welcome to post here as well.

Best Regards,
Daniel

1 Vote 1 ·

I removed only KB4565349 from a 5-node cluster (150 VMs) on a UCS blade chassis and from a smaller 2-node cluster (Dell VRTX) with around a dozen VMs. On the larger cluster, opening a VM's settings window from FCM was previously taking 20-40 seconds; it now loads in 3-5 seconds. I do not yet have statistics on backups; I had to pause backups on the larger cluster because the job was running past the maintenance window. I'm re-enabling that backup now. On the smaller cluster, I made the change after backups had completed, so I don't have a comparison.

With it being a longer holiday weekend i'll likely let things sit as they are over the weekend. If backups seem back to normal, i'm open to do some more testing with a couple other clusters (brand new UCS blades). To my knowledge, we're not seeing any VM performance issues at this point.

Thanks Chris







1 Vote 1 ·

Sounds good.

Please let me know if you have any other questions.


1 Vote 1 ·

This is certainly happening on clusters we run. We have not yet rolled back: it is more complicated than removing a single update, and running on an old rollup inherently puts our environment at risk.

It is very frustrating that Microsoft has not already patched this, so we'll likely burn a support case to get a hotfix.

Can you provide any estimate on resolution or acknowledgment that Microsoft has even identified this?

1 Vote 1 ·

I just heard back. The update is scheduled for inclusion in January.

3 Votes 3 ·

It's been a month since I've heard from them. I pinged them a moment ago to see if there is a schedule for it to be included in the normal rollup. Believe me... I share your frustration. It took several calls/screen shares with multiple teams to get anyone to really acknowledge the issue. I'm still waiting on it to be rolled into the monthly cumulative. I'll respond if I hear back.

Chris

1 Vote 1 ·

On a 6-node Win2019 failover cluster managed by SCVMM, which we use for automated testing and which relies heavily on checkpoints, we have these findings:

  • Win2019/SCVMM2019 takes twice as long when operations are performed on a non SCV owner, whereas Win2016/SCVMM2016 shows no measurable difference.

  • Win2019+KB4586793 doubles Hyper-V Restore duration on a non-owner. SCVMM Restore is four times slower, mostly due to prolonged Refresh durations.

  • Win2019+KB4598230 adds 30% to Hyper-V Restore and doubles SCVMM Restore duration.

  • Eventually, MigrateVm AWAY from the SCV owner ends up taking 6 – 14 minutes.

Measurements when Restoring a complex CheckPoint (Numbers are seconds):

[Three screenshots of the measurements (56592-image.png, 56439-image.png, 56440-image.png); the figures are not available in this text.]

So January Update was even worse!

0 Votes 0 ·

BTW: Due to lots of SCVMM Error ID 2606 occurrences from the super-slow Refresh (*), we have added a 'Retry SCVMM command' step for Error ID 2606 to our test automation, but we are still severely down on test throughput. We are still on KB4586793!


Unable to acquire a 'Delete' lock on object '52112b4e-3c4c-42b0-a2d9-0e708231d1c5' of type 'VirtualHardDisk' because it is locked by task 'ec55c407-f8dc-4900-9f3a-ce570d4a5d02' 'Refresh host cluster' with a 'Write' lock. (Error ID: 2606)
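
For anyone wanting to build a similar retry step, here is a minimal Python sketch of the idea. It is not the poster's automation: the `ScvmmCommandError` exception, the `run_with_retry` helper and the retry count and delay are all hypothetical, and the only detail taken from the thread is the `(Error ID: 2606)` suffix in the message above.

```python
import re
import time


class ScvmmCommandError(Exception):
    """Hypothetical error raised by the automation layer when an SCVMM command fails."""


# Matches the "(Error ID: 2606)" suffix seen in the message quoted above.
ERROR_ID_PATTERN = re.compile(r"\(Error ID: (\d+)\)")


def error_id(message):
    """Return the numeric error ID found in a message, or None if there is none."""
    match = ERROR_ID_PATTERN.search(message)
    return int(match.group(1)) if match else None


def run_with_retry(command, retries=5, delay_seconds=30.0, sleep=time.sleep):
    """Call command(); retry only when it fails with Error ID 2606 (lock conflict)."""
    for attempt in range(retries + 1):
        try:
            return command()
        except ScvmmCommandError as exc:
            # Anything other than 2606, or running out of retries, is re-raised.
            if error_id(str(exc)) != 2606 or attempt == retries:
                raise
            sleep(delay_seconds)
```

Retrying only on 2606 keeps genuine failures visible; the delay is a guess and would need tuning to how long the 'Refresh host cluster' task actually holds its lock.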

0 Votes 0 ·

For completeness, we changed VmDataStore.dll back to the latest version available in WinSxS from before August 2020 and got these numbers: no relief compared to August, but some relief compared to the Jan 2021 results. (We have only tried this on a staging environment!)

[Screenshot of the measurements (56671-image.png); the figures are not available in this text.]


0 Votes 0 ·

Hello Jens.
Have you been in talk with Microsoft regarding this issue? :)

0 Votes 0 ·

We opened a ticket with MS and referenced this post and the case# CannonChris-5226 provided. We were provided a time-limited private hotfix, which they stated they have had for a few months; it was time-stamped October 2020. Installing this hotfix requires using bcdedit to put the server in test mode and leave it there until the patch is removed and replaced with the public version. They also make you accept a bunch of warnings that the hotfix is provided as-is and may cause other issues.

Given the impact on our environment and that there were no issues encountered in our test environment we have put this into production and it immediately resolved the IO performance issue. It did not solve the issue with VM settings taking a long time to load or the right click menu in cluster manager disappearing when you try to select something.

0 Votes 0 ·
1 Vote
CannonChris-5226 answered DanielZhou-MSFT commented

Quick update. After the removal of KB4565349, backup times returned to normal (with the patch, backups went from 4-5 hours to 11-12 hours on the cluster I'm testing against). I can also launch VM settings within seconds now that the patch is gone. Are there known issues with this KB?

https://borncity.com/win/2020/08/14/probleme-microsoft-august-2020-patchday-nachlese/ There is a comment that leads me to believe there are others that are having the issue as well. I don't see any new comments however.

I'm seeing the issue across different hardware platforms on both datacenter and standard editions.



Chris


Hi,

I haven't seen your issue in the known issues list for this update.

Best Regards,
Daniel

1 Vote 1 ·
1 Vote
chriZa-3037 answered

Same problem here: since KB4565349, backup and replication tasks are very slow.
Just installed KB4570333 on all Hyper-V hosts, but backup times are still doubled.
Will this be fixed in an upcoming CU, or do I have to uninstall the CUs KB4570333 and KB4565349 to get back to normal backup behavior?

Regards,
chriZa

1 Vote
CannonChris-5226 answered CannonChris-5226 commented

I too have installed September's KB4570333 and have found the problems have returned; I assume the problem was rolled forward. Backup times are again more than double what they were previously. I am finding that when it comes to looking at VM settings through either FCM or Hyper-V Manager, the slowness loading settings is more prevalent on my deeper clusters. Some of my larger clusters have 120-150 VMs.

Our backups leverage on-host checkpoints, so I suspect whatever is causing the settings and console connection slowness is also affecting the backups. Smaller backups doubling in minutes isn't a big deal, but when the big backups double, it causes a significant problem.

Daniel, is there a way to file a bug report?

Thanks

Chris


Appreciate the follow-up. Most of those FAQs send you to the generic support page. I was able to open a ticket with MS support using our SA agreement. That was a week or so ago. At this point I've changed teams twice. I provided a pretty thorough list of noted symptoms, and some of those cross a few different functional groups, so I'm getting passed between teams. I'm back with the original team and I suspect that person is tier 1 or pre-tier 1. If I get some results I'll post again.

I should note for others that the problem is definitely more prevalent in larger environments. My smaller clusters' backups did increase 2x, but going from 15 to 30 minutes isn't as noticeable; with our large clusters (100+ VMs), a backup growing by 4-6 hours is terrible.

1 Vote 1 ·

Did Microsoft actually give you a timeline for when these issues will be resolved? We are also very frustrated, and it is a big PIA.

1 Vote 1 ·

Same problem with our Hyper-V hosts: it takes too long to get VM settings in the failover console.
I had to uninstall KB4570333 and KB4565349.
It's OK now, but what about the next update?

Thanks

1 Vote 1 ·
2 Votes
CannonChris-5226 answered AlessioP edited

I have an open ticket with Microsoft and I'm hoping to get some attention on the problem. Unfortunately, I seem to be the only person who has brought this up. Today I did some testing beyond just removing and installing the patch. I believe the problem is tied to VmDataStore.dll, which was updated in August 2020.

I did some Process Monitor research and found that the duration of vmms.exe file reads of any virtual machine's .vmcx file is nearly double what it was previously. If you look at the stack for those reads, you can see VmDataStore.dll is prominent. While each of those reads takes a tiny fraction of a second, loading a single VM can lead to 40,000 file reads, so it adds up. When scaled to a multi-node cluster with a lot of VMs, it becomes more noticeable.
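
To make the "it adds up" point concrete, here is a small back-of-the-envelope sketch in Python. Only the 40,000 reads per VM load comes from the post; the per-read durations are illustrative assumptions (no actual timings were published), and the 150-VM figure simply borrows the size of the larger cluster mentioned earlier in the thread.

```python
def added_seconds(reads, baseline_read_s, patched_read_s, vm_count=1):
    """Extra wall-clock time spent on file reads when each read gets slower."""
    return reads * (patched_read_s - baseline_read_s) * vm_count


READS_PER_VM = 40_000      # figure quoted in the post
BASELINE_READ_S = 0.0001   # assumption: 0.1 ms per read before the update
PATCHED_READ_S = 0.0002    # assumption: roughly double after the update

per_vm = added_seconds(READS_PER_VM, BASELINE_READ_S, PATCHED_READ_S)
cluster = added_seconds(READS_PER_VM, BASELINE_READ_S, PATCHED_READ_S, vm_count=150)

print(f"extra time per VM load: {per_vm:.1f} s")
print(f"extra time to load 150 VMs once each: {cluster / 60:.1f} min")
```

With these made-up inputs the extra cost is a few seconds per VM load, which is invisible on a single read but becomes minutes once every VM in a large cluster is touched.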

I found I can roll just that single .dll back and then forward and change the performance behavior. PowerShell modules, Hyper-V Manager, Failover Cluster Manager and, I suspect, on-host backups leverage vmms significantly. I've given MS all this info and hope it will get sent to someone who can do something about it. Maybe it will get some traction.

Chris


Hello Chris,
we are facing the identical problem; we too have opened a case with Microsoft, but they are still running discovery tools.
You say that rolling back that one DLL works around the problem; can you confirm that?
Can you also confirm that uninstalling those patches makes the problem go away?
Is there any chance you could share your case number? This may speed things up for us!
Thanks,
Alex

1 Vote 1 ·

I can confirm
1. Only removing August/Sept 2020 rollup back to July fixed it (kept other patches)
2. Having Sept 2020 rollup installed and rolling back the single dll fixed it

I've done all tests AOK but have not yet initiated a new backup to confirm that is also back to normal.



1 Vote 1 ·

Ours is 120083125002044.
On Friday we will try one of the two options and I'll let you know.
Should I wait before sharing your case number with Microsoft?
Thanks, Alex

1 Vote 1 ·

I have reviewed 2 days of backups after Installing Sept 2020 rollup AND rollback of only VmDataStore.dll. Backup times have gone back to normal. The ticket with MS is still open

1 Vote 1 ·
1 Vote
AlessioP answered MikkelKnudsen-3362 commented

We've rolled back all nodes to June 16, 2020—KB4567513 (OS Build 17763.1294), and now backups and node management have returned to normal speed.
We informed Microsoft about this and this is what they replied:
"The issue is under investigation by product group. We are having problems with our tools right now. I'll let you know when the Hotfix will be available, if we will have this information. I apologize for this inconvenience. We were not aware of it."
Alex


Heard from Microsoft?

0 Votes 0 ·
1 Vote
Dawid-5103 answered

Hi Guys,
Same situation here. Backups were very slow. Failover Cluster Manager responsiveness was very poor.
In our case we also have performance issues with VMs on one of our Hyper-V servers.
Only a rollback helped us.

Please let us know in this thread when your tickets are resolved.

Dawid

2 Votes
CannonChris-5226 answered BrendonHolt-7726 commented

I plan on updating once my ticket with MS is resolved/closed. It seems they recognize the issue now and are working through updating the file(s) in play. I've tested one set thus far; I'm not sure whether Alex has. The first set of files I received presented a different problem. There is a significant time zone difference between my tech and me, so it can be half a day before I hear back.

Chris


We haven't received anything from Microsoft, they put the ticket in standby, waiting for a solution from the upper level... :(
I'm glad that at least they're working with you!

Alex

1 Vote 1 ·

Hi Chris, Alessio

We have been facing the same issue on our hyperV 2019 cluster.

Did you receive any update from MS on this? We also have a case open with them, but it is very recent and we have not heard anything back so far.

Your response is much appreciated.

Kind Regards,
Jay

1 Vote 1 ·

Hi Jay, we are in your same situation, no answer from Microsoft.
Chris was in a more advanced state...
Alex

1 Vote 1 ·

I wasn't getting very far until i provided specifics to the problem. Just heard back from my tech who indicated the update will be released in January

1 Vote 1 ·
1 Vote
JacobMAIV answered AlessioP commented

hi Guys,

We have been troubleshooting a Hyper-V server with the exact same issue you described.
An update (I believe the October or August update) caused the Hyper-V MMC to be extremely slow, only showing 'connecting....' and getting stuck.

Running the PowerShell command 'get-vm' took minutes to complete. The vmmworker services could not be restarted.
Only killing vmmworker.exe could stop it. We had 6 VMs in a saved state and were not able to recover from this scenario.

Finally we ended up reinstalling Hyper-V Server 2016 and manually importing the VMs again. This worked.

This is the second time we have had this issue with the same server. We reinstalled last week and now this update broke it again.

Unfortunately a rollback of the update was not possible because a 'servicing update' was installed and there was no uninstall option available any more...


Any news from you guys? This was the only post I was able to find with the same problem.


Hi Jacob, your post is a bit off topic; we're talking about Windows Server 2019, not 2016. A typo, maybe?
You can normally find your installed patches by going to Control Panel\All Control Panel Items\Programs and Features and clicking View installed updates. You can uninstall the latest cumulative update from there.
For your last question, we are still waiting for news from Microsoft, maybe @CannonChris-5226 has news...

Alex

1 Vote 1 ·

hi AlessioP,

Sorry, I re-read the entire thread and I was mistaken about some things; I thought the match was closer to our case.

Yes, I know you are talking about 2019; I'm talking about 2016 (1607).
So yes, it's a different version, and slow backups and a slow cluster manager are different symptoms.
We only have a slow Hyper-V console and PowerShell. VMs in saved states were unbootable (we waited more than an hour to 'resume' a VM).

I believe my issues differ from the original post, yes.
But I do see similarities in the bigger picture.

If you feel this is out of scope, I will remove my post and start a new thread.

1 Vote 1 ·

I think the only way to know whether it is related to our issue, even though it is on 2016 instead of 2019, is to uninstall all cumulative updates, go back to June 18, 2020—KB4567517 (OS Build 14393.3755), and see if your problems go away.
Chris found a relation between the update of VmDataStore.dll (after build 17763.1339) and our issue, but on Windows Server 2016 there is no such DLL...
Alex

1 Vote 1 ·
1 Vote
CannonChris-5226 answered DanielBijl-0204 published

I was given a hotfix last week which looked to be a build up from the July rollup. It did work in the cluster I tested it on, but my understanding is that subsequent monthly rollups will overwrite it until it is released mainstream. I wasn't given an ETA for that. I haven't put it into our production environment yet. I'll likely stay where I am and update when it's generally released in a monthly update.

Yes, my issue is with 2019. I'm not seeing the issue in 2016, but I only have 1 cluster left on 2016 and it's fairly small.

I wasn't getting anywhere until I started looking around myself; I have little patience for repeating tier 1 actions over and over. I found Process Monitor and Process Explorer useful in identifying the component that was operating slowly. One helps show what is in the stack, and the other was helpful in seeing the timing of certain actions and the files involved. I performed the same actions, recorded them in the tools, then reviewed the recordings side by side. I ultimately found that vmms.exe's reads of config files were taking twice as long; each was a fraction of a second, but when you have thousands of those reads, it adds up. It was also fairly easy to find because those reads made up the bulk of the registry monitor results. I found the DLL using Process Explorer and noticed it was prominent in the stack for vmms.
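
For anyone who wants to repeat that side-by-side comparison without eyeballing two captures, here is a rough Python sketch that totals read durations from a Process Monitor CSV export. It is not what was used in the thread. It assumes the capture was saved as CSV with the optional Duration column enabled and that the headers are named "Process Name", "Operation", "Path" and "Duration"; the `summarize_reads` helper and the sample rows are made up for illustration.

```python
import csv
import io


def summarize_reads(csv_text, process="vmms.exe", suffix=".vmcx"):
    """Return (read_count, total_duration_seconds) for matching ReadFile rows."""
    count = 0
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Process Name"].lower() != process:
            continue  # only the process of interest
        if row["Operation"] != "ReadFile":
            continue  # only file reads
        if not row["Path"].lower().endswith(suffix):
            continue  # only VM configuration files
        count += 1
        total += float(row["Duration"])
    return count, total


# Made-up sample rows standing in for two captures of the same action.
HEADER = '"Process Name","Operation","Path","Duration"\n'
before = HEADER + '"vmms.exe","ReadFile","C:\\VMs\\a.vmcx","0.0001"\n' * 3
after = HEADER + '"vmms.exe","ReadFile","C:\\VMs\\a.vmcx","0.0002"\n' * 3

count_before, total_before = summarize_reads(before)
count_after, total_after = summarize_reads(after)
print(f"reads: {count_before} vs {count_after}, slowdown: {total_after / total_before:.1f}x")
```

For real captures, read each exported file into a string first (opening it with `encoding="utf-8-sig"` may be needed if the export starts with a byte-order mark) and pass the text to `summarize_reads`.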

i've seen elsewhere comments similar to what Jacob was stating but i have not seen that specific problem.

Chris


Any chance you can share the hotfix?

1 Vote 1 ·
1 Vote
TimSmals-4964 answered PaulWebb-5946 commented

Could you maybe share the hotfix you received?

We've got the same performance issues on our 8-node HCI-stack Server 2019 cluster, with a lot of I/O trouble logged in Microsoft-Windows-Hyper-V-StorageVSP/Admin.
Unfortunately I installed Server 2019 from an ISO from August, so we cannot uninstall the latest CUs (SW_DVD9_Win_Server_STD_CORE_2019_1809.6_64Bit_English_DC_STD_MLF_X22-34397).
Thanks!


As a workaround, replace VmDataStore.dll with a version from before CU08; that fixed it for our environment.

2 Votes 2 ·

Were you advised by Microsoft to do this?
Cheers

1 Vote 1 ·

Nope - it's an unsupported workaround, but I can confirm it's working in our 4-node environment.

1 Vote 1 ·