JustinPrice-2753 asked JustinPrice-2753 edited

Server in NLB cluster drops ping packets after reboot - slow app performance

I have two Server 2019 VMware virtual machines configured as NLB cluster hosts in IGMP multicast mode. This worked well for months. Then updates were turned on for both servers, and for the past few months the servers have been rebooted monthly.

Now, usually a few days after a reboot, I get complaints from users that the web app is running slowly. Sure enough, when I ping the two servers, one of them is dropping a large share of the ping requests. If I stop the host in the cluster, disable and re-enable the dedicated virtual NLB NIC, then start the host in the cluster again, the pings stop dropping and the app returns to normal speed. This lasts until the reboots happen again the next month.

This is not confined to one of the two servers and appears to be random. It can be one or the other, or even both.
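To put a number on the dropped pings described above, a small script can run the Windows `ping` command against each host and record the loss percentage over time. The following is a minimal sketch, not a tested monitoring tool: the function names are hypothetical, and the regular expression assumes the English-language summary line that Windows `ping` prints.

```python
import re
import subprocess

# Matches the summary line printed by the Windows ping utility, e.g.
# "Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),"
# Assumption: English-language output; other locales print different text.
_SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def parse_ping_summary(output: str) -> tuple[int, int, int]:
    """Return (sent, received, lost) parsed from Windows ping output."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received, lost = (int(group) for group in match.groups())
    return sent, received, lost


def loss_percent(sent: int, lost: int) -> float:
    """Percentage of echo requests that were lost (0.0 if none were sent)."""
    return 100.0 * lost / sent if sent else 0.0


def ping_loss(host: str, count: int = 20) -> float:
    """Ping `host` `count` times with the Windows ping command; return loss in percent."""
    result = subprocess.run(
        ["ping", "-n", str(count), host],  # -n sets the request count on Windows
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero when requests fail; still parse the output
    )
    sent, _received, lost = parse_ping_summary(result.stdout)
    return loss_percent(sent, lost)
```

Calling `ping_loss` for each cluster host's dedicated IP address from a scheduled task, and logging the result with a timestamp, would show when after a reboot the loss begins and which host is affected.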

The logs below are from right around the reboot time. The messages start to alternate every 1-15 minutes and stop once I disable and re-enable the NIC:

  • NLB is initiating convergence on host 0x1 because host 0x2 is leaving the cluster. Event ID: 69


  • Host 0x1 converged with host(s): 1,2. It is now an active member of the NLB cluster and will start load balancing traffic as the default host. The default host is the host with the lowest host priority. It handles all traffic that isn't covered by any of the defined port rules. Event ID: 29
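One way to quantify how often the cluster reconverges is to export the NLB events and measure the time between consecutive Event ID 69 entries. The sketch below assumes the events have already been loaded as `(timestamp, event_id)` pairs; the function names and the thresholds are hypothetical choices for illustration, not NLB defaults.

```python
from datetime import datetime

CONVERGENCE_START = 69  # "NLB is initiating convergence on host ..."
CONVERGENCE_DONE = 29   # "Host ... converged with host(s): ..."


def convergence_gaps(events):
    """Minutes between consecutive Event ID 69 entries.

    `events` is an iterable of (datetime, event_id) pairs in any order.
    """
    starts = sorted(ts for ts, event_id in events if event_id == CONVERGENCE_START)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(starts, starts[1:])]


def looks_like_flapping(gaps, max_gap_minutes=15.0, min_repeats=3):
    """True if at least `min_repeats` consecutive gaps are within `max_gap_minutes`.

    The defaults are illustrative assumptions based on the 1-15 minute
    pattern described in the post.
    """
    run = 0
    for gap in gaps:
        run = run + 1 if gap <= max_gap_minutes else 0
        if run >= min_repeats:
            return True
    return False
```

A single convergence after a reboot is expected; a long run of short gaps is what distinguishes the repeated leave-and-rejoin pattern from normal behavior.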

I'm no good with Wireshark, so I don't know if packets are being dropped. Any idea what's going on here or how I can troubleshoot?

Tags: windows-server, windows-server-clustering, windows-server-infrastructure

1 Answer

jiayaozhu-MSFT answered jiayaozhu-MSFT commented

Hi,

Thanks for posting on our forum!

1) NLB Event IDs 69 and 29 indicate normal conditions and are not errors or warnings. Here is a discussion of these two events:

https://social.technet.microsoft.com/Forums/ie/en-US/5d9c3edd-3556-40ff-b883-5d6d54ea445e/nlb-cluster-converged-event-log-information-event-id-29-and-69?forum=exchangesvrgenerallegacy

2) Based on your description, your issue may be caused by one of the following:
1. The host system's temporary shutdown during the scheduled update.
2. A problem with your NIC, for instance the NIC firmware not activating when the system rebooted after the update.
From my perspective, your issue is more likely to be caused by the second of these (the NIC). To troubleshoot further, first test the condition of your NIC. If the NIC is fine, the next step is to capture packets. To some extent, your issue is then no longer related to the cluster, so please re-post it on the network forum to see if they can help you check your NIC. Meanwhile, I suggest seeking help directly from your NIC vendor, as they should have the means to test the NIC itself (hardware condition).

Thanks for your support! I would appreciate it if you could click Accept Answer to support my work.

Best regards
Joan


If the Answer is helpful, please click "Accept Answer" and upvote it.

Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


Hi! Thanks for the response. The issue I'm concerned with is not the two events themselves. I'm aware those are normal. The concern is the rate at which these events are coming through: every 1-15 minutes for days until I reset the NIC. Otherwise, the server is connected to the network fine. It just seems to be added to and dropped from the cluster again and again.

But I agree, it does seem like a probable NIC issue. However, the NIC in question is virtual. So would I contact VMware in this case?


Hi,

Thanks for your reply!

Yes, I understand. If your vNIC is provided by VMware, I think you should first contact VMware support to troubleshoot the virtual NIC issue further. Meanwhile, I will discuss your case with our engineers from the network forum. The first step is to identify whether this issue is really related to the network configuration (such as the vNIC we have discussed). If it is not related to the vNIC and instead concerns the cluster configuration, you can come back for help and we will always be on your side. : )

Thanks for your support! Would you mind clicking Accept Answer to support my work? That also puts your thread at the top of our forum, helping people with a similar issue find the workaround more quickly.

BR,
Joan
