Cluster High CPU usage S2D FSLogix storage

John Kay 21 Reputation points
2021-09-27T13:59:17.09+00:00

We have a two-node Server 2019 cluster running the Scale-Out File Server role for Storage Spaces Direct. We use this to host our FSLogix profiles for Remote Desktop Services. The cluster runs nothing else and is on the latest updates.

After about 30 users are signed into RDS, the System process on the cluster spikes the CPU to 100% and sometimes stays there for a while. We've disabled all antivirus, uninstalled Defender firewall, updated all drivers and firmware, and rebooted multiple times. When the issue first started to occur, we thought it might be our disk I/O, so we doubled the disks and went from 6Gb SATA Intel DC SSD drives to 12Gb SAS SSD drives with much better performance. The issue still occurs.

The strange thing is that we had an error about the max envelope size being hit, so we increased it in the registry. This fixed our issue for about 10 days, then suddenly it came back. This time we don't see any errors about the max envelope size being exceeded like we did before.

The CPU in each server is an Intel E5-2670V3 2.3 GHz 12-core CPU. Since this is only a file server, I would assume this would be plenty. Before this all started happening, we would have about 85 users signed into RDS at once with 1-3% CPU usage on the cluster. Now, with about 30 users, the average CPU is around 30%, with spikes to 100% that sometimes hold there.

Any suggestions?

Windows Server Clustering
FSLogix

4 answers

  1. John Kay 21 Reputation points
    2021-09-27T14:27:22.21+00:00

    I am seeing this in Event Viewer, but I'm not sure what it means.

    The maximum file size for session "EventLog-Microsoft-Windows-SMBServer-Analytic" has been reached. As a result, events might be lost (not logged) to file "C:\Windows\System32\Winevt\Logs\Microsoft-Windows-SMBServer0x8800081Analytic.etl". The maximum files size is currently set to 1048576 bytes.


  2. John Kay 21 Reputation points
    2021-09-27T19:08:35.633+00:00

    We removed the E5-2670V3 CPU from one of the nodes, replaced the single CPU with 2x E5-2680V3, and moved the role over to that node. Same issue: the CPU maxes out on it as well.


  3. John Kay 21 Reputation points
    2021-09-27T19:43:43.93+00:00

    We opened CMD and ran netstat -abo and received this:

      TCP   [fe80::796b:41a:4ab5:d1ef%11]:445     StorageNode02:51278            ESTABLISHED   4
       Can not obtain ownership information
      TCP   [fe80::796b:41a:4ab5:d1ef%11]:445     StorageNode02:51955            ESTABLISHED   4
       Can not obtain ownership information
      TCP   [fe80::796b:41a:4ab5:d1ef%11]:49696   StorageNode02:ms-cluster-net   ESTABLISHED   5080
       [clussvc.exe]
      TCP   [fe80::796b:41a:4ab5:d1ef%11]:49699   StorageNode02:microsoft-ds     ESTABLISHED   4
       Can not obtain ownership information
      TCP   [fe80::796b:41a:4ab5:d1ef%11]:49700   StorageNode02:microsoft-ds     ESTABLISHED   4
       Can not obtain ownership information

    The PID is 4, which is the System process that is maxing out our CPU.

    Any suggestions?
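
To see at a glance how many connections each PID owns, the netstat output in the post above can be tallied with a short script. The following is a minimal Python sketch, not a diagnostic tool: it assumes each connection appears on its own line in the order protocol, local address, remote address, state, PID, and the names `parse_connections`, `is_smb`, and `summarize` are made up for this illustration.

```python
from collections import Counter

# Sample copied from the post above, one connection per line.
SAMPLE = """\
TCP   [fe80::796b:41a:4ab5:d1ef%11]:445 StorageNode02:51278   ESTABLISHED   4
 Can not obtain ownership information
TCP   [fe80::796b:41a:4ab5:d1ef%11]:445 StorageNode02:51955   ESTABLISHED   4
 Can not obtain ownership information
TCP   [fe80::796b:41a:4ab5:d1ef%11]:49696 StorageNode02:ms-cluster-net ESTABLISHED   5080
 [clussvc.exe]
TCP   [fe80::796b:41a:4ab5:d1ef%11]:49699 StorageNode02:microsoft-ds ESTABLISHED   4
 Can not obtain ownership information
TCP   [fe80::796b:41a:4ab5:d1ef%11]:49700 StorageNode02:microsoft-ds ESTABLISHED   4
 Can not obtain ownership information
"""

# netstat prints "microsoft-ds" as the service name for TCP port 445 (SMB).
SMB_PORTS = {"445", "microsoft-ds"}


def parse_connections(text):
    """Return one dict per TCP connection line; owner lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A connection line has exactly: proto, local, remote, state, PID.
        if len(parts) != 5 or parts[0] != "TCP" or not parts[4].isdigit():
            continue
        _, local, remote, state, pid = parts
        rows.append({
            "local_port": local.rsplit(":", 1)[1],
            "remote_port": remote.rsplit(":", 1)[1],
            "state": state,
            "pid": int(pid),
        })
    return rows


def is_smb(row):
    """True if either end of the connection is the SMB port."""
    return row["local_port"] in SMB_PORTS or row["remote_port"] in SMB_PORTS


def summarize(rows):
    """Count all connections and SMB connections per owning PID."""
    per_pid = Counter(r["pid"] for r in rows)
    smb_per_pid = Counter(r["pid"] for r in rows if is_smb(r))
    return per_pid, smb_per_pid


if __name__ == "__main__":
    per_pid, smb_per_pid = summarize(parse_connections(SAMPLE))
    for pid, total in per_pid.most_common():
        print(f"PID {pid}: {total} connections, {smb_per_pid[pid]} on SMB")
```

On this sample, every SMB connection belongs to PID 4. That is expected on a Windows file server, because the SMB server runs in kernel mode under the System process, which is also why netstat cannot report an owning executable for those lines.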


  4. Limitless Technology 39,351 Reputation points
    2021-09-30T08:49:52.867+00:00

    Hello,

    1. Please use Process Explorer and Procmon to find out which process is consuming the CPU.
    2. Please check what kind of applications the 30 users are running: are they developers, or are they just using RDP for Office applications?
    3. Please use a disk IOPS tool to monitor reads and writes on the disks.
    4. I believe momentary CPU spikes are acceptable in an RDS environment, because users log on and log off frequently, which loads and unloads their user profiles into RAM.
    5. Please check the Microsoft article below for further tuning of the RDS environment.

    https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/role/remote-desktop/session-hosts
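
One way to tell momentary spikes from sustained saturation is to log the CPU counter at a fixed interval and look for the longest run of samples above a threshold. The sketch below is a rough Python illustration under several assumptions: the samples come from the built-in `typeperf` tool in its CSV format, the host name `STORAGENODE01` and all values in `SAMPLE_CSV` are invented for the example (they are not measurements from this cluster), and the 95% threshold is arbitrary.

```python
import csv
import io

# Hypothetical capture, e.g. from:
#   typeperf "\Processor(_Total)\% Processor Time" -si 1 -sc 6
# Host name and values are made up for illustration only.
SAMPLE_CSV = r'''
"(PDH-CSV 4.0)","\\STORAGENODE01\Processor(_Total)\% Processor Time"
"09/27/2021 14:00:01.000","2.500000"
"09/27/2021 14:00:02.000","31.000000"
"09/27/2021 14:00:03.000","100.000000"
"09/27/2021 14:00:04.000","99.200000"
"09/27/2021 14:00:05.000","98.700000"
"09/27/2021 14:00:06.000","28.400000"
Exiting, please wait...
The command completed successfully.
'''


def read_samples(text):
    """Return the numeric CPU samples, skipping header and status lines."""
    values = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue
        try:
            values.append(float(row[1]))
        except ValueError:
            # Header row or a status message that happens to contain a comma.
            continue
    return values


def longest_run_at_or_above(values, threshold):
    """Length of the longest consecutive run of samples >= threshold."""
    best = current = 0
    for value in values:
        current = current + 1 if value >= threshold else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    samples = read_samples(SAMPLE_CSV)
    average = sum(samples) / len(samples)
    run = longest_run_at_or_above(samples, 95.0)
    print(f"{len(samples)} samples, average {average:.1f}%, "
          f"longest run at or above 95%: {run} samples")
```

With a one-second sample interval, a run of a few samples matches the logon and logoff spikes described in point 4, while runs lasting minutes would point to the sustained load described in the question.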
