VirtualDude-9455 asked TanPolat-7970 commented

Storage Spaces Direct - all-NVMe BSOD - ProLiant DL385 Gen10 Azure Stack HCI Certified Hardware

Hello Microsoft land,

Just wanted to throw this out there to see if it gets any hits or anyone in the community has had any similar experiences.

We are running several HPE ProLiant DL385 Gen10 servers, which are an Azure Stack HCI certified hardware solution.

We are using AMD EPYC CPUs with all HPE NVMe SSDs for a full NVMe solution for S2D, running Windows Server 2019 LTSC, fully patched, with the latest HPE firmware and drivers.

Unfortunately, we cannot enable S2D without hitting a BSOD during the process. When the SBL surfaces the NVMe disks, all cluster nodes BSOD immediately. BSOD dump analysis doesn't point to a specific third-party driver as the cause. The crashes are mainly in either the NVMe filter driver or the storport.sys kernel driver.

We have even had an MS Premier support case open for about 3 months now, with no resolution yet. They say an I/O request packet (IRP) is getting corrupted when it tries to reference an invalid memory location, causing a crash that either names storport.sys or throws an "IRQL not less or equal" type stop code. We have been passed around the Mindtree Limited MS support partners for some time now with no resolution. Even the HPE engineers have no clue why it's not working.

We have followed all MS guides for S2D deployment and have been running S2D across 3 storage clusters for years now with no real issues.

Has anyone running this platform seen this before? Thanks in advance to anyone out there.

windows-server-storage

Hi,

I would like to check whether the reply was of help. If so, please accept the answer so that others who meet a similar issue can find useful information quickly. If you have any other concerns or questions, please feel free to give feedback.

Best Regards,

Danny


Did you ever find an answer to this? We're seeing exactly the same issue you describe on very similar hardware. Lenovo SR655 (also certified hardware), Epyc 73F3, all NVMe storage (10 x Intel P5500, 2 x P5800X). Happens on 2019 and 2022, we don't use Azure Stack. We also have an MS support case open, but it isn't getting anywhere. Lenovo also hasn't been any help (this box is also theoretically certified). We see a wide variety of STOP codes, stacks are usually in clusport or spaceport, seemingly in the midst of some kind of hardware enumeration.

YuhanDeng-MSFT answered YuhanDeng-MSFT commented

Hi,
Not sure if this helps but maybe you should take a look:
https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-states
Also similar cases:
https://answers.microsoft.com/en-us/windows/forum/all/getting-bsod-after-nvme-m2-drive-install/f5810ed7-4b7d-442b-bbea-166593577c23
https://www.overclock.net/threads/nvme-ssd-causing-bsod.1738844/

I did some research, but due to the limits of the forum's scope, I can't find a resolution to it. If it's not a production environment, I would suggest that you try rebuilding it to fix this issue.

Thanks for your understanding.
Best regards,
Danny


If the Answer is helpful, please click "Accept Answer" and upvote it.

Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


Danny,

Thanks for looking into this. Unfortunately, I don't think these are relevant to my case. This is a new environment with 8 of these new servers stacked out with NVMe disks, and none of them work.

I tried rebuilding the servers many times with different versions of Windows Server: Server 2016, Server 2019 1809, 1909, 20H2 and even the Server 2022 preview, with no love. Same issue every time. When we run the command "Enable-ClusterStorageSpacesDirect -Verbose", all is good until the disks start to get surfaced by the SBL. Then all cluster nodes instantly BSOD at the same time.

I hope I can get escalated out of Mindtree support and over to an MS engineer, as I believe it's something to do with the NVMe filter driver that is baked into Windows. I see in some of those articles that people were using third-party NVMe drivers. I don't think HPE has those for Windows Server, from what I can tell.


Hi,
This is rare. For a deeper investigation and a better solution, I would suggest that you open a Microsoft Support ticket. If the issue is proven to be a system flaw, the consulting fee will be refunded. You can find the phone number for your region at the link below.
Global Customer Service phone numbers:
https://support.microsoft.com/en-us/help/13948/global-customer-service-phone-numbers

Thanks for your understanding.
Best regards,
Danny



Danny,

Unfortunately, as I said earlier, we have had an MS Premier support case open for months now with no resolution. We are an MS Gold Partner, so we have access to Premier support for Windows Server products. We are not getting anywhere with them, as they seem stumped. Hence why I asked the community here.

TanPolat-7970 answered TanPolat-7970 commented

Hello, we finally found the solution, and it was really, really dumb: install Hyper-V first on all S2D nodes, even if you are not going hyper-converged.
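
For anyone landing here later, the fix above might look roughly like the following in PowerShell. This is only a sketch, not the exact steps we ran: the node names are hypothetical placeholders, and it assumes you run it from a separate management machine with PowerShell remoting enabled to every node.

```powershell
# Hypothetical node names (assumption): replace with your own S2D nodes.
$nodes = 'Node01', 'Node02', 'Node03', 'Node04'

# Install the Hyper-V role on every node before enabling S2D,
# even if the cluster will not host virtual machines.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name Hyper-V -IncludeManagementTools
}

# The Hyper-V role needs a reboot to finish installing.
Restart-Computer -ComputerName $nodes -Wait -For PowerShell -Force

# Confirm the role reports as installed on each node.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Get-WindowsFeature -Name Hyper-V
} | Select-Object PSComputerName, Name, InstallState

# Then, from one of the cluster nodes, retry the command from the question:
# Enable-ClusterStorageSpacesDirect -Verbose
```

The last step is left commented out on purpose, so the role check can be reviewed before S2D is enabled again.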


Wow, thanks for replying. That does sound really silly, but I'm trying it right now.


Wow - this worked. Your response here probably saved literal months of pain, I can't thank you enough.


Glad I could help out! Yeah, I wish someone else had saved me the months of bashing my face into the wall.
