Striped volume bottleneck

GMPeet 101 Reputation points
2021-05-07T09:44:01.397+00:00

I have a system with four additional (non-OS) 1 TB NVMe drives, each on PCIe Gen 4 x4.
When I measure sequential read and write I get:

Individual drive
6.7 GB/s read and 5 GB/s write.

2 drives, MS Windows 10 striped volume
Approx. 12 GB/s read and 10 GB/s write.
So far so good.

4 drives, MS Windows 10 striped volume
No more than approx. 10 GB/s seq. read and 10 GB/s seq. write.

The drives are each on their own PCI Express root complex, and all are running at 4 PCIe lanes x 16 GT/s.
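
For reference, the link speed quoted above can be turned into a theoretical ceiling. This is a minimal sketch, assuming the standard 128b/130b line encoding of PCIe 3.0 and later and ignoring packet and protocol overhead, so real drives land somewhat below it. The function name `link_ceiling_gb_s` is made up for illustration.

```python
# Theoretical one-direction ceiling of a PCIe 4.0 x4 link.
# Assumption: 128b/130b encoding only; TLP/DLLP protocol overhead is ignored.
GT_PER_S = 16e9        # raw transfers per second per lane (PCIe 4.0)
LANES = 4              # lanes per NVMe drive, as stated in the post
ENCODING = 128 / 130   # usable bits per transferred bit


def link_ceiling_gb_s(rate=GT_PER_S, lanes=LANES, encoding=ENCODING):
    """Return the link bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bits_per_second = rate * lanes * encoding
    return bits_per_second / 8 / 1e9


per_drive = link_ceiling_gb_s()
print(f"per drive:   {per_drive:.2f} GB/s")
print(f"four drives: {4 * per_drive:.2f} GB/s")
```

Under these assumptions each link tops out a little under 8 GB/s, so the 6.7 GB/s single-drive read is already close to the link limit, and four independent links leave far more headroom than the roughly 10 GB/s measured on the 4-drive volume.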

Using Task Manager (Performance tab), I can see that each individual drive in this striped volume runs at approx. 2.5 GB/s read/write, which is way below the maximum of 6.7 GB/s read and 5 GB/s write the drives are capable of individually.

I understand there is some overhead to take into account for the striped volume, but I expected results in the range of 17-20 GB/s.
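
To put the expectation in numbers, here is a small sketch that compares the measured figures from this post against ideal linear scaling (drive count times single-drive speed). Only the throughput figures come from the post; the helper name `scaling_efficiency` is an assumption for illustration.

```python
# Figures quoted in the post, in GB/s.
SINGLE_READ, SINGLE_WRITE = 6.7, 5.0
MEASURED = {2: (12.0, 10.0), 4: (10.0, 10.0)}  # drives -> (read, write)


def scaling_efficiency(drives, measured, single):
    """Fraction of ideal linear scaling (drives * single) that was achieved."""
    return measured / (drives * single)


for drives, (read, write) in MEASURED.items():
    read_eff = scaling_efficiency(drives, read, SINGLE_READ)
    write_eff = scaling_efficiency(drives, write, SINGLE_WRITE)
    print(f"{drives} drives: ideal {drives * SINGLE_READ:.1f}/{drives * SINGLE_WRITE:.1f} GB/s, "
          f"read {read_eff:.0%}, write {write_eff:.0%}, "
          f"per-drive read {read / drives:.1f} GB/s")
```

The ideal figures for four drives are 26.8 GB/s read and 20 GB/s write, so 17-20 GB/s already allows for overhead. The measured 10 GB/s is 2.5 GB/s per drive, which matches what Task Manager shows.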

(The same drives in an AMD RAIDXpert2 RAID 0 array perform at 12 GB/s read and 17 GB/s write. That read is significantly lower than write also raises some questions.)

Does anyone have an idea what the bottleneck could be that is holding the 4-drive striped volume back from achieving 17-20 GB/s?


Accepted answer
  1. GMPeet 101 Reputation points
    2021-05-14T22:36:31.66+00:00

    I got over 27 GB/s seq. read on an MS striped volume of four Gen 4 NVMe drives by adjusting the benchmark conditions.

    The bottleneck appears to be the use of 1 thread (logical CPU) on my Threadripper Pro system.
    Increasing the number of threads to 4 (of 32 available on my system) already gives the expected read performance.
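
    As a rough illustration of the one-thread versus four-thread difference, the Python sketch below splits a file into contiguous slices and reads them with a configurable number of threads. It is not how CrystalDiskMark works internally. The function names (read_slice, parallel_read) and the 1 MiB chunk size are assumptions for illustration, and the file path is whatever large test file is passed on the command line. Reads go through the OS file cache here, so the test file would need to be much larger than RAM for the numbers to mean anything.

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # assumed block size: 1 MiB per read call


def read_slice(path, start, length):
    """Sequentially read `length` bytes from offset `start`; return bytes read."""
    done = 0
    # Each worker opens its own unbuffered handle so seeks do not interfere.
    with open(path, "rb", buffering=0) as f:
        f.seek(start)
        while done < length:
            data = f.read(min(CHUNK, length - done))
            if not data:  # unexpected end of file
                break
            done += len(data)
    return done


def parallel_read(path, threads):
    """Read the whole file using `threads` workers, one contiguous slice each.

    Returns (total bytes read, elapsed seconds).
    """
    size = os.path.getsize(path)
    step = max(1, -(-size // threads))  # ceiling division, at least 1 byte
    slices = [(s, min(step, size - s)) for s in range(0, size, step)]
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        total = sum(pool.map(lambda s: read_slice(path, *s), slices))
    return total, time.perf_counter() - t0


if __name__ == "__main__" and len(sys.argv) > 1:
    for n in (1, 4):
        total, secs = parallel_read(sys.argv[1], n)
        print(f"{n} thread(s): {total / secs / 1e9:.2f} GB/s")
```

    Python adds its own overhead, so absolute numbers will differ from a native benchmark; the point is only to compare the 1-thread and 4-thread runs on the same file.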

    But this is a maximum-performance benchmark, not a real-world test.
    The real-world tests show that the speed available in normal use for this volume is at most approx. 10 GB/s seq. read.

    It looks like I have to do some additional testing to further figure this out.

    The benchmark tests were run on a Windows 10 Pro system (see my comment for additional benchmarks).

    To me it shows my system has the potential to perform.
    But to benefit from it, the operating system's file system has to be capable of multi-threaded file handling, so that applications can actually read and write at these speeds.

    What interests me is whether, for example, Windows Server has multi-threaded file handling and could turn the high maximum performance into real-world values.

    97302-20210517-ms-striped-q8t4-crystaldiskmark-202105172.png


2 additional answers

  1. Catherine Piao 11 Reputation points
    2021-05-10T07:04:20.963+00:00

    95018-image.png


  2. GMPeet 101 Reputation points
    2021-05-10T15:29:59.187+00:00

    Hi,
    I was redirected here by a Microsoft Community moderator.
    I just hoped I was not the first to try this kind of (4 x PCIe Gen 4) striped volume on a Threadripper Pro system.

    I do have something to add.

    If I change the test conditions a little by increasing the file size used for the test, the 4 x NVMe striped volume also reaches a little above 12 GB/s.
    That is about the same as the 2 x NVMe striped volume and the relatively low read of the AMD RAID.
    Still far from what I expected, but the limit appears to be around 12 GB/s for now.

    I understand it is a somewhat complicated question for several reasons.
    If it were an easy one, I would have been able to figure it out on my own and would not have had to post it. :-)

    I am a Windows Insider. Maybe I will get some input through that channel?

    Thank you for your quick response.