Exchange 2013/2016 JBOD Storage Validation Considerations
Senior Exchange PFE Nélio Lemos takes us through some important pre-deployment considerations for Exchange storage validation.
In this post, let’s look at why validating the JBOD storage of your on-premises Exchange 2013/2016 servers with the Jetstress tool is a critical deployment task. We will review the prerequisites and the overall process.
The main reason is that we must validate the Exchange solution before placing it into production. This means testing the end-to-end solution, including servers, storage controllers and disks. For example, wouldn’t you rather detect a deficient disk, controller, memory module or motherboard before placing your manager’s mailbox onto the impacted database? It is much better to put a finger on such issues in the test phase rather than in production, for both you and your career!
Just a Bunch Of Disks (JBOD) storage is the recommended storage solution according to our preferred architecture, which you can find on the Exchange team blog. JBOD has different meanings across the IT industry. In the context of Exchange it is just that, a bunch of disks. There is no grouped Redundant Array of Inexpensive Disks (RAID) configuration such as RAID 5 or RAID 10. Each individual disk unit is presented by itself; the exact mechanics will vary depending upon the hardware purchased. See the note below on the storage controller, as its configuration is the main reason why Exchange JBOD designs fail validation.
The validation process is required to ensure that the performance of your JBOD disks will be enough to sustain the worst-case IO requirements of your Exchange server. Servers must meet or exceed performance expectations when running under the worst-case scenario described in the Exchange 2013/2016 calculator. Yes, you do need to use this tool before running Jetstress :)
Some considerations for successfully running Jetstress:
- Use the correct version of the Exchange calculator. For this post, the Exchange 2013/2016 calculator will help you design your servers, in particular the storage, and determine the required IO your solution must meet.
- The latest version can be found here, and is compatible with Exchange 2016.
- Use the latest Jetstress 2013 build. For now, the tool is the same for Exchange 2013 and Exchange 2016.
- Refer to the Jetstress documentation: this is mandatory!
- The account used must have local administrator rights on the servers being tested.
- All the Exchange volumes must be prepared, that is to say formatted and mounted, with the database and log folders created according to the Exchange calculator tool and its preparation script.
- Correctly configure the server’s disk controller.
You must configure your server’s disk controller to leverage its cache for all your JBOD disks. This is a critical point. Your servers require a disk controller with battery-backed cache for the disks to operate with the best performance. You will need to work closely with your hardware provider to configure the disks and storage controller accordingly. Microsoft cannot state exactly how they are best configured; that guidance must come from your vendor, since solutions differ in how they should be configured. Microsoft does provide the following high-level guidance for configuration, though not all of these options may be exposed on all storage controllers.
- Your servers require a RAID controller with battery-backed cache.
- Configure RAID controller cache for 100% write for all your JBOD disks.
- Use a disk array RAID stripe size of 256 KB or greater.
- Disable any antivirus during the tests.
As an example, some hardware providers’ solutions create a single-disk RAID 0 array for each disk so that it can leverage the controller cache.
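Before starting the tests, the volume preparation prerequisite listed above can be checked with a short script. The following is only a minimal sketch: the mount point paths and the `db` and `log` subfolder names are hypothetical placeholders, not names produced by the calculator, so replace them with the layout your own preparation script creates.

```python
from pathlib import Path


def missing_folders(volume_roots, subfolders=("db", "log")):
    """Return every expected folder that does not exist yet.

    volume_roots: mount points of the prepared Exchange volumes.
    subfolders:   folder names expected under each mount point
                  (hypothetical names, adjust to your own layout).
    """
    missing = []
    for root in volume_roots:
        for name in subfolders:
            folder = Path(root) / name
            if not folder.is_dir():
                missing.append(folder)
    return missing


if __name__ == "__main__":
    # Hypothetical mount points, for illustration only.
    roots = [
        r"C:\ExchangeVolumes\Disk1",
        r"C:\ExchangeVolumes\Disk2",
    ]
    problems = missing_folders(roots)
    if problems:
        for folder in problems:
            print(f"Missing folder: {folder}")
    else:
        print("All expected database and log folders exist.")
```

The script only confirms that the folders exist. It does not verify formatting or controller settings, which still need to be checked with your hardware vendor.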
The storage validation process for each server is as follows:
- Run the autotuning test for 15 minutes to determine the threadcount value to use. That value is then configured in the following tests, and tells Jetstress how much IO load to send to the disks.
- Then, start a 2-hour test using the threadcount determined in the autotuning test. When it completes, check three values in the test result:
- DB IOPS Target: must be greater than or equal to the value from the Exchange calculator (Role requirements tab, field Total database required IOPS)
- DB Read Average Latency: must be less than 20 ms
- LOG Write Average Latency: must be less than 10 ms
- Last, but not least, run a third test for 24 hours, still using the same threadcount value. The result checks are the same as for the 2-hour test.
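The three checks above can be written down as a small pass/fail helper. This is only a sketch under stated assumptions: the `JetstressResult` class and its field names are hypothetical, and the values are expected to be copied by hand from the Jetstress report and the calculator, since nothing here parses the report itself.

```python
from dataclasses import dataclass

# Thresholds taken from the checks listed above.
DB_READ_LATENCY_LIMIT_MS = 20.0
LOG_WRITE_LATENCY_LIMIT_MS = 10.0


@dataclass
class JetstressResult:
    """Values read manually from a Jetstress test report (hypothetical container)."""
    db_iops: float
    db_read_latency_ms: float
    log_write_latency_ms: float


def check_result(result, required_db_iops):
    """Return a list of failure messages; an empty list means the run passed."""
    failures = []
    if result.db_iops < required_db_iops:
        failures.append(
            f"DB IOPS {result.db_iops} is below the required {required_db_iops}"
        )
    if result.db_read_latency_ms >= DB_READ_LATENCY_LIMIT_MS:
        failures.append(
            f"DB read latency {result.db_read_latency_ms} ms is not below "
            f"{DB_READ_LATENCY_LIMIT_MS} ms"
        )
    if result.log_write_latency_ms >= LOG_WRITE_LATENCY_LIMIT_MS:
        failures.append(
            f"Log write latency {result.log_write_latency_ms} ms is not below "
            f"{LOG_WRITE_LATENCY_LIMIT_MS} ms"
        )
    return failures


if __name__ == "__main__":
    # Made-up numbers for illustration; use your own report and calculator values.
    two_hour_run = JetstressResult(
        db_iops=1200.0, db_read_latency_ms=12.5, log_write_latency_ms=3.1
    )
    problems = check_result(two_hour_run, required_db_iops=1000.0)
    print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```

The same function can be applied to both the 2-hour and the 24-hour results, since the pass criteria are identical for the two runs.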
If your servers pass both the 2-hour and the 24-hour tests, their JBOD storage is validated for production. Do not skip the longer burn-in test. It is necessary to fully test the storage and fully exercise all caches in the solution. That takes time, which is the reason for the longer test. We have seen many instances where the 2-hour test passes but the 24-hour test fails. Test failures need to be diagnosed in conjunction with the hardware vendor to see where the solution failed to meet expectations.
You have seen that JBOD storage validation is very important before going to production. The process to do so should now be clear. Please do leave a comment below if you have any questions.
Published by MSPFE editor Rhoderick Milne