Investigating Framework Performance on Itanium: Rudimentary Considerations
So let's begin the Performance conversation! The area I'd like to delve into for this post is the performance of managed code on Itaniums.
One of the first tasks I took on when I joined the Framework Performance team (as a Program Manager) was to continue improving the performance of managed code on the Intel Itanium architecture and to clean up our problem areas. This post summarizes some of the performance issues that we are investigating as we move on to the next major release of the .NET Framework.
This has been made possible by a joint effort between Microsoft and other partners, one of them being the Itanium Solutions Alliance (ISA). Microsoft is a charter member of the ISA, and we are excited to work with the other members of the ISA to further enhance the performance of the .NET Framework on the Intel Itanium architecture.
Here is an outline of areas we're investigating for improved performance on Itanium architecture.
Evaluation and Optimization considerations
A large portion of our work list is investigating and evaluating both the existing JIT compiler and proposed changes to it. For example, global scheduling could potentially improve execution speed by reducing load stalls and making more instructions available for bundle scheduling. We plan to investigate what impact enabling a global scheduler would have on code performance and code correctness, especially with respect to garbage collection.
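To make the load-stall point concrete, here is a toy sketch of the idea, not a model of the JIT or of real Itanium hardware. It assumes a hypothetical single-issue, in-order machine where a load takes 3 cycles before its result is usable; the instruction tuples, register names, and latencies are all invented for illustration. Moving the second load up, next to the first, gives both loads time to complete before their results are needed.

```python
def count_stalls(instructions):
    """Count stall cycles on a toy single-issue, in-order machine.

    Each instruction is (dest, sources, latency). An instruction cannot
    issue until every source it reads is ready. Latencies are hypothetical.
    """
    ready_at = {}      # register name -> cycle at which its value is usable
    next_slot = 0      # earliest cycle at which the next instruction may issue
    stalls = 0
    for dest, sources, latency in instructions:
        issue = max([next_slot] + [ready_at.get(s, 0) for s in sources])
        stalls += issue - next_slot
        ready_at[dest] = issue + latency
        next_slot = issue + 1
    return stalls


LOAD_LATENCY = 3  # assumed value, for illustration only

load_a = ("a", ["p"], LOAD_LATENCY)   # a = load [p]
use_a = ("b", ["a"], 1)               # b = a + 1
load_c = ("c", ["q"], LOAD_LATENCY)   # c = load [q]
use_c = ("d", ["c"], 1)               # d = c + 1

# Original order: each load is followed immediately by its consumer.
naive = [load_a, use_a, load_c, use_c]
# Reordered: the second load is hoisted above the first consumer.
scheduled = [load_a, load_c, use_a, use_c]

print("stalls, original order:", count_stalls(naive))
print("stalls, reordered:     ", count_stalls(scheduled))
```

In this toy model the original order stalls for 4 cycles and the reordered one for 1. A global scheduler differs from a local one in that it may perform this kind of motion across basic-block boundaries, which is where the correctness questions mentioned above come in.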
There are a number of small changes we are looking at that may have a big impact on code quality (size and/or execution speed). For example, the code to access values in an 8 KB or larger stack frame was inefficient. Reversing the order of the stack (frame) pointer and the offset allows us to improve this code.
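The post does not say why 8 KB is the threshold. One plausible reading, offered here as an assumption rather than a description of the JIT's internals, is that the short add-immediate form on Itanium carries a 14-bit signed immediate, so an offset of 8192 bytes or more no longer fits in it and the address needs a different instruction sequence. The helper below is hypothetical and only checks that arithmetic.

```python
def fits_signed_immediate(value, bits):
    """Return True if value is representable as a signed integer of the given width."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


# Assumption: the short add-immediate form has a 14-bit signed immediate.
SHORT_IMMEDIATE_BITS = 14

for offset in (8191, 8192):
    fits = fits_signed_immediate(offset, SHORT_IMMEDIATE_BITS)
    print(f"offset {offset}: fits in {SHORT_IMMEDIATE_BITS}-bit immediate = {fits}")
```

Under that assumption, the last offset that fits is 8191, which lines up with the 8 KB boundary mentioned above.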
One of the other problems we have identified is that we currently do not optimize larger functions. This choice was made to keep compilation time down. Unfortunately, the side effect is that these large functions end up even larger and slower. We are investigating several alternatives that improve the generated code quality without having as large an impact on compilation time.
Although optimizing loops is not important for every application, it is important for some of them. We are looking into solutions to enable loop unrolling or software pipelining, and into whether this can be done without too large an impact on compile time or code size.
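For readers who have not seen loop unrolling, here is a minimal sketch of the shape of the transformation, written by hand in Python purely for illustration. A compiler would do this on machine code, and the function names and the unroll factor of 4 are assumptions made for the example.

```python
def sum_rolled(values):
    """Straightforward loop: one element and one loop test per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same result, with the loop body replicated four times per iteration."""
    total = 0
    n = len(values)
    main_end = n - (n % 4)  # last index covered by full groups of four

    i = 0
    while i < main_end:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4

    # Remainder loop handles the 0 to 3 leftover elements.
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

The unrolled version runs a quarter as many loop tests and exposes four independent-looking operations to the scheduler at once, but its body is roughly four times as large and it needs a remainder loop. That is the code-size and compile-time cost weighed above.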
We're always looking for feedback from our valued customers, so if you've experimented and/or have specifics of performance issues you've noticed on Itaniums, please do share your results.
This post was authored by Snesha Arumugam, a Program Manager for Framework Performance at Microsoft.