Improving Startup Time: A primer on setting base-addresses for managed DLLs
Startup time is one of the most perceptible performance issues for any application. In this post, we’ll talk about an easy way to improve startup time in many cases. In particular, if you are building an application with managed DLLs and your application is deployed on pre-Vista OSes, this post will show you how to make it start faster by setting base addresses.
What are base-addresses, and why is setting base-addresses important?
Let us assume that you are loading two executable DLLs (or images) sample1.DLL and sample2.DLL which contain many methods and references to methods. The references are generally absolute memory addresses. These absolute memory addresses within the images are only valid when the executable file is loaded at a preferred virtual address in memory. The preferred base-address for images is something that developers can set.
Let us examine what happens if we don’t set the preferred base-address for the images. If you don’t set it explicitly, the framework defaults the preferred base-address to 0x00400000. Thus, both images (sample1.DLL and sample2.DLL) will have the same preferred base-address, 0x00400000. Since both images cannot be loaded at the same virtual address, one of them will have to be rebased; that image will be loaded wherever the OS loader deems fit. Rebasing is expensive because the loader now has to update every absolute address within the image to reflect where the image actually got loaded. Many pages in the image contain such absolute addresses, and each of those pages has to be read and then written back with the new addresses. As if this weren’t bad enough, the pages that were written to can no longer be shared with other processes that have loaded the same DLL. This translates to reading a lot more data from disk when the DLL is loaded (whether it is needed or not).
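To make the cost concrete, here is a toy model of what rebasing does. It is a sketch only: the image is a plain byte buffer, the list of relocation offsets is supplied by hand, and the function name rebase is made up for illustration (a real loader reads the relocation table from the image itself). It patches each 32-bit absolute address by the load delta and reports which 4 KB pages had to be written, since those are the pages that become private to the process.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KB pages, as on x86


def rebase(image, reloc_offsets, preferred_base, actual_base):
    """Patch a toy image for a new load address.

    image          -- bytes of the image as laid out in memory
    reloc_offsets  -- offsets of 32-bit absolute addresses inside the image
    Returns (patched_image, set_of_page_indexes_that_were_written).
    """
    delta = actual_base - preferred_base
    patched = bytearray(image)
    touched_pages = set()
    for offset in reloc_offsets:
        old = int.from_bytes(patched[offset:offset + 4], "little")
        new = (old + delta) & 0xFFFFFFFF
        patched[offset:offset + 4] = new.to_bytes(4, "little")
        # A 4-byte slot can straddle a page boundary, so record both ends.
        touched_pages.add(offset // PAGE_SIZE)
        touched_pages.add((offset + 3) // PAGE_SIZE)
    return bytes(patched), touched_pages


# A three-page toy image with two absolute addresses in it.
image = bytearray(3 * PAGE_SIZE)
image[0x10:0x14] = (0x00401000).to_bytes(4, "little")
image[0x2004:0x2008] = (0x00402000).to_bytes(4, "little")

patched, touched = rebase(bytes(image), [0x10, 0x2004], 0x00400000, 0x10000000)
```

In this example two of the three pages end up modified, even though only eight bytes changed.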
When should you set preferred base-addresses for managed DLLs?
You should set image base-addresses if all of the following conditions are true:
1) You have a substantial client base on pre-Vista OSes (Vista+ OSes do the necessary relocations at page fault time and thus there is no additional cost if you do not set base-addresses).
2) You create DLLs (exes are always the first in the process).
3) You NGen your DLLs (IL DLLs do not have absolute addresses that need to be fixed up) OR you have a mixed managed-native DLL.
4) You are concerned about startup time (if startup time is not a big factor, it may not be beneficial to NGen either; however, NGening can enable cross process sharing and lower total working set).
5) Your DLLs are large (if your DLLs are below 100K in size, there is not much benefit).
Why is setting base-addresses on Vista+ OSes unimportant?
Setting base-addresses is extremely important on Windows XP. On Vista and later versions of Windows, however, ASLR (Address Space Layout Randomization) moves image locations randomly for increased security, and it does so efficiently: the necessary relocations are performed at page fault time rather than at DLL load time. In addition, while the final location under ASLR is random per machine, it is the same for each process on that machine, which means the relocated data can be shared across all processes. There may be some marginal cases where setting base addresses on Vista+ OSes has a benefit, but these can be largely ignored.
Why is setting base-addresses on IL DLLs unimportant?
While native code tends to have many absolute addresses, IL code has effectively none. Thus, even if an IL DLL does not get loaded at its preferred base address, there is no relocation overhead. If you are not using NGen, your DLLs are IL only. If you care about startup time, though, particularly warm startup, NGen is highly recommended. (A good introduction to NGen can be found at: https://msdn.microsoft.com/en-us/magazine/cc163610.aspx)
How does one set base-addresses?
In Visual Studio, open the project properties and click the Advanced button (on the Build tab for C# projects); the base-address option is in the dialog that opens.
What should one pick as base-addresses?
The requirement is that no two DLLs overlap. Any scheme that accomplishes this is sufficient. One simple way of achieving this is picking a spot that system DLLs do not use (0x40000000 is a good spot; this is different from the default base address 0x00400000 I mentioned earlier) and assigning each DLL a range so that they are packed together (typically it is good to allow 20% for DLL growth). Finally, check that you succeeded using the procedure explained in the next question.
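The packing scheme described above can be sketched in a few lines. This is one possible approach under stated assumptions, not a prescribed tool: the function name assign_base_addresses, the sample DLL names and the sample sizes are all made up for illustration. The sketch rounds each reserved range up to 64 KB, because Windows maps images on 64 KB boundaries.

```python
ALLOCATION_GRANULARITY = 0x10000  # Windows maps images on 64 KB boundaries


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def assign_base_addresses(dll_sizes, start=0x40000000, growth_percent=20):
    """Pack DLLs back to back, leaving headroom for each one to grow.

    dll_sizes -- list of (name, size_in_bytes)
    Returns a list of (name, base_address, reserved_bytes).
    """
    layout = []
    base = start
    for name, size in dll_sizes:
        with_growth = size + size * growth_percent // 100
        reserved = round_up(with_growth, ALLOCATION_GRANULARITY)
        layout.append((name, base, reserved))
        base += reserved
    return layout


# Hypothetical sizes; use the sizes of your own NGen images here.
layout = assign_base_addresses([("sample1.dll", 1_000_000),
                                ("sample2.dll", 300_000)])
for name, base, reserved in layout:
    print(f"{name}: base 0x{base:08X}, reserved 0x{reserved:X} bytes")
```

Feed the function the size of whatever image actually gets loaded, and re-run it when a DLL outgrows its reserved range.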
For a mixed-mode assembly (an assembly containing both native and managed code), both the IL and the NGen image may have to be loaded into memory. For such assemblies, NGen selects the base address of the NGen image to be equal to the base address of the MSIL assembly plus the size of the MSIL assembly plus some padding to round up to the nearest page boundary. NGen images can be as large as three times the size of an IL image, so ensure that the base-addresses are chosen appropriately.
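As a rough planning aid, the arithmetic above can be written down as follows. This is an estimate under assumptions, not NGen's actual algorithm: the page size is taken to be 4 KB, the exact padding NGen applies is not specified here, and the function name mixed_mode_footprint is made up for illustration. The factor of three is the upper bound quoted above.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KB pages


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def mixed_mode_footprint(il_base, il_size, ngen_factor=3):
    """Estimate where the NGen image of a mixed-mode assembly lands.

    Returns (estimated_ngen_base, worst_case_end_address), where the
    worst case assumes the NGen image is ngen_factor times the IL size.
    """
    ngen_base = round_up(il_base + il_size, PAGE_SIZE)
    worst_case_end = ngen_base + ngen_factor * il_size
    return ngen_base, worst_case_end


# Hypothetical mixed-mode DLL: 0x25800 bytes of IL based at 0x40000000.
ngen_base, worst_case_end = mixed_mode_footprint(0x40000000, 0x25800)
```

The next DLL's base address should sit above the worst-case end, not just above the end of the IL image.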
Why (and how) should I check for relocations if I set base-addresses?
Even if you diligently set base-addresses, performance will still suffer if any base-address conflicts remain. This can happen for a number of reasons, including DLLs or COM components you didn’t think about that get loaded into your process. Checking for relocations helps you identify collisions and then eliminate them. If your application is short on virtual address space, some base-address collisions cannot be avoided. In that case you will have to decide which DLLs are allowed to collide; choose the smallest DLLs to keep the performance impact low.
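Before looking at a live process, you can sanity-check a planned layout on paper. The sketch below assumes you already have a list of (name, preferred base, size) entries, gathered by whatever means you prefer; the function name find_collisions and the sample entries are made up for illustration. It reports every pair of images whose address ranges overlap.

```python
def find_collisions(images):
    """Return pairs of image names whose [base, base + size) ranges overlap.

    images -- list of (name, preferred_base, size_in_bytes)
    """
    ordered = sorted(images, key=lambda image: image[1])
    collisions = []
    for index, (name_a, base_a, size_a) in enumerate(ordered):
        for name_b, base_b, _ in ordered[index + 1:]:
            if base_b >= base_a + size_a:
                break  # sorted by base, so nothing later can overlap either
            collisions.append((name_a, name_b))
    return collisions


# Two DLLs left at the default base collide; the third one is clear.
collisions = find_collisions([
    ("sample1.dll", 0x00400000, 0x20000),
    ("sample2.dll", 0x00400000, 0x10000),
    ("sample3.dll", 0x40000000, 0x10000),
])
```

This only catches conflicts among the images you list, so DLLs that get pulled into the process without your knowledge still need the runtime check described next.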
You can check relocations using the Sysinternals tool Process Explorer. After starting Process Explorer, click on Options –> Configure Highlighting, and check ‘Relocated DLLs’ at the bottom (it is off by default). Select the DLL view in the lower pane (View –> Lower Pane –> DLL View). When you select a process name, the DLLs loaded in that process are listed in the lower pane, and the ones that have been rebased show up highlighted.
You can download Process Explorer from: https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
Alternatively, you can use the command line tool ListDLLs to check for base-address collisions; ListDLLs can be obtained from: https://technet.microsoft.com/en-us/sysinternals/bb896656.aspx
If you want to check the preferred base address of a binary without launching the process, you can use the "dumpbin /headers" command shipped with the SDK. The "image base" data is the preferred base address.
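If you would rather script this check, the same "image base" value can be read straight out of the PE header. The sketch below is a minimal reader, assuming a well-formed PE file and doing only basic validation; the function name preferred_base_address is made up for illustration. It follows the PE layout: the offset of the PE signature is stored at 0x3C, the optional header starts 24 bytes after that signature, and the image base is a 4-byte field at offset 28 for PE32 or an 8-byte field at offset 24 for PE32+.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit optional header
PE32_PLUS_MAGIC = 0x20B  # 64-bit optional header


def preferred_base_address(pe_bytes):
    """Return the ImageBase field from the raw bytes of a PE file."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # 4-byte signature + 20-byte COFF header precede the optional header.
    optional_header = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", pe_bytes, optional_header)
    if magic == PE32_MAGIC:
        (image_base,) = struct.unpack_from("<I", pe_bytes, optional_header + 28)
    elif magic == PE32_PLUS_MAGIC:
        (image_base,) = struct.unpack_from("<Q", pe_bytes, optional_header + 24)
    else:
        raise ValueError(f"unknown optional header magic 0x{magic:X}")
    return image_base
```

To use it on a real file, read the file in binary mode and pass the bytes in, for example preferred_base_address(open(path, "rb").read()), where path points at the DLL you want to inspect.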
Overall, you can see that this is generally a simple process: determine whether setting base-addresses matters to you and, if it does, set the addresses, check for collisions, and fix any that you find. Despite the simplicity, setting base addresses can have a significant impact on your application if the conditions mentioned in this article apply.