October 2009

Volume 24 Number 10

CLR Inside Out - Profiling the .NET Garbage-Collected Heap

By S. Ramaswamy, V. Morrison | October 2009

In the MSDN Magazine June 2009 article "Memory Usage Auditing for .NET Applications", we discussed monitoring memory consumption using tools like Task Manager, PerfMon and VADump. These tools help in monitoring the overall memory consumption of your application. Typically, if a .NET application is consuming a large amount of memory, it is either because the application loads a large number of DLLs or the application is creating a large number of long-lived objects on the .NET garbage-collected (GC) heap. If an application loads many DLLs, the only recourse (other than to avoid unnecessary dependencies) is to run less code. However, if an application suffers from a large GC heap footprint, one could identify that the GC heap was contributing significantly to memory consumption, as discussed in the June article. In this article, we complete the picture for GC-heap-related memory issues by providing step-by-step instructions on using the CLR Profiler for .NET GC heap memory investigations.

Recap: Audit All Applications

The .NET runtime abstracts certain operations (such as allocating memory or using a large class library), and it is not always simple to foresee the impact of these operations on application performance. This is especially true for memory. The important takeaway point in the last article was that every application deserves a periodic memory audit. This article goes through this step-by-step using a real program as our example (as opposed to a demo). This was a simple program, and we could have chosen not to audit the application’s memory usage. However, when we completed a memory audit, the small effort of performing the audit paid off handsomely, as we found significant memory inefficiencies.

The program we use in this article is called XMLView, which is a simple viewer for XML files that is highly scalable. XMLView can easily handle XML files that are gigabytes in size (try that in IE!) because this application was designed to be memory efficient from the outset. Despite the careful design, in some scenarios, tools like Task Manager showed that the application was using significantly more memory than we would have expected. This article walks you through the investigation that uncovered a performance bug.

Big-Picture Memory Usage

The first step in a memory audit is to launch the application and look at the OS task manager to check the application’s overall memory usage. As we described previously, the metric that is most important is the "Memory - Private Working Set" column. This refers to the application memory that cannot be shared with other processes running on the machine (see previous article for details).  

Once Task Manager is open, you can monitor the private memory usage of your application in real time, run through various user scenarios and observe any impact to memory consumption. There are two anomalies that warrant your attention. One is memory consumption that is disproportionate to the scenario or task being performed. For example, opening a menu entry or performing simple operations should not cause large spikes in memory consumption. Memory leaks are the second type of anomaly that necessitates attention. For some operations, such as opening a new file, the expectation is that new data structures will be created each time the operations is performed. However, many operations (for example, searches) do not logically allocate new long-lived memory. It is not unusual for these stateless operations to allocate memory the first time they are performed (as lazy data structures are populated). But if memory usage continues to increase with subsequent iterations of the scenario, memory is leaking and needs to be investigated. This technique of using Task Manager is by design very coarse (only large memory problems will be evident), but for many applications, only the large ones are worth pursuing.   

For XMLView, the scenario is launching the viewer on a large XML file and performing various navigation operations, such as opening tree nodes or searching. This experiment initially turned up no leaks (the memory usage stabilized after a while; see Figure 1). Memory usage spiked rather substantially (8MB) when the search functionality was used (see Figure 2) from 16812KB private working set to 25648KB. This represents one of the anomalies we were looking for: the resource consumption did not fit our mental model of the program’s memory behavior, because search has no auxiliary data structures and thus should not consume extra memory. We had a mystery to solve. 


Figure 1 XMLView Memory Usage in Task Manager Before Search Operation


Figure 2 XMLView Memory Usage in Task Manager After Search Operation

CLR Profiler

A number of third-party tools are available for doing analysis of the .NET GC heap; however, we will concentrate on CLR Profiler, which has the advantage of being available as a free Web download from Microsoft. A Web search on "CLR Profiler" will locate the download site, and the installation consists simply of unzipping the files (there is no formal installation step). Included as part of the package, the file CLR Profiler.doc provides the documentation. In this article, we will focus on the parts that are used commonly. 

How Does CLR Profiler Work?

In order to use a profiler properly, you need to understand how the data was collected and the limitations of the tool. The CLR Profiler uses a special interface to the runtime that allows it to get callbacks when particular events happen within the runtime (msdn.microsoft.com/library/ms404511.aspx). Currently, this interface can be used only if the process being profiled is started with certain environment variables set that inform the runtime regarding the contact mechanism to the profiler. Thus, there are some limitations, such as the inability to attach to an existing process. CLR Profiler must start the process with the right environment variables set. For services that are typically started by the OS directly, like ASP.NET, special procedures have to be followed (see CLR Profiler.doc from the download for details). As an aside, for certain scenarios we have added API support for attach in the .NET Framework 4 (but only the API support; support for attaching the CLR Profiler tool will follow soon after).

When CLR Profiler starts up the application, it provides options on the type of information it collects.  By default, it gets callbacks every time a managed method is called or returns from a call, as well as on every GC heap allocation. These callbacks are called often, resulting in the program slowing down substantially and creating large log files more than 100 MB) for many real-world scenarios. CLR Profiler does not measure execution time and limits itself to keeping track of only GC heap allocations and the number of times each different method was called. CLR Profiler uses the method-call and method-return events to keep track of what methods are currently active on the stack, so that each call and heap allocation can be attributed with a complete call stack of where the event happened. This identifies where the objects are being created.

Unfortunately, the detailed information that CLR Profiler collects by default can make the data collection process very slow (minutes). However, this detailed information is rarely needed. One of the most useful techniques does not require any additional information to be collected during normal execution. While it is important to know where memory allocation occurs, many bugs can be fixed only by understanding who is keeping memory alive. To answer this latter question, one does not need call stacks or allocation events; rather, only a simple snapshot of the GC heap is required. Thus, for most investigations, one can turn off all normal event logging and obtain a good experience (reasonable profiling performance as well as identifying the issue). This is the technique used here.

Using CLR Profiler

If you download CLR Profiler from the Web and extract it to the default location, the program will end up in C:\CLR Profiler\Binaries\x86\ClrProfiler.exe (there is an X64 version, too). Launching CLR Profiler brings up the form shown in Figure 3.


Figure 3 CLR Profiler Start-up Screen

This start-up dialog controls the collection of profile data. The check boxes indicate that when the application is executed, profiling will be active and both allocation and call data will be collected (the log file will be verbose). Since we are interested in only heap snapshots, only the Profiling active box has to remain selected (the Allocations and Calls boxes should be unchecked). Thus, the CLR Profiler will log only the bare minimum during program execution, keeping responsiveness up and log-file sizes down.

If the application you wish to profile does not need any command-line arguments or special start-up directory, you can simply select Start Application and CLR Profiler will bring up a file chooser dialog box to select the application to run. If you need to set command-line parameters or the current directory for the application, you should set them using the File -> Set Parameters menu before clicking on Start Application.

Taking a Heap Snapshot

After selecting the application executable, CLR Profiler will immediately start running the application.  Because we have turned off allocation and call profiling, it should run very close to its normal speed. CLR Profiler is now attached to the application, and at any point you can select Show Heap Now and get a heap snapshot. The heap snapshot for the XMLView application happens to look like what is shown in Figure 4.

This is a representation of the entire GC heap (only live objects are shown). While the GC heap is actually an arbitrary graph, CLR Profiler determines links and forms a tree out of it, rooted in a pseudo-node called root. The size of each of the children can then be attributed to their parents so that at the root, all live objects have been accounted for. In the example above, the GC heap had 5.4MB of objects. This 5.4MB does not include free space in the GC heap that has not been allocated, and therefore, the amount of virtual memory allocated for the heap is guaranteed to be larger (the exact size depends, but  as mentioned in the previous article, 1.6 times larger is a good estimate).   

Often, there is more than a single path from the root to a particular object. CLR Profiler always chooses the shortest path for display in the tree. However, if there are multiple paths with the same length, the choice is made arbitrarily. Additionally, by default, CLR Profiler does not display individual objects but treats all objects that have the same parent types and the same child types as part of the same node. This technique generally does a good job of collapsing linked lists nodes and other recursive data structures, thereby simplifying the view. If this aggregation is not desired, you can turn it off by right-clicking and selecting Show Individual Instances. 

At the top of the Heap Graph window are two radio button groups. The first is the Scale, which indicates the height of each box in the graph view. The second is the Detail button, which determines how large a node has to be to get displayed. It is useful to set this to something very coarse, to keep the graph uncluttered. Often, you are interested in only the graph that has something to do with a particular type (for example, only about objects of type XmlPositions). CLR Profiler allows you to filter on this and other criteria (Right click -> Filter).

In the XMLView example in Figure 4, the bulk of the GC heap (5.1MB) is a child of an instance of XmlView type. The XmlView instance has a child of type XmlPositions, which in turn has a 5MB XmlElementPositionInfo array. XmlElementPositionInfo is the large array that keeps track of the starting point in the file of every XML tag, and its size is proportional to the size of the XML file being read (because this is a very large file, it is expected that this array will be large). Thus, we concluded the heap size is reasonable.


Figure 4 CLR Profiler Snapshot Before Search Operation

Tracking Down GC Heap Growth

In the XMLView example, our problem was not with the initial size of the GC heap, but the fact that it grew dramatically when a search was done. Thus, what we are interested in is the change in the heap between two snapshots. CLR Profiler has special features just for doing this kind of investigation. 

After taking one heap snapshot, we can then perform the search operation (causing the additional memory to be allocated), and then take another heap snapshot. The result is the following display:

The memory associated with the XmlView object has now expanded to 8.5MB (from 5.1MB). The nodes have two colors, with the lighter red representing the memory that was present in the previous snapshot. The darker red is the memory that is unique to the new snapshot. Thus, the new memory was added by a type called XmlFind (this is not surprising), as can be seen in Figure 5. Scrolling over to the right, we find that all of this extra memory was associated with an Int64 array that belonged to the type LineStream, as shown in Figure 6.

This was the information needed to identify the bug!  It turns out that the .NET XML parser has the capability to get the line number column number of an XML tag, but not its file position. To work around this, the program needed a mapping from line number to file position. The Int64 array was used for this. However, the code needed this mapping only for the current XML element tag, which might span any number of lines, but usually just a few. This mapping was not being reset by the find code, allowing the mapping array to grow unnecessarily large. The fix was trivial, but this bug would probably have existed forever if the memory audit was not performed. Using the CLR Profiler, it took only a few minutes to find the information that was critical to fixing this bug.


Figure 5 CLR Profiler Snapshot After Search Operation - Part 1


Figure 6 CLR Profiler Snapshot After Search Operation - Part 2

Displaying the New Objects Alone

In the previous example, it was relatively easy to visually distinguish by their color the objects unique to the last snapshot. However, for large object graphs, this is difficult to do and it would be convenient to remove the old objects from the graph altogether, which can be done using CLR Profiler.  This feature uses the allocation events for implementation, so after taking the first snapshot, you need to check the Allocations box (you can leave the Calls box unchecked). Following this, interact with the application until it allocates more memory, and select Show Heap Now again. It will show the heap with its two colors of red as before. Now, you can right-click on the window and select Show New Objects, and a new window will appear that shows you only the part of the GC heap that was allocated between the two snapshots. 

Tracking Down All Roots for an Object

You might wonder how memory leaks can happen in managed code, given the garbage collector eventually cleans up all unused memory. However, there are situations where items can stay alive long after you might think they would be freed. For example, unintended references may keep managed objects alive.

The technique for tracking down memory leaks is very similar to that for understanding disproportionate memory consumption. The only real difference is that for disproportionate memory consumption, we do expect the object to be in memory (it just should not be as big), whereas for leaks, we do not expect the object to be in the GC heap at all. Using heap snapshots and the Show New Objects feature, it is easy to identify which objects should not be alive (using your understanding of the scenario). If an object is surviving in the heap, there must be some reference to the object. However, if there is more than a single reference, you may find removing one link does not fix the issue because another reference is keeping it alive. Thus, it would be convenient to find all paths from the root that keep a particular object alive in the GC heap. CLR Profiler has special support for doing this. First, the node of interest is selected. Then right-click and select Show References, and a new graph will appear that shows all paths from the root to the object in question. This will provide you with enough information to identify which references are keeping the objects alive in the heap. Eliminating the links to the object can fix your memory leak.

Getting Allocation Call Stacks

Viewing the GC heap is frequently sufficient to diagnose most high memory-consumption problems. However, on occasion, it is useful to understand the call stack where an allocation occurred. To obtain that information, you have to select the Allocations check box before executing the program. This causes CLR Profiler to log the stack trace whenever an allocation happens. Collecting this information will make the log file bigger, but it will still typically be less than 50MB for moderately complex applications. You should not turn on the Calls check box because this causes logging on every method entry, which is verbose (it is easy to get log files larger than 100MB). This method call information is needed only if you are using CLR Profiler to determine which methods are being called often, and this is not typically useful in a memory investigation.

From the heap graph, select a node of interest, right-click and select Show Who Allocated. A new window showing all calls stacks that allocated an object in that node will be displayed. You can also access the stack traces from the View -> Allocation Graph menu item. This shows all allocations made during the program attributed to the call stack that allocated them. For memory investigations, this view is not very useful because many of these allocations are for short-lived objects that quickly get reclaimed by the GC and, therefore, do not contribute to overall memory consumption. However, since these allocations do contribute to CPU usage (to allocate and then reclaim), this view is useful when your application is CPU-bound and the CPU profiler shows large amounts of time being spent in the GC. The obvious way to trim down time in the GC is to do fewer allocations, and this view shows you where your allocations are by call stack.

Part of the Development Lifecycle

In this article, we walked through a memory performance investigation for the common case where the memory consumption is dominated by the .NET GC heap. Using the free and downloadable CLR Profiler, we were easily able to get snapshots of the GC heap at various points in the application and compare the snapshots with one another. With CLR Profiler, it is easy to perform audits (it takes only a few minutes) and should be part of the development lifecycle of every application.        


Post your questions and comments on the CLR Team Blog at http://blogs.msdn.com/clrteam/archive/tags/CLR+Inside+Out/default.aspx.

 

Subramanian Ramaswamy* is the program manager for CLR Performance at Microsoft. He holds a Ph.D. in electrical and computer engineering from the Georgia Institute of Technology.*

Vance Morrison is the partner architect and group manager for CLR Performance at Microsoft. He drove the design of the .NET Intermediate Language and has been involved with .NET since its inception.