# Reducing Memory Usage to Increase Concurrency

There are multiple ways for a computer to solve permutation problems. In an exercise to show the performance of different solutions, I created a couple of different ways to solve the problem of knowing if a certain sequence could be transformed to another sequence given a set of legal permutations. All of my little puzzle solving applications take a sequence of numbers 0 to 8, and see if following valid permutation rules, the given sequence can be ordered. The amount of possible permutations for a sequence of nine numbers is 9! (362880). Depending upon what the valid permutations are, it can be impossible to transform one sequence to another sequence. The way my applications would solve this is by calculating all possible legal permutations of the given sequence. What makes this problem eligible for a concurrent solution is that a sequence may be able to be transformed into multiple other sequences at any given step, and all of those sequences can be analyzed independent of each other (i.e. concurrently).

For some of the “solutions”, the bottlenecks were practically intentional. I would run the application under the Concurrency Visualizer, look at places where a thread was blocked, and sure enough, it was blocked on a lock that is in my application’s code. That was the purpose of the exercise though: to practice analyzing performance sessions to find the application’s bottle-necks. The Concurrency Visualizer did help me find a contention which I wasn’t aware of.

One of my solutions was practically lock free. The application did very little work inside of the critical sections and should have efficiently utilized the CPU. For the most part it did. But the CPU utilization view of the Concurrency Visualizer showed that the application was spiking instead of exhibiting a near constant usage, which I would expect of a mostly lock free application. Looking at the Visualizer’s Threads view there was a recognizable pattern of having all of the threads, but one, stuck in contention and all waiting on the single running thread. I wasn’t too surprised, but I checked the Current stack tab of the blocked threads to double check that the critical section was one in the applications code. It never was.

All of the blocked critical sections looked something like

clr.dll!CLREvent::WaitEx

clr.dll!WKS::gc_heap::wait_for_gc_done

clr.dll!WKS::gc_heap::try_allocate_more_space

clr.dll!WKS::GCHeap::Alloc

clr.dll!AllocateArrayEx

clr.dll!JIT_NewArr1

mscorlib.dll!System.Collections.Generic.List`1[System.__Canon]..ctor

numberpuzzle.dll!NumberPuzzle.NumberPuzzleBase.EvaluateState

Okay, so Garbage Collector has a lock. There’s no surprise there, the GC has to manage memory. The frequency in which the GC performs operations is based off of the number of objects allocated, so perhaps there might be an object being allocated that I’m not aware of which is causing an increase of GC work. To find out, I ran the Visual Studio performance wizard, but this time instead of selecting the option to “Visualize the behavior of a multithreaded application”, I chose the option to “Track managed memory allocation”. After a few runs I did discover something. The second most allocated object was the System.Thread.WaitCallback object. This was a discovery because the application code never created a WaitCallback object. But the profiler told me that on average, the application allocated 88373454 bytes with %12.8 of the bytes being WaitCallback objects. Thus, I couldn’t account for on eighth of the allocated objects.

All of those allocations were coming from one line: