(VS2010 Beta1) Native Parallel Programming: ConcRT example -- Debugging TwentyFour
In a previous posting, we presented a simple C++ parallel program, TwentyFour, which is based on the new native parallel programming API (ConcRT) introduced in Visual Studio 2010. In this posting, we're going to discuss how to understand the execution of native parallel programs using Visual Studio 2010.
With ConcRT programming, you divide your code logic into little units of work called chores and pass them to the ConcRT runtime through scheduling calls. Once the magic ConcRT runtime decides it's time to run a certain chore on a certain thread, it calls the callback function you supplied through the Concurrency::Chore::m_pFunction pointer. So a good place to start understanding how ConcRT works is to set a breakpoint inside the callback function, right after the input parameter is cast into a derived class object:
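As a rough illustration of that callback pattern, here is a minimal portable sketch. It does not use the real ConcRT headers: the sketch_chore namespace, the Chore and Cards stand-in types, the m_ran flag, and the RunChore helper are all hypothetical names made up for this example. Only the idea of a base object carrying an m_pFunction pointer, and a payload that casts its argument back to the derived class, comes from the discussion above.

```cpp
#include <cassert>
#include <string>

namespace sketch_chore {

// Hypothetical stand-in for a runtime chore: nothing but a callback pointer.
struct Chore {
    typedef void (*Function)(Chore*);
    Function m_pFunction;
    explicit Chore(Function function) : m_pFunction(function) {}
};

// Hypothetical derived chore that carries the state of one search branch.
struct Cards : Chore {
    int m_count;         // numbers still to be combined
    std::string m_expr;  // operations performed so far
    bool m_ran;          // set by Payload, only so the sketch is observable

    Cards(int count, const std::string& expr)
        : Chore(&Cards::Payload), m_count(count), m_expr(expr), m_ran(false) {}

    // The callback receives the base pointer and casts it to the derived type.
    static void Payload(Chore* chore) {
        Cards* self = static_cast<Cards*>(chore);
        // A breakpoint on the next line sees the fully typed object.
        self->m_ran = true;
    }
};

// What a runtime does once it decides to run a chore, reduced to its essence.
inline void RunChore(Chore* chore) {
    assert(chore != nullptr && chore->m_pFunction != nullptr);
    chore->m_pFunction(chore);
}

}  // namespace sketch_chore
```

Constructing a Cards object and passing its address to RunChore ends up in Cards::Payload, which is the spot where the breakpoint would go.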
Once the breakpoint hits, you can use Visual Studio's normal debugging features to get a picture of what the ConcRT runtime has been doing behind our backs.
First of all, the improved thread window shows that four threads have been created (on a 4-core machine):
The four threads besides the main thread are all created by the ConcRT runtime and are running code in the Concurrency::details namespace. They're in various stages of scheduling, synchronizing, and running chores.
If you open the stack window, you will find who is really calling our CCards::Payload method:
CCards::Payload is called by UnrealizedChore::_StructuredChoreWrapper, and the chain of callers finally leads back to the Concurrency::details::ThreadProxy::ThreadProxyMain routine in msvcr100d.dll.
If you open the variable window and expand the relevant objects, you will get a clearer picture of how chores are represented in the ConcRT runtime:
The current CCards object we're looking at represents one branch at the first level of breaking down the problem (m_count=3). The m_expr string remembers the first operation performed (5+6=11). The CCards class derives from the Concurrency::UnrealizedChore class, which in turn derives from the topmost Concurrency::Chore class.
We've set Chore::m_pFunction to point to our callback function CCards::Payload, but ConcRT puts another wrapper function, _StructuredChoreWrapper, around it to manage all structured chores. The m_fRuntimeOwnsLifetime flag is set to true, which means the runtime will delete this object once it's done with it. The _M_pTaskCollection pointer apparently points back to the task collection this chore has been scheduled on.
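To make the roles of the wrapper, the lifetime flag, and the back pointer more concrete, here is a small portable sketch. It is our own guess at the general shape and is not taken from the ConcRT sources: the sketch_wrapper namespace, the m_userFunction member, the unfinished counter, and the Schedule, PayloadRuns, and CountingPayload helpers are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch_wrapper {

// Hypothetical task collection: only counts chores that have not finished.
struct TaskCollection {
    std::size_t unfinished;
    TaskCollection() : unfinished(0) {}
};

// Hypothetical chore whose scheduler-visible callback is a wrapper.
struct Chore {
    typedef void (*Function)(Chore*);
    Function m_pFunction;               // what the scheduler calls (the wrapper)
    Function m_userFunction;            // assumption: the user's payload
    bool m_fRuntimeOwnsLifetime;        // true: the wrapper deletes the chore
    TaskCollection* m_pTaskCollection;  // back pointer, set by Schedule

    Chore(Function userFunction, bool runtimeOwnsLifetime)
        : m_pFunction(&Chore::Wrapper),
          m_userFunction(userFunction),
          m_fRuntimeOwnsLifetime(runtimeOwnsLifetime),
          m_pTaskCollection(nullptr) {}

    // Runs the payload, then does the bookkeeping the payload never sees.
    static void Wrapper(Chore* chore) {
        assert(chore != nullptr && chore->m_userFunction != nullptr);
        TaskCollection* owner = chore->m_pTaskCollection;  // read before any delete
        chore->m_userFunction(chore);
        if (chore->m_fRuntimeOwnsLifetime) delete chore;   // chore must be heap-allocated
        if (owner != nullptr) --owner->unfinished;
    }
};

// Records the back pointer and counts the chore as outstanding.
inline void Schedule(TaskCollection& collection, Chore* chore) {
    chore->m_pTaskCollection = &collection;
    ++collection.unfinished;
}

// Demo payload that only counts how often it has been run.
inline int& PayloadRuns() { static int runs = 0; return runs; }
inline void CountingPayload(Chore*) { ++PayloadRuns(); }

}  // namespace sketch_wrapper
```

In this sketch a chore created with new and runtimeOwnsLifetime set to true must not be touched after its m_pFunction has been called, which is the same rule the m_fRuntimeOwnsLifetime flag implies for real chores.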
With the standard debugging features of Visual Studio, we can definitely debug ConcRT-based parallel programs. We can look at different tasks running on multiple threads, or even at multiple tasks running on the same thread one by one, provided we can keep everything in our limited minds, whose short-term memory is found to be capable of holding only 5 to 7 items at any one moment. But it will certainly be hard to find out anything about scheduled tasks, for they're completely hidden in the internal data structures of ConcRT. With traditional debugging features, we literally see only individual trees, never the whole forest, when debugging parallel programs.
This calls for new debugging features specially designed to support parallel programming: features whose implementation has unique insight into the workings of the parallel runtime, and whose UI presentation helps us understand the parallel world as a whole yet lets us dig deeper if we choose to do so.
The first such support is the 'Parallel Tasks' window introduced in Visual Studio 2010. Here is what the parallel tasks window will show when our breakpoint is hit the first time:
The Parallel Tasks window shows, or at least attempts to show, all tasks currently managed by the parallel runtime (ConcRT in the native case). For each task, it shows its task identifier, status, location (topmost method), a name or description of the task, thread assignment, and task group information. The Parallel Tasks window also shows task flagging status and current/breaking task icons.
When the breakpoint is first hit, there are only two tasks, one running and another scheduled to be run. If you let the program run and check the tasks window again after a few more breakpoints, there are a lot more tasks:
The picture above shows 2 running tasks, 3 waiting tasks, and 46 scheduled tasks. The running tasks are towards the leaf nodes of our search tree (m_count=1), while the waiting tasks are towards the top of the search tree (m_count=2). The waiting tasks are actually waiting in StructuredTaskCollection::Wait on the tasks scheduled on the task collection created within the task code. In this case, the task collection is created on the stack within the Solve method, which is the main body of the task.
For ConcRT native parallel programming, task identifiers are actually the addresses of their chore objects. The numbers shown in the Task Groups column are actually the addresses of task collections. Both kinds of objects have virtual function tables, so it's possible to add them to the watch window and let Visual Studio decode their real structures. The expression to use is "(IUnknown *)<addr>". Here is an example:
We've seen the decoding of UnrealizedChore before, so only StructuredTaskCollection is new here. The decoded data shows that this task collection still has 12 chores that have either not started running or not finished running.
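The reason a bare address is enough is that an object with a virtual function table carries its real type with it. The following portable sketch shows the same idea in plain C++ using typeid. It is only an analogy for what the watch window does, not a description of how the debugger is implemented, and the sketch_decode namespace with its Chore, Cards, AddressOf, FromAddress, and IsCards names is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <typeinfo>

namespace sketch_decode {

// Hypothetical polymorphic base: the virtual destructor gives it a vtable.
struct Chore {
    virtual ~Chore() {}
};

// Hypothetical derived type with extra state worth decoding.
struct Cards : Chore {
    int m_count;
    explicit Cards(int count) : m_count(count) {}
};

// The kind of raw number a tasks window might show as an identifier.
inline std::uintptr_t AddressOf(const Chore& chore) {
    return reinterpret_cast<std::uintptr_t>(&chore);
}

// Turns the number back into a base pointer. Only valid for a value that
// came from AddressOf and whose object is still alive.
inline const Chore* FromAddress(std::uintptr_t address) {
    return reinterpret_cast<const Chore*>(address);
}

// The dynamic type is recovered through the vtable, not from the number itself.
inline bool IsCards(std::uintptr_t address) {
    const Chore* chore = FromAddress(address);
    return typeid(*chore) == typeid(Cards);
}

}  // namespace sketch_decode
```

Given only the number returned by AddressOf, IsCards can still tell a Cards object from a plain Chore, and a dynamic_cast on the pointer returned by FromAddress gives access to m_count.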
If you sort the Tasks window by the thread assignment column, you will find that multiple tasks can be assigned to the same thread. This happens when a task finishes its own work and waits for its child tasks in StructuredTaskCollection::Wait or other similar methods. The parallel runtime then picks another scheduled chore to execute on that thread, and this can be nested multiple times. Here is a sample stack:
This one shows three tasks on the same thread. The topmost task is really executing its own code in the Payload method, while the other two are waiting in the StructuredTaskCollection::Wait method.
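The nesting is easier to see in a toy model. The sketch below is a deliberately simplified, single-threaded scheduler in which Wait runs pending chores inline instead of blocking, so chores pile up on one call stack. It is not how ConcRT is implemented: the sketch_wait namespace, the Scheduler class, and this Solve function are hypothetical, and this Wait drains every pending chore rather than only the children of one task collection.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

namespace sketch_wait {

// Hypothetical single-threaded scheduler holding chores that are not yet run.
class Scheduler {
public:
    typedef std::function<void(Scheduler&)> ChoreFn;

    Scheduler() : depth_(0), maxDepth_(0) {}

    void Schedule(const ChoreFn& chore) { pending_.push_back(chore); }

    // Instead of blocking, the waiting task runs pending chores on its own stack.
    void Wait() {
        while (!pending_.empty()) {
            ChoreFn chore = pending_.back();  // copy first: running it may grow pending_
            pending_.pop_back();
            ++depth_;
            if (depth_ > maxDepth_) maxDepth_ = depth_;
            chore(*this);                     // may call Schedule and Wait again
            --depth_;
        }
    }

    // Largest number of chores that were on the call stack at the same time.
    std::size_t MaxDepth() const { return maxDepth_; }

private:
    std::vector<ChoreFn> pending_;
    std::size_t depth_;
    std::size_t maxDepth_;
};

// Each task spawns one child and waits for it, until levels reaches zero.
inline void Solve(Scheduler& scheduler, int levels) {
    if (levels == 0) return;
    scheduler.Schedule([levels](Scheduler& inner) { Solve(inner, levels - 1); });
    scheduler.Wait();
}

}  // namespace sketch_wait
```

Calling Solve with three levels leaves three chores stacked on the one thread at the deepest point: the innermost one runs its own code while the two below it sit in Wait, which is the same shape as the sample stack described above.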
The second new parallel debugging window introduced in Visual Studio 2010 is the parallel stacks window, a window which helps you to understand all threads/tasks running in the parallel runtime in a single view.
The parallel stacks window has two panels which can be toggled from its toolbar. The thread panel tries to show all threads in the process in a meaningful way. It achieves this by merging common sections of stack frames into frame nodes with one or multiple frames. When thread stacks differ along the way, child frame nodes are created and linked back to parent frame nodes, thus forming a forest (multiple trees in general). Here is an example:
The thread panel shows that there are 7 threads in the process: one main thread, two threads apparently running our tasks, one ConcRT resource manager thread, one ConcRT scheduling thread, and two other threads in unknown places. The last two threads are actually normal ConcRT worker threads. They're just stopped in places where the debugger can't decode their stacks properly.
The highlighted edges and lines show the current line of execution, from which you should be able to identify a few running and waiting tasks (look for _StructuredChoreWrapper here). In fact, the tasks panel of the Parallel Stacks window shows exactly that, the stack frames of running and waiting tasks:
The diagram shows 5 tasks: one in CCards::Payload, one in CCards::Oper, and three in StructuredTaskCollection::Wait. You can hover over the frame nodes to see their identifiers and stack frames. You can click on them to expand them into individual threads and frames.
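The merging of common stack frames into a forest, as described for the thread panel, can be sketched as a prefix tree over call stacks. This is only our illustration of the idea, not the debugger's actual algorithm: the sketch_stacks namespace and the FrameNode, AddStack, and Find names are hypothetical, and each node here holds a single frame, whereas the real window groups runs of shared frames into one box.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

namespace sketch_stacks {

// One frame in the merged view. The root object is synthetic: its children
// are the roots of the individual trees in the forest.
struct FrameNode {
    std::size_t threads;  // number of stacks that pass through this frame
    std::map<std::string, std::unique_ptr<FrameNode> > children;
    FrameNode() : threads(0) {}
};

// Adds one call stack, listed outermost frame first. Shared prefixes reuse
// existing nodes; a new child appears where the stacks start to differ.
inline void AddStack(FrameNode& root, const std::vector<std::string>& stack) {
    FrameNode* node = &root;
    ++node->threads;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        std::unique_ptr<FrameNode>& child = node->children[stack[i]];
        if (!child) child.reset(new FrameNode());
        node = child.get();
        ++node->threads;
    }
}

// Follows a path of frame names from the root; returns nullptr if it is absent.
inline const FrameNode* Find(const FrameNode& root,
                             const std::vector<std::string>& path) {
    const FrameNode* node = &root;
    for (std::size_t i = 0; i < path.size(); ++i) {
        auto it = node->children.find(path[i]);
        if (it == node->children.end()) return nullptr;
        node = it->second.get();
    }
    return node;
}

}  // namespace sketch_stacks
```

Feeding it one stack per thread gives a structure where a node's threads count says how many threads share that frame, and a node with more than one child marks the point where their stacks diverge.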
There are lots of ways of interacting with both the tasks window and the stack window to help you navigate the complicated web of parallel tasks. For more details, please refer to our PM Daniel Moth's posting on his blog and videos on Channel 9.