Introduction to Remote Kernel Tracker
When you connect to a device with Remote Kernel Tracker, or when you open a log file that has been collected by one of the other CeLog tools, you’ll see a lot of data you may not understand. At MEDC this past week, a few people asked for a better explanation of what they’re seeing. So let’s go over it. Screen Shot 1 shows a sample of what you might typically see when you connect Remote Kernel Tracker to a device:
(scroll down if you can't see anything)
Screen Shot 1: Typical idle behavior, no processes expanded.
Someone asked me what all the information in the key means. I’ll go through them a bit at a time. First, the thread and process state lines.
You can see examples of the process state lines in Screen Shot 1 above. Each process in the system has a horizontal line. When one of the threads owned by that process is running, the process line is wide and dark green. Otherwise the process line is thin and black. (In case it’s not clear, only one thread ever runs at a time, so only one process ever runs at a time too.) So in the example above, threads owned by kernel (nk.exe) are running almost all the time. The vertical lines you see are due to threads in other processes running for very short periods of time. Remote Kernel Tracker draws vertical lines to represent switches between threads, to make it easier to see which threads ran before and after the thread switch.
Underneath each process are all the threads owned by that process. Processes can be expanded and collapsed to show and hide the threads, for easy viewing. There are no thread lines in Screen Shot 1, because none of the processes have been expanded to display their threads yet. Screen Shot 2 shows the same log, with some of the process lines expanded and with the key hidden for better viewing.
Screen Shot 2: Typical idle behavior, interrupts and some processes expanded.
In Screen Shot 2 you can see some thread lines too. Like the process lines, the thread lines are wide and light green whenever the threads are running, and thin when they are not. Most of the threads, except for “0x00000000,” run for such short times that at the zoom level shown in this screen shot, that you can’t see the horizontal thread-run lines, but you can tell that they ran because you can see the vertical thread-switch lines. On the threads you can also see some other icons representing the actions taken by the threads. More on these later.
You can also see that interrupts are represented as lines too, with one line for each SYSINTR value to show when interrupts occur.
What you’re seeing in these screen shots is typical idle activity on a device. The thread with handle “0x00000000” is not really a thread, but the placeholder to represent when the CPU is idle. Most of the other threads that run now and then are interrupt service threads (ISTs).
Every process will have one thread which has the same name as the process; that is the process’s primary thread, the one which starts running at main() or WinMain() or whatever flavor of entry point the process might use. All of the other threads in the process are given the name of the thread’s start routine, such as KITLInterruptThread under the process “NK.EXE” (the kernel).
An important detail to note at this point is that some of the features you can see here are due to the fact that you are running a profiling build. First, to minimize the impact of the logging hooks which record this data, interrupt data is only available on profiling builds. When the profiler is not present, the “Interrupts” line will not appear. Second, the thread names are also only available on profiling builds. Remote Kernel Tracker does not look up thread names from the symbol files in the release directory. Instead, on profiling builds the CeLog event logging hooks obtain thread names via the special symbol table that is included in ROM for use by the profiler, to represent all the code that is stored in ROM. The profiler uses that symbol table to look up symbols for profiling hits, and the logging hooks take advantage of the symbol table in order to provide thread names. Threads from processes and DLLs that are not stored in the “modules” section of ROM will not have names, since they will not be listed in the profiler’s symbol table. And on non-profiling builds, no threads will have names.
There is also a trick you can use to make the desktop tool “readlog.exe” look up symbols from the release directory and build a new log file containing the thread names on any build and regardless of where the thread’s code is running from. I don’t quite have documentation to point you to about that yet, but open up a release directory build window and type “readlog -?” on the command line to find out how to use readlog.
The other thread line I haven’t discussed yet is the Thread Migrate line. In Windows CE, when a thread calls a system API that is implemented by a different process, the thread moves from its owner process into the system process for the duration of the call. The Thread Migrate line in Remote Kernel Tracker represents the period of time during which the thread is executing inside a process other than its owner. You can hover over the line to learn which process the thread is executing inside.
In my opinion, this representation of thread migration is not terribly useful. It helps you see when a thread is running inside a process other than its owner, but it does not help you see the entire set of processes the thread transitions to. For example if your thread calls an API, and that API calls a different API, your thread may actually transition through multiple processes, but it is not easy to see if or when those transitions occur. Nor does it help you understand why those transitions took place (which APIs the thread was calling).
Back to the event types in the key. The main events I pay attention to when I’m looking at a log in Remote Kernel Tracker are a subset of those related to thread scheduling:
If you are familiar with the Win32 API set, you probably can guess what these mean. The critical section events represent when threads call the EnterCriticalSection and LeaveCriticalSection functions, and encounter contention over the critical section object. Critical sections are used so widely that logging every entry and exit would generate too many events and bog down system performance. So only those calls which block the current thread or unblock another thread are logged. The Set Event, Reset Event and Pulse Event events represent actions taken on Win32 “event” objects. Similarly the Release Semaphore and Release Mutex events represent actions on Win32 semaphore and mutex objects. The wait event represents a call to the Win32 WaitForSingleObject or WaitForMultipleObjects APIs, regardless of whether the thread blocks on those objects or not.
Finally, the Sleep Event event represents when the thread calls the Win32 Sleep API. In general I am not a big fan of Sleep, at least non-zero Sleep periods. Often it represents lazy programming, and results in polling that churns CPU time and drains power. For example, observe the behavior of the selected thread in Screen Shot 3 below. This thread is polling at 500ms intervals.
Screen Shot 3: Using Sleep to poll at regular intervals consumes CPU and drains power.
If you hunt around most logs, you’ll find cases where one thread will call the SetEvent API, then another thread will start running, do some work, and then block again in WaitForMultipleObjects on the same event handle.
Screen Shot 4: A waiting thread is awakened when another thread signals an object.
In Screen Shot 4 you can see that the threadsample.exe main thread set an event (the “flag” icon) that the last thread in the same process was waiting for. In this screen shot I double-clicked on the WaitForMultipleObjects event and the Set Event event to show their data. This often makes it easier to compare handle values between events. I know these two events are related because they both apply to handle value “0xA3E0FF96” in this case.
There are a few other events I also look at when I’m examining logs.
The interrupt event you’ve already seen above, representing when interrupt service routines (ISRs) execute. It’s worth mentioning that you can see relationships between interrupt service routines (ISRs) running – the “interrupt” events you see in Kernel Tracker – and their corresponding interrupt service threads (ISTs) waking up to do further interrupt handling. In Screen Shot 5 below, you can see the ISR for SYSINTR 1 (the timer interrupt, I believe?) run, and then two different ISTs follow up. You can also see the ISR for SYSINTR 30 (the KITL interrupt) run, and the KITLInterruptThread wake up to do some work. If you look back at Screen Shot 2 above, you can also see this relationship between the KITL interrupt and KITLInterruptThread when scaled back to a larger zoom level. Note that no IST runs for SYSINTR 0, which is a no-op.
Screen Shot 5: The relationship between ISRs and ISTs.
The Invert Priorities event is shown on a thread when the kernel temporarily changes the thread’s priority because it holds a resource that some other higher-priority thread needs. For example, if a low-priority thread holds a critical section that a high-priority thread tries to enter, the kernel temporarily boosts the low-priority thread to a high priority, to execute until it releases the critical section. Then the low-priority thread returns to its original priority. For real-time behavior, you never want real-time threads to block on resources owned by lower-priority threads, since those threads may do an undetermined amount of work before releasing the resources. Screen Shot 6 is an example of a case where a higher-priority thread ends up boosting the priority of another thread over and over again.
Screen Shot 6: Recurring cases of priority inversion.
The icons in Screen Shot 6 are a little hard to see – sorry – but what’s going on is that a high-priority IST in the kernel enters a critical section that is held every time by a low-priority thread from testdb.exe. The kernel thread logs a priority-inversion event representing the priority-boost given to the testdb.exe thread. Then the kernel thread blocks on the critical section. The now-high-priority testdb.exe thread runs until it leaves the critical section, then logs a priority-inversion event representing its return to its regular priority. The kernel thread then begins running again, now that it owns the critical section and is the highest-priority thread. You can see in this log that the inversion happens regularly in this particular case.
The page fault event represents a costly page fault operation occurring on a thread. In CE 4.2 and earlier, a page fault event was logged at the beginning of a page fault; in CE 5.0 and later, an event is also logged at the end of the page fault so that you can tell how long the page fault took. I don’t see this event often, but sometimes I see cases like this, where a thread incurs many page faults:
Screen Shot 7: A thread incurring many page faults. This log is from CE 5.0 so the fault markers are in begin/end pairs so that you can tell how long they took.
When I see many page faults like this, I run the readlog.exe command-line tool to get a summary of the faults, to determine where they are occurring. In general the important thing to check is whether the fault address is paged more than once. Code must be paged once in order to execute at all, but if it is paged more than once, that means the kernel’s page pool is not large enough to hold all the executing code. In that case the page pool may need to be enlarged. If data such as memory-mapped files is paged more than once, then the data may not have good page locality, or the system may be in a low-memory state.
You can also log your own events from your code. Since Remote Kernel Tracker does not have the ability to interpret your data blobs, you may find it useful to log your data using a set of event types that Remote Kernel Tracker does understand:
When Remote Kernel Tracker sees one of these event types, it expects the data logged for that event to be an array of the given type. For example, for the “Raw ulong” type Remote Kernel Tracker interprets the data as an array of unsigned long integers (DWORDs).
For the “Raw wchar” type the data is an array of Unicode characters, and Remote Kernel Tracker will display it as a string.
To see the data corresponding to one of these event types, hover over the event to see the tooltip, or double-click on the event to get the data in a separate window. Note also that the data is displayed in the bottom-left status bar when you click on an event.
Screen Shot 8: Viewing “raw” data in Remote Kernel Tracker.
I hope I have helped you understand what the tool is showing you, and see some of the things you can use CeLog and Remote Kernel Tracker for. This is a very versatile tool. It might take some practice to recognize these relationships between events, but a basic understanding of what thread behavior looks like in Kernel Tracker is the first thing you need when you’re troubleshooting a wide range of performance problems. Remote Kernel Tracker enables you to track down bugs related to thread interactions that are difficult to track with traditional development tools, including deadlocks, race conditions, excessive context switching or priority problems.
I have some more posts coming in the near future, about more scenarios you can recognize when you get used to Remote Kernel Tracker.